Toronto recently used an AI tool to predict when a public beach will be safe. It went horribly awry. The developer claimed the tool achieved over 90% accuracy in predicting when beaches would be safe to swim in. In practice, the tool did much worse: on a majority of the days when the water was in fact unsafe, beaches remained open based on the tool’s assessments. It was less accurate than the previous method of simply testing the water for bacteria each day.

We do not find this surprising. In fact, we consider this to be the default state of affairs when an AI risk prediction tool is deployed.

You’re reading AI Snake Oil, a blog about the upcoming book.

The Toronto tool involved an elementary performance evaluation failure: city officials never checked the performance of the deployed model over the summer. But much more subtle failures are possible. Perhaps the model is generally accurate but occasionally misses even extremely high bacteria levels. Or it works well on most beaches but totally fails on one particular beach. It’s not realistic to expect non-experts to be able to comprehensively evaluate a model. Unless the customer of an AI risk prediction tool has internal experts, they’re buying the tool on faith. And if they do have their own experts, it’s usually easier to build the tool in-house!

When officials were questioned about the tool’s efficacy, they deflected the questions by saying that the tool was never used on its own: a human always made the final…

The bait and switch behind AI risk prediction tools