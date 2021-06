Machine learning systems are now ubiquitous in our daily lives, so the correctness of their behaviour is crucial. When an ML system makes a mistake, it can not only result in an annoying online experience, but also limit your socio-economic mobility or, even worse, make a life-threatening manoeuvre in your car. So how certain are you that a deployed ML system has been thoroughly tested and that you are not effectively a test user? On the flip side, how do you know that the system you’ve been developing is reliable enough to be deployed in the real world? And even if the current version has been rigorously tested in the real world, how can you be sure that its overall performance has not regressed after you update one part of the model? These are all hard questions, rooted in the sheer complexity of the problems we try to solve in a data-driven fashion and in the scale of the machine learning models we are building nowadays.