Date: 2021-12-30

A combination of statistical methods and engineering is key. Let’s see why and how KBinsDiscretizer should be the ML engineer’s friend.

Data analysis can be seen as a device (or a black box, or a Chinese room) which, given some inputs, provides outputs. The main difference between the rule-based approach and the statistics-based one (aka machine learning) is that the former accepts input data and rules (“algorithms”) as inputs and gives predictions as output, while the latter expects training examples and labels as inputs and gives rules as output. That is, in machine learning the rules are inferred from the relationship between input data and labels, while in the classical approach the rules are pre-determined, usually by humans who spent years trying to figure them out.

The difference is not as subtle as one may think. In science, for example, the rules are often encoded in programs that simulate the physical system under investigation (e.g. Earth’s climate or interactions between particles). On the other hand, if those rules are not known beforehand as part of a pre-existing theoretical framework, machine learning is likely to be the only option that makes sense. It is worth mentioning that in some notable cases, such as protein folding, a machine learning approach called AlphaFold has proven to be at least as successful as more traditional means, unlocking exciting new possibilities.
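
To make the contrast concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the temperature data, the labels, the 30.0 cutoff and the function names `handwritten_rule` and `learn_threshold_rule` are all made up for this example, and the learner is a deliberately simple one-threshold search, not what any particular library does.

```python
from typing import Callable, List, Tuple


def handwritten_rule(temperature: float) -> int:
    """Rule-based approach: a human chose the cutoff (30.0 is a made-up value)."""
    return 1 if temperature > 30.0 else 0


def learn_threshold_rule(
    xs: List[float], ys: List[int]
) -> Tuple[float, Callable[[float], int]]:
    """Learning approach: infer a single cutoff from examples and labels.

    Tries the midpoint between each pair of neighbouring distinct values
    and keeps the one with the fewest mistakes on the training examples.
    """
    points = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    if not candidates:
        raise ValueError("need at least two distinct input values")

    def errors(threshold: float) -> int:
        # Count training examples that the candidate rule gets wrong.
        return sum((1 if x > threshold else 0) != y for x, y in zip(xs, ys))

    best = min(candidates, key=errors)
    return best, lambda x: 1 if x > best else 0


# Hypothetical training examples (inputs) and labels.
xs = [18.0, 22.0, 25.0, 31.0, 33.0, 36.0]
ys = [0, 0, 0, 1, 1, 1]

threshold, learned_rule = learn_threshold_rule(xs, ys)
print(threshold)           # 28.0
print(learned_rule(35.0))  # 1
```

In the first function the rule is an input supplied by a person. In the second, the rule (here just a cutoff) is the output, inferred from the examples and their labels.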

