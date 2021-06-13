Classification with multiple categories is a common problem in Machine Learning. Let’s dive into the definition of the loss function most commonly used for it.

One of the tasks at which Machine Learning has historically excelled is classifying items (e.g. images, documents, sounds) into different categories. Especially in recent years, advances in the hardware that runs models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs, including LSTMs) have made possible a quantum leap (sometimes literally) in performance.

However, defining a model is only half of the story. To find the best parameters for such a task, it is also necessary to define a cost or loss function that captures the essence of what we want to optimize, and to run some form of gradient descent to reach a suitable set of parameters.
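To make the "model plus loss plus gradient descent" recipe concrete, here is a minimal sketch in NumPy. It rests on several assumptions that the text above does not state: the model is a plain linear softmax classifier (not a CNN or RNN), the loss is categorical cross-entropy, and the data is a synthetic three-cluster toy set. The names `softmax`, `cross_entropy` and `train`, as well as the learning rate and step count, are illustrative choices only.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the correct class.
    n = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))


def train(features, labels, num_classes, lr=0.5, steps=200):
    # Full-batch gradient descent on a linear softmax classifier (assumed model).
    n, d = features.shape
    weights = np.zeros((d, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    losses = []
    for _ in range(steps):
        probs = softmax(features @ weights + bias)
        losses.append(cross_entropy(probs, labels))
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (probs - one_hot) / n
        weights -= lr * (features.T @ grad_logits)
        bias -= lr * grad_logits.sum(axis=0)
    return weights, bias, losses


# Synthetic toy data: three well-separated 2-D clusters, 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 2.0], [-2.0, -1.0], [2.0, -1.0]])
labels = np.repeat(np.arange(3), 50)
features = centers[labels] + 0.3 * rng.normal(size=(150, 2))

weights, bias, losses = train(features, labels, num_classes=3)
predictions = np.argmax(features @ weights + bias, axis=1)
accuracy = float(np.mean(predictions == labels))
```

With all parameters initialized to zero, the first recorded loss corresponds to a uniform guess over the three classes, and it shrinks as gradient descent moves the parameters. A real image or text classifier would swap the linear model for a deeper network, but the loop would keep the same shape.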