Date: 2021-05-22

A deep dive into the compression settings of deep learning’s most popular benchmark.

I’ve been on a mission lately to get more people thinking about how lossy compression affects their deep learning models [1]. In the process I spent a lot of time with ImageNet [2], which consists entirely of JPEG files, and I started noticing some peculiar compression settings. To see how widespread these odd settings are, I surveyed the compression settings across the entire dataset. In this post I report what I found, explain why I think some of these settings are weird, and show the statistics I computed for each of the relevant compression settings. At the end, I plot a 2D projection of these settings, which makes it possible to see graphically that several different sources were involved in the creation of ImageNet.