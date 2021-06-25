…removing stop words for Data Science applications. It should come as no surprise that data is, most of the time, messy, unorganized, and difficult to deal with. As you move from educational exercises into practical data science, you will find that most data is obtained from multiple sources and multiple queries, which can leave it unclean. In many situations, you will have to assemble the dataset that will ultimately be used to train your model yourself. Several articles focus on cleaning numeric data, but I want this article to focus mainly on text data, which ties in with natural language processing. With that said, here is a simple way to clean your text data in Python, along with when it is useful. I will be using the popular TMDB 5000 Movie Dataset [2], so that you can follow along.
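To make "removing stop words" concrete before working with the real dataset, here is a minimal sketch in plain Python. It is not necessarily the approach used later in this article. The `STOP_WORDS` set, the `clean_text` helper, and the sample sentence are all hypothetical; the stop-word list is deliberately tiny, whereas real projects usually rely on a fuller list such as the one shipped with NLTK.

```python
import re

# Hypothetical, deliberately tiny stop-word set for illustration only.
# A real project would typically use a much larger list (e.g. NLTK's).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on", "with"}


def clean_text(text: str) -> str:
    """Lowercase the text, keep only word tokens, and drop stop words."""
    # Lowercase first so "The" and "the" are treated as the same token,
    # then extract runs of letters/apostrophes (punctuation and digits are dropped).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Keep only the tokens that are not in the stop-word set.
    kept = [tok for tok in tokens if tok not in STOP_WORDS]
    return " ".join(kept)


# Made-up sample sentence, not taken from the TMDB dataset.
print(clean_text("The hero and the villain meet in a city of spies."))
# prints: hero villain meet city spies
```

Lowercasing before the lookup keeps the stop-word set small, since each word only needs to appear once in lowercase form.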