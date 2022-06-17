Now we know that we were looking at a list of surnames, not cities. We had no way of telling the difference without extending the context. In this particular case, a larger sample size would barely help, as nearly any city name can also be someone’s surname, and many cities are named after people. In normal situations, our brain doesn’t even notice how tricky this kind of distinction is, because, as humans, we rarely operate without a rich context. NLP techniques get some of the context required for token classification from the text that surrounds a particular token, and use word embeddings to make a “best guess” at what the word might mean. When we analyze columns of data (let’s say, from a CSV file), we don’t have a sentence to derive context from. Instead, we have other columns and, more often than not, other files.
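To make the idea concrete, here is a minimal Python sketch of how neighboring columns might break the tie between "city" and "surname". Everything in it is an assumption made for illustration: the tiny lexicons, the `CONTEXT_HINTS` table, and the `candidates` and `classify` helpers are hypothetical and do not come from any particular library or from a real classifier.

```python
# Hypothetical toy lexicons; real ones would be far larger.
# They overlap completely on purpose: every value is both a city and a surname.
CITIES = {"Washington", "Lincoln", "Jackson", "Houston", "Austin"}
SURNAMES = {"Washington", "Lincoln", "Jackson", "Houston", "Austin"}

# Hypothetical hints: sibling column names that suggest each label.
CONTEXT_HINTS = {
    "city": {"street", "zip_code", "country"},
    "surname": {"first_name", "date_of_birth", "email"},
}


def candidates(values):
    """Return every label whose lexicon contains all values in the column."""
    labels = []
    if all(v in CITIES for v in values):
        labels.append("city")
    if all(v in SURNAMES for v in values):
        labels.append("surname")
    return labels


def classify(values, sibling_columns):
    """Pick the candidate label best supported by the sibling column names.

    Returns None when the values match no lexicon or the context
    does not favor a single label.
    """
    options = candidates(values)
    if len(options) <= 1:
        return options[0] if options else None

    siblings = set(sibling_columns)
    # Score each label by how many of its hint columns are present.
    scores = {label: len(CONTEXT_HINTS[label] & siblings) for label in options}
    best = max(scores.values())
    winners = [label for label, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


column = ["Washington", "Lincoln", "Jackson"]
print(classify(column, []))                       # values alone: ambiguous
print(classify(column, ["first_name", "email"]))  # context points to surnames
print(classify(column, ["street", "zip_code"]))   # context points to cities
```

The values alone leave both labels equally plausible, so the sketch declines to answer. Only the names of the sibling columns tip the balance, which mirrors the role that the surrounding sentence plays for token classification in NLP.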
