Colour Name Extraction Using spaCy

I had an interesting problem at work: we had to extract previously unseen colour names, and the adjectives attached to them, from a chunk of text. We had a list of basic colours that was in no way comprehensive, and we needed to extract the adjectives describing the colours. For the initial exploration, we used spaCy's part-of-speech tagger. Our hypothesis was that, since colour names and their adjectives are tagged as either ADJ or NOUN, extracting the group of words with those tags around a colour we could identify from our limited list might just do. The following block shows the tags that spaCy's part-of-speech tagger assigns to each token in the sentence.

[Image: walmart text]
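For reference, here is a minimal sketch of how such (token, tag) pairs could be produced. The helper names `to_pairs` and `tag_text` are made up for illustration, and the model name `en_core_web_sm` is an assumption, since the post does not say which model we used. Tag names can also differ between spaCy versions (for example, a conjunction may come out as CCONJ rather than CONJ), so the exact output may not match the listing that follows.

```python
def to_pairs(doc):
    """Turn an iterable of spaCy tokens into (text, coarse POS tag) pairs."""
    return [(token.text, token.pos_) for token in doc]


def tag_text(text, model_name="en_core_web_sm"):
    """Tag `text` with a spaCy pipeline.

    The default model name is an assumption; use whichever English model
    you have installed.
    """
    import spacy  # imported lazily so to_pairs() is usable without spaCy

    nlp = spacy.load(model_name)
    return to_pairs(nlp(text))
```

Calling `tag_text("bold blue heather")` returns a list of tuples in the same shape as the listing that follows.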

[('Available', 'ADJ'), ('in', 'ADP'), ('a', 'DET'), ('variety', 'NOUN'), ('of', 'ADP'), ('colors', 'NOUN'), (',', 'PUNCT'), ('including', 'VERB'), ('bold', 'ADJ'), ('blue', 'ADJ'), ('heather', 'NOUN'), (',', 'PUNCT'), ('ebony', 'NOUN'), (',', 'PUNCT'), ('jazzberry', 'NOUN'), ('pink', 'ADJ'), ('heather', 'NOUN'), (',', 'PUNCT'), ('light', 'ADJ'), ('steel', 'NOUN'), (',', 'PUNCT'), ('navy', 'NOUN'), ('heather', 'NOUN'), (',', 'PUNCT'), ('new', 'ADJ'), ('frosty', 'ADJ'), ('lavender', 'NOUN'), (',', 'PUNCT'), ('plum', 'NOUN'), ('port', 'NOUN'), ('or', 'CONJ'), ('slate', 'NOUN'), ('heather', 'NOUN')]
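The extraction step itself can be sketched roughly as follows. This is not the exact code from our hack, just one way to implement the idea: walk the tagged tokens, collect each unbroken run of ADJ and NOUN tokens, and keep the run if it contains a known colour. The function name `extract_colour_phrases` and the contents of `BASE_COLOURS` are assumptions made for this example; our real list was different.

```python
# Tagged tokens, copied from the listing above.
TAGGED = [
    ('Available', 'ADJ'), ('in', 'ADP'), ('a', 'DET'), ('variety', 'NOUN'),
    ('of', 'ADP'), ('colors', 'NOUN'), (',', 'PUNCT'), ('including', 'VERB'),
    ('bold', 'ADJ'), ('blue', 'ADJ'), ('heather', 'NOUN'), (',', 'PUNCT'),
    ('ebony', 'NOUN'), (',', 'PUNCT'), ('jazzberry', 'NOUN'), ('pink', 'ADJ'),
    ('heather', 'NOUN'), (',', 'PUNCT'), ('light', 'ADJ'), ('steel', 'NOUN'),
    (',', 'PUNCT'), ('navy', 'NOUN'), ('heather', 'NOUN'), (',', 'PUNCT'),
    ('new', 'ADJ'), ('frosty', 'ADJ'), ('lavender', 'NOUN'), (',', 'PUNCT'),
    ('plum', 'NOUN'), ('port', 'NOUN'), ('or', 'CONJ'), ('slate', 'NOUN'),
    ('heather', 'NOUN'),
]

# Hypothetical, deliberately incomplete list of known colours (lowercase).
BASE_COLOURS = {'blue', 'pink', 'navy', 'lavender', 'slate'}


def extract_colour_phrases(tagged, base_colours, keep_tags=('ADJ', 'NOUN')):
    """Return runs of ADJ/NOUN tokens that contain at least one known colour."""
    phrases = set()
    run = []
    # The trailing sentinel forces the final run to be checked as well.
    for word, tag in list(tagged) + [('', 'PUNCT')]:
        if tag in keep_tags:
            run.append(word.lower())
            continue
        # Any other tag (punctuation, conjunction, ...) ends the current run.
        if any(w in base_colours for w in run):
            phrases.add(' '.join(run))
        run = []
    return phrases


print(sorted(extract_colour_phrases(TAGGED, BASE_COLOURS)))
```

Because 'ebony', 'light steel' and 'plum port' contain no word from the assumed colour list, they are dropped; extending the list would pick them up too.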

Collect each run of adjacent ADJ and NOUN tokens that contains a colour from the list, and you will have extracted the adjectives, along with any other colour words, next to whichever colour appears both in the colours list and in the text. Notice that in our case each colour name was properly delimited before and after by some sort of punctuation or a conjunction. This might not always be the case, and you might need some post-processing. At the end of the 30-minute hack, we were able to extract what we needed in a clean way.

{'bold blue heather',
 'jazzberry pink heather',
 'navy heather',
 'new frosty lavender',
 'slate heather'}

Happy hacking! 😉


