
Coffee Talk, Semanticized

By Moriya Nissim

If you’re like a lot of us, your day doesn’t start until you’ve had your first cup of coffee. This post is dedicated to you.

We looked into the accuracy of one of the keyword categories we were recently asked to create: the Coffee category! This was an interesting case because it showed exactly why semantic classification is not just necessary but truly superior to simple keyword targeting. While keywords have their strengths in certain circumstances, such as pinpointing a specific brand name or person, they are often a poor solution for general concepts used in everyday conversation.

In this case, the client gave us a short list of keywords to target. Although we suspected that “coffee” and “espresso” could appear in very general contexts unrelated to coffee content, we assumed that restricting the full keyword list to Food & Beverage content would eliminate those general mistakes. We then checked two sets of data:

  • Accuracy across top URLs classified as Coffee: 20%. We saw pages that mention coffee in passing, whatever the main context of the page (how many times have you seen something like that, right?). For example, this page http://modernhomemodernbaby.com/link-love-the-mommy-files-opting-out-of-school-lunch/ mentions “5 Sack Lunches Kids Love (so easy to put together, I could do it without my morning coffee. Woot!!)”. We also saw home pages that cover many different subjects without focusing on a specific context, including pages that link to a variety of articles or products, such as News (e.g., http://www.dailymail.co.uk/) and Shopping (http://slickdeals.net/).
  • Accuracy across top URLs classified as both Coffee and Food & Beverage: “better” than the above but still very poor, at roughly 50%. We found many article pages that matched only because of an incidental mention. For example, in this hot fudge recipe http://www.mybakingaddiction.com/homemade-hot-fudge-sauce/, the mention of “the crayons that the cook kept in an old coffee can” is completely unrelated to the overall meaning of the page. We also saw the home pages of food sites covering multiple topics (http://www.thekitchn.com/).

We all know it’s the main context of the page that matters, so seeing a specific keyword doesn’t necessarily mean the page is about that word. Furthermore, a simple keyword category built from a list of single words is more likely to match general pages or index pages that cover multiple topics without focusing on a single context. And when you think about how much we (and the world) love coffee, it’s no surprise the word turns up so often!
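To make this failure mode concrete, here is a minimal Python sketch of plain keyword-list targeting. It assumes a hypothetical two-word list (“coffee”, “espresso”), since the client’s full list isn’t given in this post, and it is only an illustration of naive keyword matching, not Peer39’s classifier or semantic approach. Because the matcher ignores page context, both incidental mentions quoted above count as Coffee matches.

```python
import re

# Hypothetical keyword list for illustration; the client's actual list is not given in the post.
COFFEE_KEYWORDS = {"coffee", "espresso"}


def keyword_match(text: str, keywords: set[str]) -> bool:
    """Naive keyword targeting: True if any keyword appears as a whole word,
    regardless of what the page as a whole is about."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not keywords.isdisjoint(words)


# Snippets quoted in the post; neither page is actually about coffee.
hot_fudge = "the crayons that the cook kept in an old coffee can"
lunch = ("5 Sack Lunches Kids Love (so easy to put together, "
         "I could do it without my morning coffee. Woot!!)")

for name, text in [("hot fudge recipe", hot_fudge), ("school lunch post", lunch)]:
    print(name, keyword_match(text, COFFEE_KEYWORDS))
```

Both snippets are flagged, even though neither page is about coffee, which is the kind of false positive described in the two data sets above.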

It’s always instructive to see how challenging categories built from a simple list of keywords can be. Drawing on our unique page-level approach, we recommended semantic classification to ensure that our Coffee category stands on solid ground! :)

Moriya Nissim is Director, Account Management at Peer39

From SemanticizeMe