SEARCH ENGINE OBSERVATIONS: Google Holes, Yahoo Gaps & Black Holes

So, what are they? Well, black holes are thought to sit at the center of most large galaxies, and they are like the eye of a storm. The center seems calm and undisturbed, but at the edges of the eye huge forces of nature are at work, ripping everything they contact to shreds. Black holes are immense gravitational wells from which nothing can escape, or at least that's the theory, amended in part by Dr. Hawking a few weeks ago.

Google might like to be thought of as the 'black hole of Internet search engines,' consuming all the information that falls within its gravitational reach. The difference is that the information does escape, and the web is not really ripped apart at the seams. Oh well, so much for that analogy.

But there really are holes in Google, Yahoo! and all the other search engines, and they have nothing to do with the forces of nature. These holes have serious implications for the quality of search engine results, and they therefore deserve attention in your optimization efforts.

We shall begin the analysis with Google, the current technology leader in the search engine field. When users run a search on Google, they often enter complete phrases. This tendency is likely to become more common as speech-to-text becomes a reality. How Google treats these phrases reveals a fault in its algorithms and a hole in the accuracy of its search results. When you include a common word in a phrase in the Google search box, Google displays the following message above the search results:

“‘for’ is a very common word and was not included in your search.” [details]

If you click “details,” you get the following explanation:

“Google ignores common words and characters such as “where” and “how”, as well as certain single digits and single letters, because they tend to slow down your search without improving the results. Google will indicate if a common word has been excluded by displaying details on the results page below the search box. If a common word is essential to getting the results you want, you can include it by putting a “+” sign in front of it. (Be sure to include a space before the “+” sign.)”

But here's where Google falls down. Visit Google right now. Open four windows, and in each window's search box type one of the following queries:

  • Hotels New York
  • Hotels in New York
  • Hotels for New York
  • Hotels about New York

The words 'in', 'for' and 'about' all trigger the standard “This is a very common word and was not included in your search” message. Yet all four queries display different results!

What is Google doing? I considered the possibility that I was pulling results from different data centers, so I made sure this was not the case. I then tried a variation on the query, "search engine optimization X hotels", where X represented either a blank space or one of the words 'in', 'for' or 'about'. In this test, only the blank-space version produced different results; the versions with 'in', 'for' and 'about' matched one another. Still, by rights all four ought to have been identical.

It occurred to me that Google might be using different algorithms when it identified a place name in the query, trying to understand the context of the search. That would be a logical move. I'm very familiar with software that comprehends the context of textual content. Could Google be applying some contextual filtering to its results? To find out, I tried a garbage search: a phrase made of common words that have no relevance to one another and would never logically appear together:

“room hotel tapestry highway lagoon”

Interestingly, Google had 1,720 entries matching this query, and the results varied depending on which of the X terms I inserted between any two of the words. Results also varied when I moved the ignored word to a different position in the query. But is this context? A further test was required. I put together three queries using the same terms, with a common ("ignored") word inserted as follows:

  • Filing tax return(s)
  • Filing a tax return(s)
  • Filing of tax return(s)

In this case, I tried both singular and plural forms to ensure that poor grammar was not affecting the results. Results varied for each search. That's not to say they were all entirely different, just that they varied. I tried a few other searches and saw similar behavior. Most importantly, the results I received were all equally correct in context, which was a relief.

Some people have written on newsgroups and discussion boards that when Google comes across an 'ignore' word, it substitutes a wildcard. However, if that were true, the various ignore words would all return the same results, and this is not the case. It can therefore be surmised that Google does not in fact ignore these words at all! It is more likely that Google is applying some form of contextual algorithm. This is logical: the technology exists, and Google is known to have bought a UK firm last year that was developing such technology. Our own firm's software uses contextual analysis in its algorithms.
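The reasoning above can be made concrete with a small sketch. It models the two simple hypotheses in Python: either common words are dropped outright, or each one is replaced with a wildcard. The stop-word list and the function names are hypothetical illustrations, not Google's actual list or code.

```python
# Hypothetical stop-word list for illustration only; NOT Google's actual list.
STOP_WORDS = {"in", "for", "about", "a", "of", "on"}

def drop_stop_words(query: str) -> tuple[str, ...]:
    """Hypothesis 1: common words are simply removed before matching."""
    return tuple(w for w in query.lower().split() if w not in STOP_WORDS)

def wildcard_stop_words(query: str) -> tuple[str, ...]:
    """Hypothesis 2: each common word is replaced by a '*' wildcard."""
    return tuple("*" if w in STOP_WORDS else w for w in query.lower().split())

queries = [
    "Hotels New York",
    "Hotels in New York",
    "Hotels for New York",
    "Hotels about New York",
]

# Count how many distinct normalized queries each hypothesis produces.
dropped = {drop_stop_words(q) for q in queries}
wildcarded = {wildcard_stop_words(q) for q in queries}

print(len(dropped))     # 1: all four collapse to ('hotels', 'new', 'york')
print(len(wildcarded))  # 2: ('hotels', 'new', 'york') and ('hotels', '*', 'new', 'york')
```

Under the "dropped" model all four hotel queries become one query; under the wildcard model the three queries containing 'in', 'for' and 'about' become identical. Neither model accounts for four different result sets, which is consistent with the argument that something beyond simple stop-word handling is at work.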

Taking the analysis a step further, which other engines seem to have a grasp of context? The obvious places to look first were Google's competitors: Yahoo!, Microsoft and AskJeeves.

AskJeeves sprang immediately to mind, since it originated the concept of "phrase a question" searching and should logically have some context filtering in place. Indeed, when I ran the 'tax return' queries through the engine, I again received varying results, very different from Google's, I might add. However, when multiple 'ignore' words were added to a query, the results did not vary, which may indicate very limited filtering.

I then tried another pair of queries, "diapers for baby" and "diapers on baby", which should logically return different results: one about recommended diapers, the other about how to put them on, how to keep them on, how they should look, and so on. Surprisingly, I received identical results for both queries. Context was not being properly filtered by the very search engine that first introduced the concept! I tried the same searches on Google. While the results were jumbled a bit, the top web sites were the same for both queries, just in a different order. With over 550,000 results to choose from, this suggests that Google, too, has a long way to go to fulfill the promise of contextually correct responses.

Next, I turned my attention to Yahoo! I was somewhat surprised to discover that Yahoo! does not seem to have -any- filtering in place. Results did not vary at all for the test searches when the "ignore" words were inserted or removed. Yahoo! also did not flag these terms as ignored in its results, but since the results were unchanged when the terms were added or deleted, it appears that Yahoo! silently omits them and lacks the algorithms needed to comprehend the context of a search query.

Is context an area where Yahoo! seriously lags behind Google and others? If so, this points to a widening gap between the search engines in the future. Google is already positioning itself for speech-to-text devices; can intonation be far behind? Yahoo! has shown no evidence of making strides in either of these areas.

Lastly, I looked at the new Microsoft engine, which had no contextual filtering in place. Since this search engine is still in beta, I cannot in all fairness call it behind in a race when we have not yet seen the final product. Still, it's something to keep in mind for the future.

Implications for SEO

The implications of contextual search for how your web site performs in the search engines are immense. They mean that all SEO firms must take better account of the nuances of how people search.

In our firm, we recognized that as the world moved to speech-to-text and the web grew in size, context would become the next big differentiator in search results. As a result, context is already recognized and taken into account by both our technicians and our technology when we analyze a web site and optimize it for search engines.

Improving your web site's performance in the search engines now requires understanding how people actually phrase their search queries, and using that knowledge to position the content on your site so it reflects the idioms your target audience uses.

Make sure you use phrases the way you hear people asking questions. Cover all the bases and include the likely variations. Get outside help if you need it, but don't miss your opportunity to take advantage of the black holes out there.