The Numbers Game

On its home page, Google claims to have 3,083,324,652 web pages in its index. This is a HUGE jump from the 2,469,940,685 web pages listed just a few weeks ago.

I tried a few quick searches and, while the test was unscientific, I didn’t notice any particularly new content or depth in the results. In fact, the results in one search still cut off at the same 754 pages they had previously, even though I know lots of new content was added over the previous 4 months. I should mention that I perform the same unscientific check every time I see any major search engine make this type of claim. So, if they have added new content, I couldn’t find it, and I am really not sure what they are counting. It could be that the number of non-English documents has shot up. For the fun of it, I tried some searches in French, Spanish and German. Nothing new. Perhaps I was looking in the wrong area.

Here’s what I was able to determine. It is possible that Google has changed the way it counts pages. While Google claims 3,083,324,652 web pages, it seems plausible that the total now includes the images in its image database and all of its Usenet newsgroup postings. Apparently these Usenet postings alone, found in the “Google Groups” area, total approximately 800 million. This would go a long way toward explaining the large change in numbers.
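For anyone who wants to check the arithmetic, here is a rough sketch using only the figures quoted above. The 800 million Usenet total is approximate, and the function and variable names are mine, chosen just for illustration.

```python
# Figures quoted in the article; the Usenet total is only approximate.
previous_count = 2_469_940_685
current_count = 3_083_324_652
usenet_postings = 800_000_000  # approximate, per the article


def index_jump(previous, current):
    """Return the absolute and percentage increase between two counts."""
    increase = current - previous
    percent = 100 * increase / previous
    return increase, percent


increase, percent = index_jump(previous_count, current_count)
print(f"Increase: {increase:,} pages ({percent:.1f}%)")
print(f"Usenet postings minus increase: {usenet_postings - increase:,}")
```

The jump works out to 613,383,967 pages, or roughly 25 percent. The approximate Usenet total is actually larger than that jump, so the postings alone could more than cover it. That fits the idea of a change in counting method, though it doesn't prove it.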

Since this number was updated in early November (if it’s a real number), why hasn’t it changed? Has nothing changed in the index over the last 3 weeks? I’m pretty sure new web sites have been added to Google, and to the Internet as a whole, during this time frame. I know for a fact (a scientific, measurable fact) that it -has- changed, because some of our new customers’ web sites have come online and new content from existing customers has been added. Did Google drop content to make room for the new stuff? I doubt it.

Why do the leading search engines even bother with these numbers? Do they really matter? Do the search engines really feel they add value with these number games? It makes me wonder whether the numbers are even accurate. Does anyone really believe that John Doe visits Google because it claims to have the most web pages? Does anyone use FAST because it claims half the number, but delivers results FASTer than anyone else? Does anyone visit any of the general search engines because they think it has a larger, faster, deeper database? No, of course not. People use a particular engine because they like it. They feel it gives them better results, more often, than other engines. So when Google, or any of the other major engines, updates its running count, I always think, why bother? Who is this supposed to impress? Most people I know are impressed by good, accurate results. Period.