9.7 Billion Web Pages and Nothing’s There
As of January 2006, Google has indexed 9.7 billion web pages. When I search on a string that is even somewhat popular, I often get back hundreds of thousands or millions of results. In addition, I find it very difficult to obtain the most recent results unless I’m very, very careful about how I enter my search string. Why is it so hard to find really useful data?
Try answering these questions and tell me how easy it is for you to put your fingers on the data (without paying $2,500 and above for reports from some analyst firm):
- How many total web sites are there?
- Worldwide, what’s the installed base of mobile phones? How many are web enabled?
- What are the various flavors of wireless, data-centric technologies (Wi-Fi, WiMAX, CDMA, GSM, EV-DO, etc.)? How fast are they? What are the growth rates?
- What is the guesstimate for the growth in, say, data? When you think about the demand side creation of media by consumers, is there any way to quantify this increase?
- How many unique visitors does Wikipedia get per day?
- How many blogs are delivered by spammers (out of the 29.8 million tracked by Technorati)?
I could go on and on, but you get the drift. For simple searches on Google, Yahoo, Icerocket and others, it’s fairly trivial to get good results back. But when you’re searching for more complex, meaty results, it’s stunningly difficult and time-consuming to get answers.
One would assume that the Federal Communications Commission, the Dept. of Commerce, the World Wide Web Consortium or many of the other governmental or non-profit organizations would provide this data (especially the US governmental agencies to whom I’m paying taxes!) but alas, they don’t.
6 Comments
About Steve Borsch
Strategist. Learner. Idea Guy. Salesman. Connector of Dots. Friend. Husband & Dad. CEO. Janitor.
Perhaps I’m missing something, but I just copied your question “How many unique visitors does Wikipedia get per day?” and pasted it directly into Google. In the synopsis of the first result I read “over 60000 people per day arrive at Wikipedia via Google”. I clicked the link, hit Ctrl-F, searched on “per day”, found the statement and noticed the footnotes [2] and [3] for “http://www2.knams.wikimedia.org/stats/00-all-projects/” and “http://en.wikipedia.org/wiki/Wikipedia:Traffic” respectively. The first stated “sorry, the statistics are offline for the moment”. On the second I saw the statement “Alexa.com independently compiles traffic ranking figures”, and I clicked on the footnote [1]. What I found appeared to be fairly meaty data. Perhaps I just got lucky, or maybe I misunderstood what you were looking for.
In my looking around, I did come across a great Google page at “http://www.google.com/help/cheatsheet.html”. I have a hunch that finding the data you are looking for might be easier with better queries.
P.S. I love my MicroTrack, thanks again for the recommendation!
Google Answers
http://answers.google.com/answers/
SEAN: This is so funny. I *knew* if I put down anything I searched for and was frustrated about, someone would immediately pull up exactly what you did.
I actually had found plenty of Wikipedia data and answers to most of the other bullet points…it just took an inordinate amount of time, and my search string keyboard tickling is pretty decent. Most of the really good data is buried inside a PDF or .doc, sits within walled gardens or has a big price tag attached.
Let’s take my fourth bullet point down: The growth in data. I know this seems pretty obtuse, but I had a reason to find out. AIIM – The Enterprise Content Management Association (AIIM.org) has good stats on growth in enterprise content and the number of exabytes, exponential increases, etc., but nothing in terms of online data (so tracking, say, growth in rich media stored/delivered online is tough to ascertain).
DIMITAR: Yeah…I know all about Google Answers but I’d just go to IDC or Gartner and plunk down a few thousand and get their results, research methodology and sources which aren’t returned by Answers.
Again, my point was that much of the quantitative data one would expect our Federal government to produce is either hidden or non-existent, or I don’t know where to look.
In any event guys, I’d prefer to not become a research librarian to find and archive data.
Steve, I agree with you, this problem (search, knowledge management) is far from solved, but I still remember the pre-internet age…Now we can use Google, Wikipedia, blogs, MIT OCW, Project Gutenberg, video, podcasting and many other sources. The sky is the limit!
Why You Love the Internet
http://www.chicagoboyz.net/archives/003957.html
WHY IS GOOGLE GETTING WHIPPED IN SOUTH KOREA BY NHN?
http://bernardmoon.blogspot.com/2006/02/why-is-google-getting-whipped-in-south_01.html
I know, I took the bait. I agree with your basic tenet though. Every time I think about this great internet thing we have here, I remember a quote from the movie “Disclosure”, though it took me some time to remember it was from there and not “Sneakers”, “The Net”, “Real Genius” or “The Saint”. (I was able to check all the scripts online.)
“That’s the legacy of the modern age. We have information but no truth. Flashes of electrons in a grain of sand.”