We all know that the Internet is a treasure trove of information, but the actual size of that treasure trove is pretty startling. While there aren’t any hard and fast numbers to go on, here are some quick answers to a few data questions:
How many web pages are there?
In August 2007, Yahoo! was estimated to have 29.7 billion web pages indexed.
How many web sites are there?
In 2007 Netcraft reported having indexed over 108 million websites.
How much data is on the web as a whole?
If each page averages 30 KB, that would mean there are about 891,000 GB (roughly 891 TB) of HTML and text floating around on the web. This most likely doesn’t include images, videos, Flash files, server-side scripts, JavaScript, etc.
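For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate. It takes the Yahoo! page count quoted above and the assumed 30 KB average page size, and uses decimal units (1 GB = 1,000,000 KB). The function and constant names are just illustrative.

```python
# Figures quoted in the post; the average page size is an assumption, not a measurement.
PAGES_INDEXED = 29.7e9   # Yahoo! index estimate, August 2007
AVG_PAGE_KB = 30         # assumed average page size in kilobytes
KB_PER_GB = 1_000_000    # decimal units: 1 GB = 1,000,000 KB


def estimate_web_text_gb(pages: float, avg_kb: float) -> float:
    """Return the estimated total size of all pages in gigabytes."""
    return pages * avg_kb / KB_PER_GB


total_gb = estimate_web_text_gb(PAGES_INDEXED, AVG_PAGE_KB)
print(f"{total_gb:,.0f} GB")  # prints "891,000 GB"
```

Changing the assumed page size moves the result proportionally, so a 60 KB average would simply double it.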
In 2002, the University of California, Berkeley estimated that all the surface files of the web (everything except server-side scripts) accounted for 170 terabytes. The same study estimated that instant messages add another 274 terabytes per year, and that email generates around 400,000 terabytes annually. In total, the study put new information at nearly 800 MB generated annually for each of the Earth’s 6.3 billion residents. That’s a lot of data.
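As a rough sanity check, the sketch below divides just the three figures quoted above by the 6.3 billion population. It assumes decimal units and treats the 170 TB surface-web figure as if it were an annual amount, which is a simplification. The variable names are illustrative only.

```python
TB = 10**12  # bytes per terabyte (decimal)
MB = 10**6   # bytes per megabyte (decimal)

# Figures as quoted in the post.
surface_web_tb = 170
instant_message_tb = 274
email_tb = 400_000
population = 6.3e9


def per_person_mb(total_tb: float, people: float) -> float:
    """Convert a total in terabytes to megabytes per person."""
    return total_tb * TB / people / MB


combined_tb = surface_web_tb + instant_message_tb + email_tb
share_mb = per_person_mb(combined_tb, population)
print(f"{share_mb:.1f} MB per person")  # prints "63.6 MB per person"
```

These three items alone come to only about 64 MB per person, which suggests the 800 MB figure draws on a broader total than the web, instant messages, and email.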
With so much data already in existence and so much new data being created on a daily basis, it’s amazing that we can make heads or tails of anything.
Too much of a good thing is bad.
While data can be useful for analyzing trends, making decisions, and monitoring results, having too much of it can be a burden. The amount of data on the web and the rate at which it is growing are why discussions about data extraction and the semantic web are so important.