The stuff in the web is not information - it is data

Thousands of blogs copy snippets from different sources, sometimes enrich them with comments (more often they don't), repost, redistribute, recycle. Twitter plugs up the net with autistic-looking short messages, and a seeming gazillion applications allow users to automatically cross-contaminate social networks with annoying status messages. It is natural that many are looking for ways to survive the 'information tsunami' of the ever-growing web.
While filtering for keyphrases is the usual way out, David Gelernter sees hope in exchanging the axis along which the web-babble should be ordered: let's use the time axis (see "Time to start taking the internet seriously"). Reminiscent of Twitter's lifestreams, information would visually flow from future over present to past, letting the reader focus on everything in the time window she chooses. Aside from the big question mark (why would such a reshuffling make lifestreams easier to bear?), there is a major misperception underlying all this visionary shebang: the stuff in the web is not information - it is data.

If we stare at all the data we go blind. If we focus on information the web is a much nicer place. Information arises from the analysis of data, from connecting and processing them. Of course, a tweet can contain information ("I am going out now"), but the web's information content is more powerful than that. Tweets on the weather could be accumulated to support weather forecasts. Chatter on holiday plans might help airlines organize their resources. Of course, this analysis is already going on - mainly to extract information for targeted marketing. But wouldn't it be nice if the future interface to the web were not a collection of access paths to multicoloured 'social' networks or chatterboxes but a configurable data-analysis tool that helps pull out the real information? Travel tips would not originate in some back office of an agency with a clear commercial interest - they would be the result of your individual correlation of web-babble with weather information and flight prices, for example. Updated live. News could be ranked according to click rates, or coverage, or resonance in a monitored corner of the net that you define. You name it.
Not only would the data flood of the net be easier to digest, information would again be decoupled from the commercial interest of information providers. The web could continue as the anarchic place it once was - or it could at least pretend to.


Sandor Ragaly said…
Hi Carsten, I looked around your site, and again, you managed to immediately attract my interest. :-)

And I agree with your stressing the "meaning of meaningfulness", with usable, or delighting etc. "information" - and "data" just being raw material (but too often also presented as information).

It's decisive in perception, I think, and that's very obvious within the context of an overload of data - or "empty signifiers" - in our net-in-revolution. It's like something ripe or complete enough in itself to be used by a recipient (piece of information) vs. some mere "material", staying without meaning or use in most cases (data side).

However, we must not forget that this difference is not only decisive for communication but at the same time highly individual. To give a clear example, take a piece of art, like a painting: while it may stir one recipient's imagination and creativity, leading to associations, the construction of meaning, emotions etc., the same painting may leave another person completely untouched, even with the feeling that it has no meaning - just data, to use this term.

Many factors in communication are common or collective, both because of our common (human) species and because of mainstreaming factors in families, societies, nations, regions or cross-cutting "groupings".

But the individual factor - or rather: the factorS - becomes important, little in studying maths, most in talking about the arts, and may even decide it all. And that's the problem, Carsten, with your idea of a "configurable data-analysis tool that helps pull out the real information":

(1) The "real" information you mention varies across individuals, cultures, even situative variables, and

(2) to design and also configure that data-analysis tool, you would need immense flexibility and extremely complex access to the net resources, both because of the complexity of (1).

(3) The problematic political question is: Who would be the actor or group to design, produce, and restrict such a huge approach, which could be overwhelmingly powerful - would the net still be the anarchic place then? Think also of the rising critical questions on personalisation (Google Search, Facebook, Amazon), where connections also produce some information out of data, and which may pose a threat to openness, guiding automatically and perhaps too much...

I think higher complexity and/or aggregation, a tighter structure (or: order), would in contrast lead away from anarchy. I think what we have now is more or less an unruled sphere, more or less... and precisely this unruled sphere implies chaos and, often, data, not information.

This internet is already the tool we can use, through search, social networks, RSS etc., to connect and process data into information of a higher quality, in a very individual way, much more bottom-up than top-down by a super-tool - although Google is one, albeit an extremely wide one.

I wrote this with your big approach in mind, but on a micro (or meso) level, in more specific ways, it is of course desirable to have and develop tools that connect the resources of the net and produce information of higher levels - social networks are one attempt, I think. But in answering, I emphasize the limits here.

In my opinion, the more important connections, combinations, framings, contextual decisions and processing of data, half-information and higher information will still be done personally, by our so complex, sufficiently flexible, attentive, curious, and individual human minds.
Carsten Hucho said…
Sandor, your comment is very important - it definitely brings the discussion to a higher level, as you point to the crucial elements.
In my opinion, however, there would not be one 'superalgorithm' or central tool to extract information from data. It would be a very individually configured toolset - tailored to your personal needs or desires.
Sascha Lobo made an important point in his (German) text on Siri, Apple's speech-recognition software (,1518,800533,00.html). As Siri learns to interpret my sentences (not only learning to transcribe the words that I say, but to put meaning into what I say) it has to understand the context. Once Siri has learned to dial the correct number when I say 'I am hungry' (Siri learned that I get food by calling a pizza service in the evening but a sushi restaurant at noon) it has obviously implemented lots of context that it extracted from me. In future implementations Siri might get this contextual input from my emails, my Twitter and Facebook accounts, etc.
Scarily, this smart app communicates with home base - a reason why Sascha Lobo sees this as a new way of mining personal data (besides Facebook's info-hunger and Google's data-logging).
But why not work with very local contextual applications that keep their electro-mouths shut to the outside and just optimize their preferences to suit my needs? It would be highly decentralized and still able to extract information from the fine-grained data desert.
Sincerely, Carsten.
