The stuff in the web is not information - it is data

Thousands of blogs copy snippets from different sources, sometimes enrich them with comments (more often they don't), repost, redistribute, recycle. Twitter plugs up the net with autistic-looking short messages, and a seeming gazillion applications allow users to automatically cross-contaminate social networks with annoying status messages. It is natural that many are looking for ways to survive the 'information tsunami' of the ever-growing web.
While filtering for keyphrases is the usual way out, David Gelernter sees hope in exchanging the axis along which the web-babble should be ordered: let's use the time axis (see "Time to start taking the internet seriously"). Reminiscent of Twitter's lifestreams, information would visually flow from future over present to past, letting the reader focus on everything in the time window she chooses. Aside from the big question mark (why would such a reshuffling make lifestreams easier to bear?) there is a major misperception underlying all this visionary shebang: the stuff in the web is not information - it is data.

If we stare at all the data we go blind. If we focus on information the web is a much nicer place. Information arises from the analysis of data, from connecting and processing them. Of course, a tweet can contain information ("I am going out now"), but the web's information content is more powerful than that. Tweets on the weather could be accumulated to support weather forecasts. Chatter on holiday plans might help airlines organize their resources. Of course, this analysis is already going on - mainly to extract information for targeted marketing. But wouldn't it be nice if the future interface to the web were not a collection of access paths to multicoloured 'social' networks or chatterboxes but a configurable data-analysis tool that helps pull out the real information? Travel tips would not originate in some back office of an agency with a clear commercial interest - they would be the result of your individual correlation of web-babble with weather information and flight prices, for example. Updated life. News could be ranked according to click rates, or coverage, or resonance in a monitored corner of the net that you define. You name it.
Not only would the data flood of the net be easier to digest, information would again be decoupled from the commercial interest of information providers. The web could continue as the anarchic place it once was - or it could at least pretend to.
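
To make the idea of a reader-configured ranking a little more tangible, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `NewsItem` fields, the `rank` function, the weights and the sample numbers are hypothetical and do not describe any existing tool or data source.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    # All fields are hypothetical measurements, invented for this sketch.
    title: str
    clicks: int      # how often the item was opened
    coverage: int    # how many outlets reported on it
    resonance: int   # mentions in the corner of the net the reader monitors


def rank(items, weights):
    """Sort items by a weighted score; `weights` is the reader's own configuration."""
    def score(item):
        return sum(w * getattr(item, name) for name, w in weights.items())
    return sorted(items, key=score, reverse=True)


# Made-up sample data, only there to show that the ordering changes.
items = [
    NewsItem("Storm warning", clicks=120, coverage=3, resonance=40),
    NewsItem("Celebrity gossip", clicks=900, coverage=10, resonance=1),
    NewsItem("Local election", clicks=200, coverage=6, resonance=25),
]

by_clicks = rank(items, {"clicks": 1.0})
by_resonance = rank(items, {"resonance": 1.0})
by_mix = rank(items, {"coverage": 1.0, "resonance": 1.0})
```

The same three items come out in a different order depending on which weights the reader picks, and that is the whole point: the configuration lives with the reader, not with the provider.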


Sandor Ragaly said…
Hi Carsten, I looked around your site, and again, you managed to immediately attract my interest. :-)

And I agree with your stressing the "meaning of meaningfulness", with usable, or delighting etc. "information" - and "data" just being raw material (but too often also presented as information).

It's decisive in perception, I think, and that's very obvious within the context of an overload of data - or "empty signifiers" - in our net-in-revolution. It's like something ripe or complete enough in itself to be used by a recipient (piece of information) vs. some mere "material", staying without meaning or use in most cases (data side).

However, we must not forget that this difference is not only decisive for communication, but at the same time is widely individual. To make a pretty clear example, a piece of art, like a painting: While it may induce, arouse the imagination and its creative working of a recipient, leading to associations, the construction of meaning, producing emotions etc., the same painting may leave another person completely untouched, and even with the feeling of no meaning - just data, to use this term.

Many factors in communication are common and collective, both because of our common (human) species and because of mainstreaming factors in families, societies, nations, regions or cross-cutting "groupings".

But, little in studying maths, most in talking about the arts, the individual factor - or: the factorS - become important, and even may decide it all. And that's the problem, Carsten, with your idea of a "configurable data-analysis tool that helps pull out the real information":

(1) The "real" information you mention varies across individuals, cultures, even situative variables, and

(2) to design and also configure that data-analysis tool, you would need immense flexibility and extremely complex access to the net resources, both because of the complexity of (1).

(3) The problematic political question is: Who would be this actor or group to design, produce, and restrict such a huge approach which could be overwhelmingly powerful - would the net be the anarchic place then? Think also of the rising critical questions on personalisation (Google Search, Facebook, Amazon), where also connections are producing some information out of data, and which may pose a threat to openness, guiding automatically and perhaps too much...

I think higher complexity and/or aggregation, the tighter structure (or: order), would in contrast lead away from anarchy. I think what we have now is more or less an unruled sphere, more or less... and precisely the unruled sphere implies chaos and, often, data, not information.

This internet is already the tool we can use, by searching, social networks, RSS etc., to connect and process data into information of a higher quality, and in a very individual way, much more bottom-up than top-down by a super-tool - albeit Google is one, but an extremely wide one.

I wrote this with a big approach of yours in mind, but on a micro (or meso) level, in more specific ways, it's of course desirable to have and develop tools connecting the resources of the net and producing information of higher levels - social networks are one attempt, I think. But answering, I emphasize limits here.

In my opinion, the more important connections, combinations, framings, contextual decisions and processing of data, half-information and higher information will still be done personally, by our so complex, sufficiently flexible, attentive, curious, and individual human minds.
Carsten Hucho said…
Sandor, your comment is very important - it definitely brings the discussion to a higher level, as you point to the crucial elements.
In my opinion, however, there would not be one 'superalgorithm' or central tool to extract information from data. It would be a very individually configured toolset - tailored to your personal needs or desires.
Sascha Lobo made an important point in his (German) text on Siri, Apple's speech-recognition software (,1518,800533,00.html). As Siri learns to interpret my sentences (not only learning to transcribe the words that I say, but to put meaning into what I say) it has to understand the context. Once Siri has learned to dial the correct number when I say 'I am hungry' (Siri learned that I get food by calling a pizza service in the evening but a sushi restaurant at noon) it has obviously absorbed lots of context that it extracted from me. In future implementations Siri might get this contextual input from my emails, from my Twitter, Facebook, etc.
Scarily, this smart app communicates with home base - a reason why Sascha Lobo sees this as a new way of mining personal data (besides Facebook's info-hunger and Google's data-logging).
But why not work with very local contextual applications that keep their electro-mouths shut to the outside and just optimize their preferences to suit my needs? It would be highly decentralized and still able to extract information from the fine-grained data desert.
Sincerely, Carsten.
