Apr 27, 2010

The stuff in the web is not information - it is data

Thousands of blogs copy snippets from different sources, sometimes enrich them with comments (more often they don't), repost, redistribute, recycle. Twitter plugs up the net with autistic-looking short messages, and a seeming gazillion applications allow users to automatically cross-contaminate social networks with annoying status messages. It is natural that many are looking for ways to survive the 'information tsunami' of the ever-growing web.
While filtering for key phrases is the usual way out, David Gelernter sees hope in exchanging the axis along which the web-babble should be ordered: let's use the time axis (see "Time to start taking the internet seriously" on edge.org). Reminiscent of Twitter's lifestreams, information would visually flow from the future through the present to the past, letting the reader focus on everything in the time window she chooses. Aside from the big question mark (why would such a reshuffling make lifestreams easier to bear?), there is a major misperception underlying this whole visionary shebang: the stuff in the web is not information - it is data.


If we stare at all the data, we go blind. If we focus on information, the web is a much nicer place. Information arises from the analysis of data, from connecting and processing them. Of course, a tweet can contain information ("I am going out now"), but the web's information content is more powerful than that. Tweets on the weather could be accumulated to support weather forecasts. Chatter on holiday plans might help airlines organize their resources. Of course, this analysis is already going on - mainly to extract information for targeted marketing. But wouldn't it be nice if the future interface to the web were not a collection of access paths to multicoloured 'social' networks or chatterboxes but a configurable data-analysis tool that helps pull out the real information? Travel tips would not originate in some back office of an agency with a clear commercial interest - they would be the result of your individual correlation of web-babble with weather information and flight prices, for example. Updated life. News could be ranked according to click rates, or coverage, or resonance in a monitored corner of the net that you define. You name it.
Not only would the data flood of the net be easier to digest, information would again be decoupled from the commercial interest of information providers. The web could continue as the anarchic place it once was - or it could at least pretend to.

2 comments:

S.R. said...

Hi Carsten, I looked around your site, and again, you managed to immediately attract my interest. :-)

And I agree with your stressing the "meaning of meaningfulness", with usable, or delightful etc., "information" - and "data" being just raw material (though too often also presented as information).

It's decisive in perception, I think, and that's very obvious within the context of an overload of data - or "empty signifiers" - in our net-in-revolution. It's like something ripe or complete enough in itself to be used by a recipient (piece of information) vs. some mere "material", staying without meaning or use in most cases (data side).

However, we must not forget that this difference is not only decisive for communication, but is at the same time highly individual. To give a pretty clear example, take a piece of art, like a painting: while it may arouse the imagination of one recipient and set it to work creatively, leading to associations, the construction of meaning, emotions etc., the same painting may leave another person completely untouched, and even with the feeling that it has no meaning - just data, to use this term.

Many factors in communication are common and collective, both because of our common (human) species and because of mainstreaming factors in families, societies, nations, regions or cross-cutting "groupings".

But (little when studying maths, most when talking about the arts) the individual factor - or: the factorS - becomes important, and may even decide it all. And that's the problem, Carsten, with your idea of a "configurable data-analysis tool that helps pull out the real information":

(1) The "real" information you mention varies across individuals, cultures, even situational variables, and

(2) to design and also configure that data-analysis tool, you would need immense flexibility and extremely complex access to the net's resources, both because of the complexity of (1).

(3) The problematic political question is: Who would be the actor or group to design, produce, and restrict such a huge approach, which could be overwhelmingly powerful - would the net still be the anarchic place then? Think also of the growing critical questions about personalisation (Google Search, Facebook, Amazon), where connections also produce some information out of data, and which may pose a threat to openness by guiding us automatically and perhaps too much...

I think higher complexity and/or aggregation, a tighter structure (or: order), would on the contrary lead away from anarchy. I think what we have now is more or less an unruled sphere, more or less... and precisely this unruled sphere implies chaos and, often, data rather than information.

This internet is already the tool we can use, through searching, social networks, RSS etc., to connect and process data into information of a higher quality, in a very individual way, very much bottom-up and not top-down through a super-tool - albeit Google is one, but an extremely wide one.

I wrote this with your big approach in mind, but on a micro (or meso) level, in more specific ways, it's of course desirable to have and develop tools that connect the resources of the net and produce information of higher levels - social networks are one attempt, I think. But in answering, I emphasize the limits here.

In my opinion, the more important connections, combinations, framings, contextual decisions and processing of data, half-information and higher information will still be done personally, by our so complex, sufficiently flexible, attentive, curious, and individual human minds.

Carsten Hucho said...

Sandor, your comment is very important - it definitely brings the discussion to a higher level, as you point to the crucial elements.
In my opinion, however, there would not be one 'superalgorithm' or central tool to extract information from data. It would be a very individually configured toolset - tailored to your personal needs or desires.
Sascha Lobo (www.saschalobo.com) made an important point in his (German) text on Siri, Apple's speech-recognition software (http://www.spiegel.de/netzwelt/web/0,1518,800533,00.html). As Siri learns to interpret my sentences (not only learning to transcribe the words that I say, but to put meaning in what I say), it has to understand the context. Once Siri has learned to dial the correct number when I say 'I am hungry' (Siri learned that I get food by calling a pizza service in the evening but a sushi restaurant at noon), it has obviously implemented lots of context that it extracted from me. In future implementations Siri might get this contextual input from my emails, from my Twitter and Facebook accounts, etc.
Scarily, this smart app communicates with home base - a reason why Sascha Lobo sees this as a new way of mining personal data (besides Facebook's info-hunger and Google's data-logging).
But why not work with very local contextual applications that keep their electro-mouths shut to the outside and just optimize their preferences to suit my needs? It would be highly decentralized and still able to extract information from the fine-grained data desert.
Sincerely, Carsten.