Animals that Belong to the Emperor
Failing universal classification schemes from the
Renaissance to the Semantic Web
Florian Cramer
The weapon with which European search engines intend to
beat Google is semantic information processing: pattern
recognition in media in the case of Quaero, Semantic Web
technology in Theseus, its German cousin. Originally, the Quaero
project was a French-German collaboration, funded by both
governments, until the German Theseus project split off from
Quaero to pursue its own vision of future Web search. This vision
is twofold, involving a number of classical holy grails of
computer science: (a) provide search on the basis of semantic
tags, (b) have software recognize the contents of web pages in
order to automatically apply those tags. While the second point
is utopian enough and something that Artificial Intelligence
research hasn't achieved in decades, I would like to show why
even point (a), in other words: the Semantic Web, is doomed to
fail. (I leave it to the audience to draw its own conclusions as to why
the Theseus project receives high public funding nevertheless.)
The "Semantic Web" itself is an easily misunderstood
term and project. It was founded and is pursued by Tim
Berners-Lee, who created the World Wide Web in the early
1990s. In 2004, prior to Quaero and Theseus, the German federal
government subsidized research on the Semantic Web with 13.7
million euros, reasoning that as a "semantic technology", it
would allow people to phrase search terms as normal questions,
thus giving computer illiterates easier access to the Internet.
But, alas, this is not at all what the Semantic Web is about, or
what it would even implement; it was, in other words, a 13.7
million euro misunderstanding.[1] Indeed, natural
language parsing is another holy grail of search engine
development, from "Ask Jeeves" (which renamed itself Ask.com and
de-emphasized its initial concept) to ..., recently brought up by
Geert Lovink on the Nettime mailing list. Natural language
parsing falls into category (b) mentioned above: it is the holy
grail that artificial intelligence research has, for a couple of
decades, consistently claimed to have almost, but not quite,
reached, while critical A.I. researchers like Luc Steels say that
it will never be reachable with current algorithmic computer
architectures, regardless of their speed. And in reality, natural
language search systems are no more than inefficient interface
wrappers around classical Boolean search expressions with logical
AND, OR and NOT operators.
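The "wrapper" claim can be illustrated with a minimal sketch. It does not describe how Ask.com or any real engine works; the stop-word list, the sample documents and the function names are all assumptions invented for this example. The point is only that the question form is discarded and a plain AND query remains.

```python
# Hypothetical stop words: question words and filler that carry no search value.
STOP_WORDS = {"what", "which", "who", "where", "is", "are", "the", "a", "an",
              "of", "in", "to", "do", "does", "how", "for"}


def question_to_terms(question):
    """Reduce a 'natural language' question to a plain list of AND terms."""
    words = [w.strip("?.,!\"'").lower() for w in question.split()]
    return [w for w in words if w and w not in STOP_WORDS]


def boolean_and_search(terms, documents):
    """Return the ids of documents that contain every term (logical AND)."""
    hits = []
    for doc_id, text in documents.items():
        doc_words = set(text.lower().split())
        if all(term in doc_words for term in terms):
            hits.append(doc_id)
    return sorted(hits)


# Made-up sample documents.
DOCUMENTS = {
    "doc1": "paris is the capital of france",
    "doc2": "berlin is the capital of germany",
    "doc3": "france borders germany",
}

terms = question_to_terms("What is the capital of France?")
result = boolean_and_search(terms, DOCUMENTS)
print(terms, result)
```

Nothing in this sketch parses the grammar of the question; the "natural language" layer merely throws words away before the Boolean match.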
The Semantic Web does not fall into this trap because it
does not involve any automatic interpretation of meaning.
Berners-Lee is quite outspoken on this, saying that his concept
"does not imply some magical artificial intelligence which allows
machines to comprehend human mumblings" - much in contradiction
to Quaero and Theseus. Instead, his Semantic Web is a universal
markup or "tagging" scheme. In Berners-Lee's words: "Instead of
asking machines to understand people's language, it involves
asking people to make the extra effort". This effort, semantic
tagging, is a well-known and popular device on sites like Flickr,
digg.com and delicious. It simply means that users attach
keywords to texts, images and other information, so that this
information can be searched by its keywords or particular keyword
combinations. On Flickr, for example, the search keyword
combination "birthday", "children" and "clown" results in a list
of pictures of clowns appearing at children's birthday parties,
not because of any Quaero-style computer recognition of the image
contents, but because of the keywords assigned to images by
Flickr users.
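Such a keyword-combination search can be sketched in a few lines. This is an illustration only, not Flickr's actual implementation; the photo ids, the tags and the function name are made up.

```python
# Made-up photo ids and tags, standing in for what users might assign.
PHOTO_TAGS = {
    "photo1": {"birthday", "children", "clown"},
    "photo2": {"birthday", "cake"},
    "photo3": {"clown", "circus"},
}


def search_by_tags(query_tags, photo_tags):
    """Return ids of photos whose tag set contains every query tag."""
    wanted = set(query_tags)
    return sorted(pid for pid, tags in photo_tags.items() if wanted <= tags)


matches = search_by_tags(["birthday", "children", "clown"], PHOTO_TAGS)
print(matches)
```

The search never looks at the image itself; it is a set comparison over strings that people typed in.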
From a Semantic Web perspective, however, this system is flawed
because there is no standard nomenclature for tagging. For
example, a user might have tagged an image with "kids" instead of
"children", so it won't turn up in the results. The tags also lack
systematization: children could be classified as a
subset of humans, humans as a subset of mammals, birthdays as a
subset of celebrations, etc. Then a more general search
for pictures of human celebrations would also find pictures
marked up with "birthday" and "children". This is why unsystematic,
ad-hoc, user-generated and site-specific tagging systems like those on
Flickr are referred to as "folksonomies".[2]
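What a systematized vocabulary would add can be sketched as follows. This is a toy model under stated assumptions: the synonym table, the broader-term table and all names are hypothetical, and real Semantic Web vocabularies are expressed in dedicated formats rather than Python dictionaries.

```python
# Hypothetical controlled vocabulary: synonyms map onto one preferred term ...
SYNONYMS = {"kids": "children"}

# ... and each preferred term points to its broader category (child -> parent).
BROADER = {
    "children": "humans",
    "humans": "mammals",
    "birthday": "celebrations",
}


def normalize(tag):
    """Replace a synonym with its preferred term."""
    return SYNONYMS.get(tag, tag)


def expand(tag):
    """Return the normalized tag plus all broader categories above it."""
    tag = normalize(tag)
    result = {tag}
    while tag in BROADER:
        tag = BROADER[tag]
        result.add(tag)
    return result


def search(query_tags, photo_tags):
    """Match photos whose expanded tags cover every normalized query tag."""
    wanted = {normalize(t) for t in query_tags}
    hits = []
    for pid, tags in photo_tags.items():
        expanded = set()
        for t in tags:
            expanded |= expand(t)
        if wanted <= expanded:
            hits.append(pid)
    return sorted(hits)


# Made-up sample data.
PHOTO_TAGS = {
    "photo1": {"birthday", "children"},
    "photo2": {"birthday", "kids"},
    "photo3": {"mountains"},
}

general = search(["humans", "celebrations"], PHOTO_TAGS)
specific = search(["children", "birthday"], PHOTO_TAGS)
print(general, specific)
```

Both searches now return photo1 and photo2, but only because someone has fixed in advance that "kids" means "children" and that every birthday is a celebration.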
The Semantic Web promises to overcome those folksonomies
with one unified and standardized keyword tagging system that
can be applied to anything. In other words, it is a universal
classificatory description system, a grand unified hierarchical
metatag design.
For somewhat mysterious or at least idiosyncratic reasons,
Berners-Lee calls this classificatory system an "ontology", making
his project particularly confusing for people with backgrounds in
philosophy and the humanities - because it is not an ontology, but a
cosmology.
Just as cosmologies are by no means new, neither are universal
classification and tagging systems of the world. In his essay and
short story "The Analytical Language of John Wilkins", Jorge Luis
Borges writes of the 17th-century English scholar that
"He divided the universe in forty categories or classes, these
being further subdivided into differences, which was then
subdivided into species. He assigned to each class a
monosyllable of two letters; to each difference, a consonant;
to each species, a vowel. For example: de, which means an
element; deb, the first of the elements, fire; deba, a part of
the element fire, a flame." [...]
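The construction Borges describes, in which each added letter narrows the category, can be sketched as a lookup along a path. Only the three words quoted above are filled in; the rest of Wilkins's tables are not reproduced, and the table layout and function name are assumptions made for this illustration.

```python
# Only the three examples from the Borges quotation are filled in.
CLASSES = {"de": "element"}                # two-letter monosyllable -> class
DIFFERENCES = {("de", "b"): "fire"}        # class + consonant -> difference
SPECIES = {("de", "b", "a"): "flame"}      # class + consonant + vowel -> species


def decode(word):
    """Walk a word from class to difference to species, most general first."""
    path = []
    cls = word[:2]
    if cls not in CLASSES:
        return path
    path.append(CLASSES[cls])
    if len(word) >= 3:
        key = (cls, word[2])
        if key not in DIFFERENCES:
            return path
        path.append(DIFFERENCES[key])
    if len(word) >= 4:
        key = (cls, word[2], word[3])
        if key not in SPECIES:
            return path
        path.append(SPECIES[key])
    return path


print(decode("de"), decode("deb"), decode("deba"))
```

Every word is thus its own address in the tree, which only works if the tree itself is accepted as given.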
Similar classification schemes were developed
throughout the Middle Ages and the Renaissance, by Ramon Llull and
Giordano Bruno, and especially in the 17th-century encyclopedism of
Johann Heinrich Alsted and Jan Amos Comenius, in whose tradition
Wilkins works and thinks. Since encyclopedias before Diderot and
d'Alembert structured their knowledge systematically rather than
alphabetically, they developed increasingly complex tree-like
classification systems of all the things in the world that they
described. [picture] The so-called "ontology" of the Semantic Web does
not merely do something similar; it does exactly the same thing
again.
The Renaissance classificatory cosmologies could only work
on the basis of a stable assumption of what the world is and how
it is structured: for example, by the four directions, the four
seasons, the temperaments, the seven virtues and vices, etc. In
other words, they were still embedded in the paradigm of
medieval scholastic science, which in turn was derived from
Aristotle's system of categories that broke up all things in the
world into genera and species. The Semantic Web boils down to
nothing but technocratic neo-scholasticism, and to a
questionable if not dangerous belief that the world can be
described from a single, objective, universally valid
viewpoint and classification - a blatant example of an engineer's
blindness to ambiguity and cultural issues.
Although there was no Semantic Web yet in the 1940s, Borges
pins down the issue in his essay. One is tempted to simply replace
the name John Wilkins with Tim Berners-Lee when he examines the
former's categories and finds that stones, for example, are
absurdly classified as common, modic, precious,
transparent or insoluble, and that beauty is assigned to a
"living brood fish". Borges concludes that
"These ambiguities, redundancies and deficiencies remind us of
those which doctor Franz Kuhn attributes to a certain Chinese
encyclopaedia entitled 'Celestial Empire of benevolent
Knowledge'. In its remote pages it is written that the animals
are divided into: (a) belonging to the emperor, (b) embalmed,
(c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray
dogs, (h) included in the present classification, (i) frenzied,
(j) innumerable, (k) drawn with a very fine camelhair brush,
(l) et cetera, (m) having just broken the water pitcher, (n)
that from a long way off look like flies."
Although this is Borges' own fiction, it nevertheless
reveals the arbitrariness of categories and classifications. It
also had a profound impact as a philosophical critique. The
preface of Michel Foucault's "The Order of Things" opens with a
discussion of precisely the above list of animals. Foucault
confesses that for him, it "shattered all the familiar landmarks
of his thought", opening his eyes to how the order of knowledge
is culturally constructed and may be thought differently. To
understand Foucault's discourse theory, in other words, one just
needs to read Borges.
But the order of things, and unified classification
schemes, do not just break down in fiction. Sticking to the
example of animals, it is obvious how Aristotelian philosophy
continues to exist today in the notion of genus and species
[and even more questionably in the categorization of humans into
biological races]: we are still working within the paradigm
introduced by Aristotle's zoology. But the problem is that it
does not always work even in zoology itself. The prime example is
the platypus, an Australian animal that is a breastfeeding
mammal, yet lays eggs, lives in the water and has a bill like
a bird. If the platypus breaks the genus and species classification
of zoology, where does it fit into the Semantic Web?
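How a rigid scheme fails on such a case can be shown with a deliberately simplified sketch. The class definitions below are illustrative assumptions, not real zoological criteria, and all names are invented for the example.

```python
# Illustrative, deliberately simplified class definitions (not real zoology):
# each class lists the feature values an animal must have to belong to it.
CLASS_RULES = {
    "mammal": {"suckles_young": True, "lays_eggs": False},
    "bird": {"lays_eggs": True, "has_bill": True, "suckles_young": False},
}


def classify(animal):
    """Return every class whose rules the animal satisfies completely."""
    return sorted(
        name
        for name, rules in CLASS_RULES.items()
        if all(animal.get(feature) == value for feature, value in rules.items())
    )


dog = {"suckles_young": True, "lays_eggs": False, "has_bill": False}
platypus = {"suckles_young": True, "lays_eggs": True, "has_bill": True}

print(classify(dog), classify(platypus))
```

Under these assumed rules the dog lands in exactly one class and the platypus in none. The scheme can only be rescued by changing the definitions after the fact, which is an empirical correction and not something the classification could have supplied on its own.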
This situation is by no means new. In his book "Kant and
the Platypus", Umberto Eco makes the animal a symbol for
scholastic versus empirical science. A bit confusingly, he
differentiates "cultural cases" - that means categorically
defined phenomena - from "empirical cases", i.e. phenomena that
are observed instead of predefined. "To be recognized as such,"
Eco states, cultural cases "need reference to a framework of
cultural norms" (Eco 1997, p. 139). For Eco as a semiotician,
this means that Being, or existence, is the frontier that
systematic science cannot conquer. And this is what ontology
means.
The innovation of modern science since Galileo, Newton and
Descartes is that it operates without reference to those
norms. When Diderot and d'Alembert abandoned the old
classificatory order of knowledge in encyclopedias and replaced
it with a non-classificatory, non-systematic alphabetical order,
they followed precisely the empirical paradigm, taking phenomena
as they occurred and not as they fit. In other words: it was the
innovation of modern critical science that it gave up "Semantic
Web" schemes in its ordering of knowledge.
But to go back from academic discourse to folksonomies on
the Internet, an even better example than the platypus was
brought up in a Web forum of the computer news site heise.de.
Discussing the Semantic Web and its classification scheme, an
anonymous poster brought up the hypothetical example "A Muslim is
a potential terrorist" in order to show that a unified "ontology"
cannot be built.[3] And this example only scratches the
surface of the pending cultural problems. In other words, not the
empirical, but the cultural cases bear the actual dynamite. The
whole Semantic Web, and the search engines built upon it, rest on
the illusion that there can be one objective assessment of the
world. This is not only cosmology falsely named ontology, but
also metaphysics disguised as physics.
On top of that, it relies on the illusion of a culture
where semantic tags wouldn't simply be used for spamming and
search engine manipulation, which is why Google already ignores
metatags. And while Berners-Lee is realist enough to see
that tagging cannot be done by bots like those planned by the
Theseus project, his Semantic Web consequently implies a
complexity nightmare of meta information overtaking information,
i.e. one where each piece of information would create at least twice
as much work for its semantic markup as for its original
creation.
It would be good if the creators of so-called next-generation
search engines would read up on Borges, who concludes:
"I have registered the arbitrarities of Wilkins, [and] of the
unknown (or false) Chinese encyclopaedia writer [...]; it is
clear that there is no classification of the Universe not being
arbitrary and full of conjectures. The reason for this is very
simple: we do not know what thing the universe is."
Footnotes:
[1] I somehow get the impression that
our Federal Ministry of Research is under the mistaken assumption that
13 million euros will produce software that enables every
computer illiterate, entirely without the
"extra effort", [...] market their "Pisa failure and as a
highly innovative rescue of Germany as a location for knowledge
and business (if you believe that ... ),
[2] Wikipedia: "Folksonomy (also known
as collaborative tagging, social classification, social
indexing, social tagging, and other names) is the practice and
method of collaboratively creating and managing tags to annotate
and categorize content. In contrast to traditional subject
indexing, metadata is not only generated by experts but also by
creators and consumers of the content. Usually freely chosen
keywords are used instead of a controlled vocabulary."
[3] URL