A new scarcity

The World Wide Web (WWW) has grown exponentially since 1989, when its inventor, Tim Berners-Lee, created a system of hypertext documents linked together and accessible through the Internet. These documents were called web pages [1]. Between 1993 and 1995, the number of web servers (the computers that host websites) jumped from 130 to 22,000 [2]. Gulli and Signorini estimated that the Web had more than 11.5 billion pages in 2005 [3]. According to the Internet Archive’s website (www.archive.org/about/faqs.php), its historical record of the Web contains approximately one petabyte (1,024 terabytes) of data and is growing at a rate of 20 terabytes per month [4].

Thanks to the enormous growth the Web has experienced since its origin, some aspects of daily life, such as personal communication, business and research, have changed dramatically. This revolution is transforming the world by leading it toward the Information Society, and it is still moving toward a Knowledge Society and a Knowledge Economy, where knowledge is considered the main asset in the dynamics of the economy. For this reason, businesses and researchers are more likely to succeed if they manage projects based on knowledge and information.

Historians are taking advantage of current technology. Museums and archives are digitizing their material both to preserve cultural heritage content and to extend user access to it. Some of these projects are Project Gutenberg, the Million Book Project, the Internet Archive, the Bibliotheca Alexandrina, Amazon, Google Books and the Open Content Alliance [4]. Not so long ago, historians worried about the small number of people they could reach, the pages of scholarship they could publish, the primary sources they could introduce to their students, and the documents that had survived from the past. Digital technology has removed many of these limits [5]. Historians are now living through a transition from scarcity to abundance.

Still, the astonishingly rapid accumulation of digital data (obvious to anyone who uses the Google search engine and gets 300,000 hits) should make us consider that future historians may face information overload [5]. Information overload, also called infoxication (information + intoxication), is not an issue that concerns only archivists, librarians and journalists. The Internet can intoxicate every single user with its huge amount of knowledge. More information is not always better; in fact, it tends to generate confusion. I myself have many resources available when I need to search for information: Google, Wikipedia, the UWO Library Catalogue (alpha.lib.uwo.ca/), databases (SpringerLink, IEEE, etc.) and so on. Sometimes, you do not know where to start.

Information overload is not a new concept; it has been a concern since the Middle Ages [6]. Immanuel Kant warned that “pure information without selection criteria is blind”. Francis Bacon and Karl Popper added that “Nature will be mute while we do not learn to make it talk with both relevant and purposeful questions” [7]. Incidentally, this last quotation reminds me of a certain similarity between Nature and the way we have to interact with a search engine in order to extract useful information from the Web.

The struggle to incorporate the possibilities of new technology into the ancient practice of history has led, most importantly, to questioning the basic goals and methods of the historian’s craft. Historians should continue taking steps, individually and within their professional organizations, to embrace the culture of abundance made possible by digital media [5]. However, such an abundance of information can make it difficult for users to find relevant and interesting content [8]. We computer scientists are familiar with this problem. As far as I am concerned, the meaning of scarcity is currently changing from “a lack of quantity” to “a lack of quality”, that is, “a lack of useful content”. Historians need tools that help them deal with such an abundance of information. They need to find a lifebuoy in the sea of knowledge, the needle in the haystack.

The main obstacle to providing better results to users is the Web itself. Its content is understandable by humans but not by machines. In other words, the Web does not incorporate mechanisms that allow information to be processed automatically. To overcome this problem, the most widely supported solution worldwide is to represent Web content formally (in a form machines can process) and to use techniques based on Artificial Intelligence to take advantage of this sound representation. This plan to revolutionize the Web is called the Semantic Web (semanticweb.org/wiki/Main_Page).

The Semantic Web is an attempt to enrich web pages so that machines can cooperate to perform inferences in the way that people do. The underlying idea is to give information a well-defined meaning, specifically in order to enable interaction among machines [4]. When Tim Berners-Lee created the Web, his original idea was not the Web as we know it today. His idea was the Semantic Web. He explains the concept in his own words [1]:

“The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web, a web of data that can be processed directly or indirectly by machines.”

That is, information is organized so that machines can interpret its meaning, as in a database. For example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<book>
<title> The Neverending Story </title>
<author> Michael Ende </author>
<year> 1979 </year>
</book>

A standard for describing books and other resources is the Dublin Core Metadata Standard (dublincore.org/). Information structured this way makes it possible to answer questions such as “Who wrote The Neverending Story?”. Notice that simple questions like this one, or like “¿Qué es La Historia Interminable?” (Spanish for “What is The Neverending Story?”), can already be answered by Google. In the future, semantic tools will answer every kind of question, no matter how complex.
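To make the “Who wrote The Neverending Story?” idea concrete, here is a minimal Python sketch that answers the question directly from the book record shown above (with the declaration's quotes straightened). Because the title and author sit in their own elements, the program reads them instead of guessing from free text. The function name who_wrote and the single-record setup are illustrative assumptions; this is plain XML lookup with the standard library's ElementTree, not Semantic Web inference and not a Dublin Core processor.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The book record from the example above (declaration quotes straightened).
# Stored as bytes so the parser can honour the ISO-8859-1 declaration.
RECORD = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<book>
<title> The Neverending Story </title>
<author> Michael Ende </author>
<year> 1979 </year>
</book>"""


def who_wrote(record: bytes, title: str) -> Optional[str]:
    """Answer 'Who wrote <title>?' from a single <book> record (hypothetical helper).

    Returns the author's name if the record's title matches (case-insensitively),
    otherwise None. Element text is stripped because the example pads values
    with spaces.
    """
    book = ET.fromstring(record)  # the root element is <book>
    found_title = (book.findtext("title") or "").strip()
    if found_title.casefold() != title.strip().casefold():
        return None
    author = book.findtext("author")
    return author.strip() if author is not None else None


print(who_wrote(RECORD, "The Neverending Story"))  # Michael Ende
print(who_wrote(RECORD, "Another Title"))  # None
```

The lookup works only because the data was marked up first: the same question asked of an unstructured web page would require the program to guess which words are the author's name.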

Computer scientists are working hard to develop this technology and the tools that let researchers organize knowledge and extract it from structured information. There are many interesting applications. One is a tool-suite for the XML markup of medieval documents, which supports the working historian in transcribing original medieval charters into a machine-readable form (XML) [9]. The Catalogus Professorum Lipsiensis applies an adaptive, semantics-based knowledge engineering approach to develop a prosopographical knowledge base on the Web, which enables historians to collect, structure and publish prosopographical knowledge. The resulting knowledge base contains information about more than 14,000 entities and is tightly interlinked with the emerging Web of Data [10]. The Timeline tool (www.simile-widgets.org/timeline/) is essentially an API for visualizing historical events; all you need to do is mark up your data in XML [11]. Exhibit (www.simile-widgets.org/exhibit/) is a lightweight structured-data publishing framework that lets you create web pages with support for sorting, filtering, and rich visualizations. The only web technology you need is HTML and, optionally, some CSS and JavaScript code [11].

Historians should embrace these new tools to isolate the relevant data from the abundance, to find the needle in the haystack.

The needle in the haystack

References
[1] T. Berners-Lee, Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web, Harper, San Francisco, USA, 1999
[2] D. J. Cohen and R. Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web, University of Pennsylvania Press, Philadelphia, USA, 2006
[3] A. Gulli and A. Signorini, The Indexable Web is More than 11.5 Billion Pages, ACM, Chiba, Japan, 2005
[4] I. H. Witten, M. Gori and T. Numerico, Web Dragons: Inside the Myths of Search Engine Technology, Morgan Kaufmann, San Francisco, USA, 2007
[5] R. Rosenzweig, Scarcity or Abundance? Preserving the Past in a Digital Era, American Historical Review vol. 108 n. 3, pp. 735-762, 2003
[6] La Infoxicación en el Siglo XVI, 2007 [http://www.documentalistaenredado.net/495/la-infoxicacion-en-el-siglo-xvi/]
[7] X. Rubert de Ventós, La Red del Pescador, 2008 [http://www.elpais.com/articulo/opinion/red/pescador/elpepiopi/20080706elpepiopi_5/Tes]
[8] H. Cramer et al., The Effects of Transparency on Trust in and Acceptance of a Content-Based Art Recommender, User Modeling and User-Adapted Interaction vol. 18 n. 5, pp. 455-496, Springer, 2008
[9] B. Burkard, G. Vogeler and S. Gruner, Informatics for Historians: Tools for Medieval Document XML Markup, and their Impact on the History-Sciences, Journal of Universal Computer Science vol. 14 n. 2, pp. 193-210, 2007
[10] T. Riechert et al., Knowledge Engineering for Historians on the Example of the Catalogus Professorum Lipsiensis, Proceedings of the 9th International Semantic Web Conference (ISWC’10) vol. 2, Springer, 2010
[11] S. Fischer, History Museums and the Semantic Web, 2007 [http://publichistorian.wordpress.com/2007/01/16/history-museums-and-the-semantic-web/]

Posted in Uncategorized | 1 Comment

Neither field nor fad nor fashion

“Is Digital History a field, a fad or a fashion?” This was one of the topics debated in class.

As I read in Interchange: The Promise of Digital History, digital history is defined as “anything (research method, journal article, monograph, blog, classroom exercise) that uses digital technologies in creating, enhancing, or distributing historical research and scholarship”. Here, “technologies” refers to “technologies of the computer, the Internet network, and software systems”, as William G. Thomas pointed out.

Why is digital history not a field? Throughout history, technology has entered our lives. From the wheel to computers, technologies have made our lives easier. In particular, inventions like the printing press or the Internet have revolutionized our History and the way (and speed) in which it is told. However, such inventions do not tell a new History; they tell the same History. Technology has changed our world and our history, but it has not created new ones. In this sense, the Internet and digital technologies must be seen as transversal tools in the service of existing disciplines.

The European Printing Press, 15th century

Why is digital history not a fad or a fashion? Daniel J. Cohen and Roy Rosenzweig assert: “The past was analog. The future is digital. Tomorrow’s historians will glory in a largely digital historical record, which will transform the way they research, present, and even preserve the past.” (Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web). I would say that the future has already arrived. Alex de la Iglesia (former president of the Spanish Film Academy) said in his speech on the 25th anniversary of the Goya Awards (our “Spanish Oscars”) that “Internet no es el futuro, como algunos creen. Internet es el presente.” (“Internet is not the future, as some people believe. Internet is the present.”). I cannot imagine a world without the electric light bulb or the telephone. Obviously, when something works, it lasts.

So what is digital history? It is simply History when it “makes use of sources in digital form” (in the words of my teacher, William J. Turkel).


“What is real?”

That was the question a classmate asked in class two weeks ago. Blown to Bits uses expressions like “real life” and “cyberspace” to distinguish between our everyday life and our life on the Internet. But is there any difference between these two worlds?

Cyberspace is not a dream. The Internet is made up of millions of interconnected computers, and, of course, they are real. The well-known ones and zeros are electrical signals transmitted over cables or as waves. This seems obvious because all these things are physical objects. But what happens “inside the Internet”, beyond the material? Let’s look at some examples:

  • Messenger (and similar services) allows users to stay connected. They can chat and make video calls. Friends are sometimes far away from each other, and this is one way of keeping in touch. Is this fake communication? I don’t think so.
  • Lots of lonely souls have met in chats or on specialized websites and are now happy couples. The first time they met face to face, they already knew each other in a certain sense.
  • Facebook lets users share photos and comment on them. Friends plan events that come true in “real life”. There are even people who claim that “if it is not on Facebook, it never happened”.
  • Second Life is a project where users can create a customized avatar that lives in a “virtual world”. But this world is not as virtual as it looks. People can interact as in the real world: they can socialize and trade. Companies such as Sony, Coca-Cola and Microsoft (among many others) have set up businesses there and place ads everywhere. US dollars can be exchanged for virtual dollars.
  • Many virtual shops sell books, clothes, video games… and the purchased articles arrive at our homes! Check your bank account and face the harsh reality.

These examples prove that cyberspace is not a fantasy. It is as real as “real life”. Why, then, do we draw this artificial line between the two realities? The key is survival. The virtual world is dispensable: you do not need to chat, buy online or post a photo (although you will fall behind, because life is, in fact, digital). Daily life is different, because there we have to survive: you have to study, you have to work, you have to earn money, you have to eat, you need human contact…

These facts make everyday life the “real life”, the life that has always existed. But nowadays, what is real? Everything.


My first post

This is my first post!
