
Web 2.0 in Science

Timo Hannay, Nature Publishing Group

CTWatch Quarterly
August 2007

What is Web 2.0?

Perhaps the only thing on which everyone can agree about Web 2.0 is that it has become a potent buzzword. It provokes enthusiasm and cynicism in roughly equal measures, but as a label for an idea whose time has come, no one can seriously doubt its influence.

So what does it mean? Web 2.0 began as a conference,[1] first hosted in October 2004 by O’Reilly Media and CMP Media. Following the boom-bust cycle that ended in the dot-com crash of 2001, the organisers wanted to refocus attention on individual web success stories and the growing influence of the web as a whole. True, during the late 1990s hype and expectations had run ahead of reality, but that did not mean that the reality was not epochal and world-changing. By the following year, Tim O’Reilly, founder of the eponymous firm and principal articulator of the Web 2.0 vision, had laid down in a seminal essay[2] a set of observations about approaches that work particularly well in the online world. These included:

  • “The web as a platform”
  • The Long Tail (e.g., Amazon)
  • Trust systems and emergent data (e.g., eBay)
  • AJAX (e.g., Google Maps)
  • Tagging (e.g., del.icio.us)
  • Peer-to-peer technologies (e.g., Skype)
  • Open APIs and ‘mashups’ (e.g., Flickr)
  • “Data as the new ‘Intel Inside’” (e.g., cartographical data from MapQuest)
  • Software as a service (e.g., Salesforce.com)
  • Architectures of participation (e.g., Wikipedia)

The sheer range and variety of these concepts led some to criticize the idea of Web 2.0 as too ill-defined to be useful. Others have pointed out (correctly) that some of these principles are not new but date back to the beginning of the web itself, even if they have only now reached the mainstream. But it is precisely in raising awareness of these concepts that the Web 2.0 meme has delivered most value. Now, those of us without the genius of Jeff Bezos or Larry Page can begin to glimpse what the web truly has to offer and, notwithstanding the overblown hype of the late 1990s, how it really is changing the world before our eyes.

Initially the first item in the list above – the web as platform – seemed to have primacy among the loose collection of ideas that constituted Web 2.0 (see, for example, Figure 1 in [2]). The most important thing seemed to be that talent and enthusiasm in software development were migrating from traditional operating system platforms to the web. New applications were agnostic with respect to Unix versus Macintosh versus Windows and were instead designed to operate using web protocols (specifically, HTTP and HTML) regardless of the precise underlying software running on the server or client machines.

However, this view taken on its own overlooks one very important reason why that migration has happened: the web is more powerful than the platforms that preceded it because it is an open network and lends itself particularly well to applications that enable collaboration and communication. With his usual eye for pithy phrasing, Tim O’Reilly described this aspect using the terms “architecture of participation”[3] and “harnessing collective intelligence.”[2] He pointed out that the most successful web applications use the network on which they are built to produce their own network effects, sometimes creating apparently unstoppable momentum. This is how a whole new economy can arise in the form of eBay, why tiny craigslist and Wikipedia can take on the might of mainstream media and reference publishing, and why Google can produce the best search results by surreptitiously recruiting every creator of a web link to its cause. In time, this participative aspect came to the fore, and these days “Web 2.0” is often seen as synonymous with websites that do not merely serve users but also involve them, thus enabling them to achieve that most desirable of business goals: a service that gets better for everyone the more people use it.
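
That last point about Google deserves unpacking: every hyperlink acts as an implicit vote, and the PageRank algorithm aggregates those votes. As a rough illustration of the principle – a toy sketch of the published power-iteration idea, not Google’s actual implementation – consider the following Python fragment:

```python
# Toy PageRank: every hyperlink is treated as a "vote" for its target,
# so page authors collectively rank the web simply by linking.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            for t in targets:
                # Each page shares its current rank among its outgoing links.
                new_rank[t] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# A four-page toy web: the links themselves are the only input.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

No user ever consciously contributes to this ranking, which is precisely the point: the participation is harvested from activity people undertake for their own reasons.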

This brief survey will use a relatively broad definition of Web 2.0. So, while it will deal mainly with participative services and network effects, it will also cover certain other aspects of the original Web 2.0 vision that have particular relevance in science, including mashups and tagging.

Social software

If a cornerstone of the Web 2.0 meme is the web as a global, collaborative environment, how is this being put to use in perhaps the most global and collaborative of all human endeavors: scientific research? An irony often observed by those of us working in science communication is the fact that, although the web was originally invented as a means for sharing scientific information,[4] scientists have been relatively slow to fully embrace its potential. Blogging, for example, has become undeniably mainstream, with the number of bloggers somewhere in the high tens of millions[5] (among a billion or so web users[6]). Yet among a few million scientists worldwide, only perhaps one or two thousand are blogging, at least about science,[7][8] and most of these are relatively young. By contrast, academic economists,[9] for example, even very distinguished ones, seem to have embraced this new medium more enthusiastically.

Scientific blogging is still a niche activity, and what data there are suggest that it is not yet growing fast. For example, Alexa reports[10] that ScienceBlogs,[11] where many of the most prominent scientist-bloggers post their thoughts, has shown little traffic growth over the last twelve months, and the scientific blog tracking service Postgenomic.com[12] (created by an employee of Nature Publishing Group) shows the volume of posts from the blogs in its index holding steady at about 2,500 posts a week.[13] Similarly, scientists appear reluctant to comment publicly on research papers.[14][15] The blogging bug, it seems, has yet to penetrate the scientific citadel. This is a shame because blogs are a particularly effective means for one-to-many and many-to-many communication, and science, no less than other spheres, stands to gain from their judicious adoption.

Yet the participative web is about much more than blogging and commenting. Figure 1 below summarizes the manifold types of social software that exist online, all of them relevant in some way to scientific research.

Figure 1. Categories of social software.

Wikis: These have existed since the mid-1990s,[16] but it took the astonishing rise of Wikipedia during the middle part of this decade for the potential of wikis to become widely appreciated. We can now see numerous examples of scientific wikis, from collaborative cataloguing and annotation projects like WikiSpecies [17] and Proteins Wiki [18] to open laboratory notebooks like OpenWetWare [19] and UsefulChem.[20] These all represent sensible uses of wikis, which are best employed to enable groups of geographically dispersed people to collaborate in the creation of a communal document with an identifiable objective aim (as in Wikipedia, WikiSpecies and Proteins Wiki), or to allow individuals or small, real-world teams to share freeform information with others around the world (as in OpenWetWare and UsefulChem). In contrast, experiments at the Los Angeles Times [21] and Penguin Books [22] have demonstrated that wikis are not well suited to the creation of opinionated or fictional content – because the end goal cannot possibly be shared by all contributors at the outset. A particularly interesting recent development has been the launch of Freebase,[23] the latest brainchild of parallel computing pioneer and polymath Danny Hillis. This takes a wiki-like approach to open contributions, but provides an underlying data model more akin to relational databases and the Semantic Web,[24] allowing specific relationships between entities to be expressed and queried. Whilst Freebase is not aimed mainly at scientists, scientific topics are among those covered. It will be interesting to see how this approach fares against the less technically sophisticated but arguably less restrictive approach represented by traditional wikis.
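
To make the contrast with freeform wiki text concrete, here is a toy sketch in Python – with invented entities and relation names, not Freebase’s actual schema or API – of the kind of typed, queryable relationships such a data model supports:

```python
# A wiki stores prose; a Freebase-style store keeps typed triples
# (subject, relation, object) that can be queried directly.
# The entities and relation names below are invented for illustration.
triples = [
    ("Tim Berners-Lee", "invented",      "World Wide Web"),
    ("World Wide Web",  "uses_protocol", "HTTP"),
    ("World Wide Web",  "uses_format",   "HTML"),
]

def query(subject=None, relation=None, obj=None):
    """Return all triples matching the given (possibly partial) pattern."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

# "What does the World Wide Web use?" -- a question freeform prose
# cannot answer mechanically, but a triple store can.
print(query(subject="World Wide Web"))
```

The price of this queryability is that contributors must fit their knowledge into a structure; the open question raised above is whether enough of them will.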

Voting: Slashdot [25] and more recently digg [26] have become staple information sources for computer nerds and web geeks everywhere. Their traffic, which ranks them among the top media organisations on the planet,[27] belies their meager staff numbers (which, compared to a daily newspaper’s, are as near to zero as makes no difference). Like all good Web 2.0 sites, they exert their influence by getting readers to contribute – in this case by providing stories, links and comments – and then letting other users decide what is most interesting by casting votes. In the case of digg, the users even decide which stories get elevated to the front page. Such sites, like search engines, are sometimes criticized for being parasitical on the mainstream media stories to which they link (after all, they generate no content, only link to it). But this is to misunderstand the value they add, which is to help people decide where to direct their scarce attention in an age of often oppressive information overload. They are no more parasitical on journalism than journalism is on the newsmakers themselves (after all, journalists don’t make the news, only report it – well, most of the time). Yet these services do have a very different feel to those in which the content is selected by an editor, and the optimum approach in some cases may be to marry the ‘wisdom of crowds’ (to highlight interesting stories) with professional editorial expertise (to provide a final selection and put these items in context). These systems are also vulnerable to the ‘tyranny of the majority’ and to cynical gaming, so even while they save on traditional editorial staff, the operators of these sites do face other challenges in maintaining a useful service.
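
The mechanics are simple enough to caricature. A minimal sketch follows – the promotion threshold and story titles are invented, and real sites such as digg use more sophisticated, proprietary promotion algorithms:

```python
# Minimal digg-style promotion: readers submit stories and vote, and
# stories crossing a threshold (invented here) reach the front page.
PROMOTION_THRESHOLD = 3

submissions = {"Web 2.0 in Science": 0, "Ontology Is Overrated": 0}

def vote(story):
    submissions[story] += 1

for _ in range(4):
    vote("Web 2.0 in Science")
vote("Ontology Is Overrated")

front_page = [story for story, votes in sorted(submissions.items(),
                                               key=lambda kv: -kv[1])
              if votes >= PROMOTION_THRESHOLD]
print(front_page)  # -> ['Web 2.0 in Science']
```

The gaming problem mentioned above is visible even in this caricature: nothing stops a determined clique from casting all four votes themselves, which is why real systems weight votes and watch for coordinated behaviour.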

Of course, similar problems of information overload apply in science, so it is natural to ask whether it is possible to use these approaches to help scientists to help themselves. Sure enough, sites like ChemRank,[28] SciRate [29] and BioWizard [30] have appeared. Nature Publishing Group has a few of its own experiments in this area, including: DissectMedicine,[31] a collaborative news system for medics; Nature China,[32] which includes summaries of the best Chinese research as submitted and voted on by readers; and Scintilla,[33] a scientific information aggregation and personalization tool that employs user ratings in its recommendation algorithms. It is too early to say which of these scientific applications will prevail, but given the demonstrable success of this approach outside science, it seems almost inevitable that some of them will.

File sharing: This is one of those rare areas in which scientists – or at least some of them – have blazed a trail well ahead of the mainstream. Physicists (and a few others) have been sharing preprints (manuscripts that have not yet been peer reviewed) through the arXiv.org server [34] since 1991 (and even before that, they shared their findings with each other by email or post). Now, the web is replete with ways of sharing various types of content, from documents [35] to videos [36] to slides.[37] And scientific services, too, have begun to diversify, from Nature Precedings,[38] a preprint server and document-sharing service for those outside physics, to the Journal of Visualized Experiments,[39] a way for scientists to share videos of experimental protocols.

Social networks: Perhaps the most obviously social of all social software are the services that enable the creation of personal networks of friends and other like-minded people. The use of services like MySpace [40] and Facebook [41] has become almost ubiquitous among young people in many countries, and the average age of users is now starting to rise as these services break out of their core teenage and college-student markets.[42] Meanwhile, LinkedIn [43] has become a favourite networking tool among business people. Once again, medics and scientists are following the mainstream with sites like Sermo [44] for clinicians and Nature Network [45] for scientists. These environments are not only for finding and contacting new people with shared interests (though they are good for that too, and therefore have potential in everything from job-seeking to dating); they also enable the creation of discussion groups and allow users to efficiently follow the activities (e.g., in terms of posts and comments) of others whose views they find interesting. Correctly implemented and used, these services therefore have great potential to make scientific discourse more immediate, efficient and open. A major unanswered question, however, is the interoperability and openness of the services themselves. No one wants to have to register separately on multiple different sites or lock up their details in a system over which they have no control. Federated authentication technologies like OpenID [46] and other approaches to interoperability hold promise, but it remains to be seen how enthusiastically they will be embraced by the operators of social networking services, and how receptive those operators will be to the idea of partial cooperation rather than outright competition.
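
To give a flavour of how federated authentication avoids per-site registration, here is a sketch of the browser redirect at the heart of the OpenID 1.x protocol. The URLs below are invented, and a real relying party would use an established OpenID library rather than hand-rolled requests:

```python
from urllib.parse import urlencode

# Sketch of the core redirect in OpenID 1.x: the site (the "relying
# party") sends the user's browser to their chosen identity provider
# for authentication, so no site-specific password is ever created.
# All URLs here are invented for illustration.
def openid_redirect_url(provider_endpoint, claimed_identity, return_to):
    params = {
        "openid.mode": "checkid_setup",      # interactive authentication
        "openid.identity": claimed_identity, # the user's identity URL
        "openid.return_to": return_to,       # where the provider sends the user back
        "openid.trust_root": "http://science-network.example/",
    }
    return provider_endpoint + "?" + urlencode(params)

print(openid_redirect_url("http://openid-provider.example/auth",
                          "http://alice.example/",
                          "http://science-network.example/login/return"))
```

The appeal for scientists is obvious: one identity URL could serve a blog, a bookmarking service and a social network alike, provided the operators choose to cooperate.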

Classified advertising: This may seem like a strange category to include here, but newspaper small ads are arguably the original grassroots participative publishing service. It is perhaps no coincidence, then, that they have been among the first areas of traditional publishing to fall victim to lean and radical Web 2.0 startups, most famously craigslist.[47] Particularly among careers services, there is also keen competition to turn simple ad listings into social networks, as epitomized by Jobster,[48] and the distinction between social networks and career services is only likely to blur further. Though some very large employers, notably Britain’s National Health Service,[49] have established their own online jobs boards, effectively disintermediating their former advertising outlets, this revolution has yet to hit the medical and scientific advertising realm with full force. One early sign of the changes to come was the switch by NatureJobs in late 2006 from an arrangement in which online ads were sold as part of a portfolio of products to a ‘freemium’ model [50] in which simple online listings are provided free and other services such as rich, targeted or print advertisements are sold as add-ons. This reflects the different economics of operating online, where the marginal cost of serving an extra advertiser is low and the benefits of providing a single comprehensive jobs database are high.

Markets: eBay [51] is in some ways the definitive Web 2.0 company: it is a pure market in which the company itself does not own any of the goods being traded. Similarly, other services, such as Elance,[52] specialize in matching skilled workers to employers with projects that they wish to outsource. In the scientific space, the online trading of physical goods (such as used laboratory equipment) is not yet commonplace, though it might become so in the future. In contrast, the matching of highly trained people to problems does have some traction in the form of ‘knowledge markets’ such as InnoCentive.[53] These are still at an early stage, and they are mostly used by commercial organisations such as pharmaceutical companies, but it is not hard to imagine academic research groups doing the same one day if (as it should) this approach enables them to achieve their goals more quickly and at lower cost.

Virtual worlds: By far the most prominent virtual world is Second Life [54] (though others, such as There.com, exist too). What sets it apart from online role-playing games like World of Warcraft (which are orders of magnitude more popular) is that it has no predefined storylines or goals, and that it gives its users freedom to create and use almost whatever objects they choose, possibly making money in the process. In this sense, it represents a genuine alternative to the real world. Pedants might argue that Second Life is not really a Web 2.0 service because it is not technically part of the web (i.e., it does not use HTML and HTTP, though it can interact with the web in various ways). But at a more abstract level, the participative, user-generated environments that have grown inside Second Life are as good examples as exist anywhere of the ‘architecture of participation’ principle.

The greatest scientific potential seems to lie in education and in conferences. Second Life provides an environment in which people from different locations can come together quickly and easily into a shared space that to some extent mimics the real world in important aspects of human communication, such as physical proximity, gesture and the ability to seamlessly mix one-to-many with one-to-one communication (e.g., chatting to the person beside you during a lecture). As a result, educators have poured in – around 160 universities now have a presence in Second Life [55] – as have some scientists. There is even a continent called the SciLands where a number of groups with scientific interests have congregated (though from a distance, and to my eyes, its administration appears dauntingly bureaucratic). Nature Publishing Group also has its own small archipelago – inevitably called Second Nature and consisting (at the time of this writing) of three separate islands – on which a diverse group of scientists is building and maintaining educational features in evolutionary biology, genetics, cell biology, chemistry and earth sciences, among others. There are also meeting, presentation and poster display areas. The degree of activity and enthusiasm has, quite frankly, astonished us.

True, Second Life and other virtual worlds are still at an early stage in their evolution, are clunky to use, and require large doses of patience and practice to get the most out of them. But the same was true of the web during the early 1990s, and look what happened there. One major factor working against Second Life’s rapid expansion is the fact that it is a proprietary ‘walled garden’ controlled by a single commercial organisation, Linden Lab. In this sense, it is more like early AOL than the early web. But conversations with staff at Linden Lab suggest that they understand this potential pitfall, and they have already released, as open source, the code to their client application.[56] If the server-side code is opened up too, then the eventual results could be as momentous and world-changing as the web itself.

Tagging and folksonomies

One class of social software that deserves special comment is social bookmarking tools.[57] One of the earliest was del.icio.us,[58] and its introduction of tagging – freeform keywords entered by users to facilitate later retrieval – soon gave rise to the concept of the ‘folksonomy,’ [59] a kind of implicit collective taxonomy or ontology generated by the aggregate, uncoordinated activity of many people tagging the same resources. Some commentators, notably Clay Shirky [60] and David Weinberger,[61] have argued (convincingly in my opinion) that this approach, although anarchic, has certain advantages over traditional centralized taxonomic approaches (such as the Dewey Decimal System). In particular, traditional approaches have difficulty dealing with entities that belong in multiple categories (is Nature a magazine or a journal?), or about which our view changes over time (Watson & Crick’s 1953 paper reporting the structure of DNA is in the field of biotechnology, but that word did not exist at the time). Since such challenges are often particularly acute in science, which necessarily operates at the frontiers of human knowledge, it is tempting to wonder whether collaborative tagging can help in that domain too.[62]

Nature Publishing Group has its own social bookmarking and reference management tool, Connotea,[63] heavily inspired by del.icio.us but with certain features added with academic researchers in mind. As well as providing a way for researchers to store, organise and share their reading lists, we were also interested to find out how useful the resultant collective tag metadata could be in helping to automatically link together related online scientific resources. To that end, we developed code for the EPrints institutional repository software [64] that enabled it to query Connotea for tag information and automatically derive related reading suggestions. The experiment proved a success,[65] and we have built tagging into many of the applications we have developed since then (e.g., Nature Network, Nature Precedings and Scintilla) with a view to implementing similar features when the data sets grow large enough.
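
The underlying computation in that experiment is easy to sketch. Assuming the tag data have already been fetched (the bookmark records below are invented, and the retrieval step from Connotea’s web API is omitted), related reading can be ranked by simple tag overlap:

```python
# Rank "related reading" by shared tags -- the idea behind the
# Connotea/EPrints experiment. The bookmark data below are invented.
tags_by_resource = {
    "doi:10.1000/paper-a": {"folksonomy", "tagging", "web2.0"},
    "doi:10.1000/paper-b": {"tagging", "semantic-web"},
    "doi:10.1000/paper-c": {"microarray", "mouse"},
}

def related(resource, k=5):
    """Return up to k resources sharing the most tags with `resource`."""
    mine = tags_by_resource[resource]
    scored = [(len(mine & tags), other)
              for other, tags in tags_by_resource.items()
              if other != resource]
    return [other for score, other in sorted(scored, reverse=True)
            if score > 0][:k]

print(related("doi:10.1000/paper-a"))  # -> ['doi:10.1000/paper-b']
```

A production system would normalise for tag popularity – for example, with Jaccard similarity or TF-IDF weighting – but the principle is the same: the aggregate, uncoordinated activity of taggers becomes a navigational resource for everyone.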

Open data and mashups

Another area with huge potential – but one that I have space to deal with only cursorily here – is that of open scientific data sets and forms of interoperability that allow these to be transferred not only between scientists but also between applications in order to create new visualizations and other useful transformations. There are numerous challenges, but there is also progress to report on each front. Too often scientists are unwilling to share data, whether for competitive or other reasons, though increasingly funders (and some publishers) are requiring them to do so. Even when the data are available, they usually lack the consistent formats and unambiguous metadata that would enable them to be efficiently imported into a new application and correctly interpreted by a researcher who was not present when they were collected. Yet data standards such as CML [66] and SBML [67] are emerging, as are metadata standards such as MIAME.[68] As software applications also adopt these standards, we enter a virtuous circle in which there are increasing returns (at least at the global level) to openly sharing data using common standards.
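
To give a flavour of what such standards look like, here is a minimal fragment in the style of SBML, together with the few lines of Python needed to read it programmatically. The model is invented and heavily simplified relative to the real SBML schema:

```python
import xml.etree.ElementTree as ET

# A minimal fragment in the style of SBML (Systems Biology Markup
# Language). Invented and simplified; real models carry far more detail.
SBML_DOC = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_model">
    <listOfSpecies>
      <species id="glucose" initialAmount="100"/>
      <species id="ATP" initialAmount="50"/>
    </listOfSpecies>
  </model>
</sbml>"""

ns = {"sbml": "http://www.sbml.org/sbml/level2"}
root = ET.fromstring(SBML_DOC)
for species in root.findall(".//sbml:species", ns):
    # Because the format is standardized, any SBML-aware tool can
    # interpret these values without asking the original experimenter.
    print(species.get("id"), species.get("initialAmount"))
```

The point is not the XML itself but the shared vocabulary: once two applications agree on it, data can flow between them with no human intermediary.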

For a glimpse of the benefits this can bring, witness the work of my colleague Declan Butler, a journalist at Nature. While covering the subject of avian flu, he noticed that information about global outbreaks was fragmented, incompatible and often confidential. So he took it upon himself to gather what data he could, merge them and provide them in the form of a KML file, the data format used by Google Earth.[69] Shortly afterwards he overlaid poultry density data.[70] This not only meant the information was now available in one place, it also made it much more readily comprehensible to experts and non-experts alike. Imagine the benefits if this approach, largely the work of one man, were replicated across all of science.
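
Producing such a file requires no special tooling. Here is a minimal sketch of writing placemarks in KML, the XML dialect Google Earth reads; the outbreak names and coordinates are invented:

```python
# Write a minimal KML file of outbreak placemarks for Google Earth.
# The outbreak data below are invented for illustration.
outbreaks = [("Outbreak A", 105.85, 21.03),   # (name, longitude, latitude)
             ("Outbreak B", 100.50, 13.75)]

placemarks = "\n".join(
    f"""  <Placemark>
    <name>{name}</name>
    <Point><coordinates>{lon},{lat},0</coordinates></Point>
  </Placemark>""" for name, lon, lat in outbreaks)

kml = f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
 <Document>
{placemarks}
 </Document>
</kml>"""

with open("outbreaks.kml", "w") as f:
    f.write(kml)   # open this file in Google Earth to see the points
```

Anyone with a spreadsheet of coordinates can do the same, which is exactly why one determined journalist could build a global disease map single-handed.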

Whither the scientific web?

Over the last 10 years or so, much of the discussion about the impact of the web on science – particularly among publishers – has been about the way in which it will change scientific journals. Sure enough, these have migrated online, with huge attendant improvements in accessibility and utility. For all but a very small number of widely read titles, the day of the print journal seems to be almost over. Yet to see this development as the major impact of the web on science would be extremely narrow-minded – equivalent to viewing the web primarily as an efficient PDF distribution network. Though it will take longer to have its full effect, the web’s major impact will be on the way that science itself is practiced.

The barriers to full-scale adoption are not only (or even mainly) technical, but rather social and psychological. This makes the timings almost impossible to predict, but the long-term trends are already unmistakable: greater specialization in research, more immediate and open information-sharing, a reduction in the size of the ‘minimum publishable unit,’ productivity measures that look beyond journal publication records, a blurring of the boundaries between journals and databases, reinventions of the roles of publishers and editors, greater use of audio and video, more virtual meetings. And, most important of all, arising from this gradual but inevitable embrace of technology, an increase in the rate at which new discoveries are made and exploited for our benefit and that of the world we inhabit.

References

[1] Now called the Web 2.0 Summit – http://www.web2summit.com/
[2] O'Reilly, T. "What Is Web 2.0?" http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html, 2005.
[3] O'Reilly, T. "Architecture of Participation," http://www.oreillynet.com/pub/wlg/3017/, 2003.
[4] Berners-Lee, T. "Weaving the Web," Texere (London), 2000.
[5] Sifry, D. "The State of the Live Web," http://www.sifry.com/alerts/archives/000493.html, 2007.
[6] Nielsen, J. "A Billion Internet Users," http://www.useit.com/alertbox/internet_growth.html, 2005.
[7] Nature Publishing Group estimate.
[8] The science blog aggregator Postgenomic.com lists 735 blogs in its index but certainly misses some.
[9] Anonymous. "The invisible hand on the keyboard," The Economist, 3rd August 2006.
[10] Alexa – http://www.alexa.com/
[11] ScienceBlogs – http://www.scienceblogs.com/
[12] Postgenomic.com – http://www.postgenomic.com/
[13] Postgenomic.com zeitgeist – http://www.postgenomic.com/stats.php
[14] Anonymous. "Overview: Nature's peer review trial," http://www.nature.com/nature/peerreview/debate/nature05535.html, 2006.
[15] Liu, S.V. "Why are people reluctant to join in open review?" Nature, Vol. 447, p. 1052, 2007.
[16] Leuf, B. and Cunningham, W. "The Wiki Way," Addison-Wesley, 2001.
[17] WikiSpecies – http://species.wikimedia.org/wiki/Main_Page
[18] Proteins Wiki – http://proteins.wikia.com/wiki/Main_Page
[19] OpenWetWare – http://openwetware.org/
[20] UsefulChem – http://usefulchem.wikispaces.com/
[21] Anonymous. "Los Angeles Times Suspends 'Wikitorials'," Associated Press, http://www.msnbc.msn.com/id/8300420/, 2005.
[22] A Million Penguins – http://www.amillionpenguins.com/wiki/index.php/Main_Page
[23] Freebase – http://www.freebase.com/
[24] Berners-Lee, T., Hendler, J., Lassila, O. "The Semantic Web," Scientific American, 2001.
[25] Slashdot – http://slashdot.org/
[26] digg – http://digg.com/
[27] See the following comparison on Alexa: http://tinyurl.com/2faroc
[28] ChemRank – http://www.chemrank.com/
[29] SciRate – http://scirate.com/
[30] BioWizard – http://www.biowizard.com/
[31] DissectMedicine – http://www.dissectmedicine.com/
[32] Nature China – http://www.natureasia.com/ch/
[33] Scintilla – http://scintilla.nature.com/
[34] arXiv.org – http://arxiv.org/
[35] Scribd – http://www.scribd.com/
[36] YouTube – http://www.youtube.com/
[37] SlideShare – http://www.slideshare.net/
[38] Nature Precedings – http://precedings.nature.com/
[39] Journal of Visualized Experiments – http://www.jove.com/
[40] MySpace – http://www.myspace.com/
[41] Facebook – http://www.facebook.com/
[42] Gonzalez, N. "Facebook Users Up 89% Over Last Year; Demographic Shift," TechCrunch, 2007.
[43] LinkedIn – http://www.linkedin.com/
[44] Sermo – http://www.sermo.com/
[45] Nature Network – http://network.nature.com/
[46] OpenID – http://openid.net/
[47] craigslist – http://sfbay.craigslist.org/
[48] Jobster – http://www.jobster.com/
[49] NHS Jobs – http://www.jobs.nhs.uk/
[50] Wilson, F. "The Freemium Business Model," http://avc.blogs.com/a_vc/2006/03/the_freemium_bu.html, 2006.
[51] eBay – http://www.ebay.com/
[52] Elance – http://www.elance.com/
[53] InnoCentive – http://www.innocentive.com/
[54] Second Life – http://www.secondlife.com/
[55] Nature Publishing Group estimate.
[56] Linden, P. "Embracing the Inevitable," http://blog.secondlife.com/2007/01/08/embracing-the-inevitable/, 2007.
[57] Hammond, T., Hannay, T., Lund, B., Scott, J. "Social Bookmarking Tools (I): A General Review," D-Lib, Vol. 11, No. 4, 2005.
[58] del.icio.us – http://del.icio.us/
[59] Vander Wal, T. "Folksonomy," http://www.vanderwal.net/folksonomy.html, 2007.
[60] Shirky, C. "Ontology Is Overrated," http://www.shirky.com/writings/ontology_overrated.html, 2005.
[61] Weinberger, D. "Everything is Miscellaneous," Times Books, 2007.
[62] Hannay, T. "Introduction," http://tagsonomy.com/index.php/introduction-timo-hannay/, 2005.
[63] Connotea – http://www.connotea.org/
[64] EPrints – http://www.eprints.org/
[65] Lund, B. "Tagging and Bookmarking In Institutional Repositories," http://blogs.nature.com/wp/nascent/2006/03/tagging_and_bookmarking_in_ins.html, 2006.
[66] Chemical Markup Language – www.ch.ic.ac.uk/cml/
[67] Systems Biology Markup Language – http://sbml.org/
[68] Minimum Information About a Microarray Experiment – http://www.mged.org/Workgroups/MIAME/miame.html
[69] KML – http://code.google.com/apis/kml/documentation/
[70] Butler, D. "The spread of avian flu with time; new maps exploiting Google Earth's time series function," http://declanbutler.info/blog/?p=58, 2007.

URL to article: http://www.ctwatch.org/quarterly/articles/2007/08/web-20-in-science/
