web2express.org

December 5, 2006

Why publish raw experiment data?

Filed under: news, semantic publishing — aj @ 12:15 am

[::Content::]

Current research publishing/communication model: After doing a series of experiments, researchers usually write up a paper to present the findings, that is, the interesting results backed by data. Research papers serve as the main communication vehicle within as well as outside the research community. In addition, researchers present their findings at various conferences. Papers and conference talks are typically published in journals, proceedings, and books, in print and/or electronic form.

The current scientific publishing model has worked pretty well for centuries. However, it also has obvious problems. The first main problem is the lack of free access, i.e. access to the published materials is mostly limited to paying users. This problem is being addressed by the Open Access movement.

The second main problem with the current publishing model is that publishing a research paper is not an efficient way to share data. There are several reasons for this:

  1. Because a research paper is usually a complex synthesis of many experimental facts and interpretations of those facts, it is not an easy task for a human, never mind a computer, to sort out the individual facts.
  2. Most experiment data are not included in the publication and thus remain inaccessible to the research community. Even for the experiment data used in a research paper, typically little detail is given.
  3. Long delay in data availability. Usually, the data are already several months to a couple of years old by the time the research paper comes out.

These inefficiencies really hamper data sharing and information discovery. So, new ways of publishing are needed in order to increase the efficiency of sharing research information. I think direct publishing of experiment data on the web presents a good solution. Imagine for a moment that information about every experiment is available at your fingertips, organized by single experiment unit. Would that make your search of prior studies, in terms of experiments, data, and results, much faster and more precise? What if every researcher made their experiment data available in real time, or immediately after the experiments are completed? Would that make your research go faster too? I believe the answer to both questions is certainly yes. This is why publishing experiment data makes absolute sense.

You may ask why this was not done before. There could be many reasons, and they need to be clearly understood. One main reason could be pure economics. Before the Internet age, publishing meant printing and traditional distribution, which is a very costly business. The volume of experiment data is so huge that no publisher would have considered publishing every piece of it a sound business.

Well, what’s different today? Today's environment is very different from that of 15 years ago. For one thing, Internet technologies and the web economy have made the cost of web publishing and distribution almost disappear. Therefore, publishing all experiment data on the web is now a viable idea. The benefits of being able to search efficiently through all experiment data on the planet, and the potential to accelerate scientific discovery, are so great that, I think, this idea is going to become reality in the near future. In fact, data sharing in the life sciences has been a very active research subject in recent years. Some good examples include MGET, GO, FUGO, BIOPAX, etc. These projects are developing ontologies for representing data in their specific scientific domains.

Experiment data can be represented at different levels of granularity. Web2express.org is approaching the data sharing problem from the opposite end of the spectrum compared to bioscience projects like MGET. It is developing a shallow ontology to represent data across all scientific fields, such as the life sciences, computer science, social science, etc. An early version (v0.2) of the SPE ontology for self-publishing of experiments is being reviewed within the W3C HCLS interest group, and a demo publishing tool is available online for testing and download.
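
To make the idea of a shallow, per-experiment record a bit more concrete, here is a minimal sketch in Python. It is only an illustration: the class name ExperimentRecord, its fields, and the helper functions to_json and search are hypothetical, and are not taken from the SPE ontology or from the demo publishing tool. The sample values are placeholders, not real results.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ExperimentRecord:
    """One experiment published as a single unit.

    Hypothetical fields chosen for illustration; this is NOT the SPE vocabulary.
    """
    title: str
    field_of_study: str
    date_performed: str      # ISO 8601 date string, e.g. "2006-12-05"
    method: str
    result_summary: str
    data_files: List[str] = field(default_factory=list)


def to_json(record: ExperimentRecord) -> str:
    """Serialize one record so that it can be published on the web."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def search(records: List[ExperimentRecord], keyword: str) -> List[ExperimentRecord]:
    """Return records whose title or result summary mentions the keyword."""
    needle = keyword.lower()
    return [
        r for r in records
        if needle in r.title.lower() or needle in r.result_summary.lower()
    ]


if __name__ == "__main__":
    # Placeholder record, not real experiment data.
    example = ExperimentRecord(
        title="Placeholder experiment",
        field_of_study="life sciences",
        date_performed="2006-12-05",
        method="hypothetical method description",
        result_summary="placeholder result summary",
    )
    print(to_json(example))
    print(len(search([example], "placeholder")))
```

The record is deliberately shallow: a handful of fields that would apply to an experiment in any discipline, with domain-specific detail left to the attached data files. Because each experiment is a separate unit, a search can return individual experiments rather than whole papers.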

[::Subject::]

Open data, semantic publishing, ontology, SPE, semantic web

[::Category::]

Internet software

[::Author::]

AJ Chen

1 Comment »

  1. AJ,
    I absolutely agree that the publication of raw data is critical to truly bring science into the Web2.0 world. My laboratory has been doing this with our research on the synthesis of anti-malarial compounds. I have called this Open Notebook Science to distinguish it from other Open Science initiatives that are open but don’t insist on the publication of all raw experimental data. We are currently using blogs and wikis to publish, but it would be interesting to look at additional modalities, especially if you have a functional ontology in place for scientific knowledge.
    http://usefulchem.wikispaces.com/
    http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html

    Comment by Jean-Claude Bradley — December 6, 2006 @ 5:06 am
