web2express.org

December 5, 2006

Why publish raw experiment data?

Filed under: news, semantic publishing — aj @ 12:15 am

[::Content::]

The current research publishing and communication model: after a series of experiments, researchers usually write a paper presenting their findings, i.e. the interesting results backed by data. Research papers are the main communication vehicle both within and outside the research community. Researchers also present their findings at conferences. Papers and conference talks are typically published in journals, proceedings, and books, in print and/or electronic form.

The current scientific publishing model has worked quite well for centuries. However, it also has obvious problems. The first main problem is the lack of free access: access to published material is mostly limited to paying users. The Open Access movement is addressing this problem.

The second main problem with the current publishing model is that publishing research papers is not an efficient way to share data. There are several reasons for this:

  1. A research paper is usually a complex synthesis of many experimental facts and interpretations of those facts, so sorting out the individual facts is not easy for humans, let alone computers.
  2. Most experiment data are not included in publications and therefore remain inaccessible to the research community. Even for the data that are used in a paper, little detail is typically given.
  3. Long delays in data availability. By the time a research paper comes out, its data are usually several months to a couple of years old.

These inefficiencies seriously hamper data sharing and information discovery. New ways of publishing are therefore needed to make sharing research information more efficient. I think publishing experiment data directly on the web is a good solution. Imagine for a moment that information about every experiment is at your fingertips, organized by individual experiment. Would that make your search of prior studies (experiments, data, and results) faster and more precise? What if every researcher made their experiment data available in real time, or immediately after the experiments were completed? Would that make your research go faster too? I believe the answers are certainly yes. This is why publishing experiment data makes absolute sense.

You may ask why this was not done before. There could be many reasons, and they deserve careful study. One main reason could be pure economics. Before the Internet age, publishing meant printing and traditional distribution, a very costly business. The volume of experiment data is so huge that no publisher would have considered publishing every piece of it a sound business.

Well, what’s different today? The environment today is very different from that of 15 years ago. For one thing, Internet technologies and the web economy have made the cost of web publishing and distribution almost disappear. Publishing all experiment data on the web is therefore now a viable idea. The benefits of being able to search all the experiment data on the planet efficiently, and the potential to accelerate scientific discovery, are so great that I think this idea will become reality in the near future. In fact, data sharing in the life sciences has been a very active research subject in recent years. Good examples include MGET, GO, FUGO, and BIOPAX. These projects are developing ontologies for representing data in their specific scientific domains.

Experiment data can be represented at different levels of granularity. Where bioscience projects like MGET build detailed ontologies for a single domain, Web2express.org approaches the data sharing problem from the opposite end of the spectrum. It is developing a shallow ontology to represent data across all scientific fields, such as the life sciences, computer science, and the social sciences. An early version (v0.2) of the SPE ontology for self-publishing of experiments is under review within the W3C HCLS interest group, and a demo publishing tool is available online for testing and download.

[::Subject::]

Open data, semantic publishing, ontology, SPE, semantic web

[::Category::]

Internet software

[::Author::]

AJ Chen

December 4, 2006

Web2x search site demo launched

Filed under: news, semantic search engine — aj @ 11:49 pm

[::Content::]

Web2x is a new platform for publishing and searching content on web2, the second generation of the web, which consists of web documents and semantic data. I released a demo of the Web2x publishing software last month, and now the Web2x search engine is online as a demo too. You can see how it searches both web documents and semantic data on the same site.

Go to the Web2x search engine demo and try the search term “semantic web”. Since it is only a demo, its current content consists of web pages and semantic data from web2express.org alone.

The Web2x platform could bring a new, level playing field to everyone in the R&D community, including researchers as well as companies providing R&D tools. Researchers can use the free Web2x publishing software to self-publish research data to the web in HTML and RDF formats at the same time. The current software implements the SPE ontology for self-publishing of experiments. The Web2x search engine will crawl web sites powered by the Web2x publishing software and make both their web documents and their semantic data available for search.
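The idea of publishing the same experiment in HTML and RDF at once can be sketched roughly as follows. This is a minimal illustration, not the Web2x software itself: it assumes an experiment is a flat set of property/value pairs, and the namespace `http://example.org/spe#`, the property names, and the helpers `to_html`, `to_ntriples` and `publish` are hypothetical placeholders, not actual SPE ontology terms. One record is rendered as an HTML table for human readers and as RDF in N-Triples syntax for machines.

```python
from html import escape

# HYPOTHETICAL namespace: a placeholder, NOT the real SPE ontology URI.
SPE = "http://example.org/spe#"


def to_ntriples(subject_uri, record):
    """Render a flat experiment record as RDF N-Triples (one literal-valued triple per property).

    Assumes the record keys are safe to append to the namespace as IRI fragments.
    """
    lines = []
    for prop, value in record.items():
        # Escape backslashes first, then quotes and line breaks, as N-Triples string literals require.
        literal = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{subject_uri}> <{SPE}{prop}> "{literal}" .')
    return "\n".join(lines)


def to_html(record):
    """Render the same record as a simple HTML table for human readers."""
    rows = "".join(
        f"<tr><th>{escape(str(k))}</th><td>{escape(str(v))}</td></tr>"
        for k, v in record.items()
    )
    return f"<table>{rows}</table>"


def publish(subject_uri, record):
    """Return (html, rdf) views of one experiment, both generated from a single source record."""
    return to_html(record), to_ntriples(subject_uri, record)


if __name__ == "__main__":
    # Illustrative record; the URI and fields are made up for this example.
    experiment = {
        "title": "Growth rate at 37 C",
        "protocol": "OD600 measured every 30 min",
        "result": "doubling time about 20 min",
    }
    html_doc, rdf_doc = publish("http://example.org/experiments/42", experiment)
    print(html_doc)
    print(rdf_doc)
```

Generating both views from one record keeps the human-readable page and the machine-readable data consistent with each other. That consistency is what lets a crawler index documents and semantic data from the same site.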

[::Subject::]

search engine, semantic search, semantic publishing, semantic web, open data

[::Category::]

Internet service

[::Author::]

AJ Chen

[::Contact Person::]

AJ Chen (ajchen AT web2express.org)
