web2express.org

June 19, 2009

case study: web2x digest picked up news event early

Filed under: news, web intelligence — aj @ 12:14 am

I noticed a perfect case study in today’s news: “Continental airlines incident” began emerging at about 8:29am (PST) on the Twitter daily new-topics list on this http://web2express.org website. That was probably about the same time as the major news outlets broke the story, maybe even a little bit earlier. Surprisingly, this top story did not show up on Twitter.com’s trending topics list, nor on Google Trends, at any point during the whole day.

This case study shows the difference between various trending applications, and I think the underlying technology is the key. Calais is the semantic analysis core of my real-time trending system, and it seems Calais does a pretty good job, as it promises.

more digging:

I just found the flight schedule in the NYT.com report: The flight, Flight 61, took off at 9:54 a.m. in Brussels (3:54 a.m. Eastern time), according to Continental’s Web site. It touched down at 11:47 a.m., earlier than its scheduled noon landing, at Gate C123 at Newark Liberty International Airport.

Something amazing shows up if you compare the times carefully. The incident appeared as a new hot Twitter topic at 8:29am PST (11:29am Eastern), while the flight was still in the air. Where did the tweets come from so early? Did someone tweet from the airplane? Or did people on the ground in Europe get the information early and tweet it? Anyway, news travels on Twitter fast, very fast!
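To make the timeline explicit, here is a small Python check that converts the three reported times to a common time zone. It assumes daylight saving offsets for mid-June (Pacific UTC-7, Eastern UTC-4, Brussels UTC+2) and places all three events on an assumed date of June 18, 2009. The post's "PST" label is treated as Pacific daylight time, which does not change the three-hour gap to Eastern.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for mid-June (daylight saving time in effect).
PACIFIC = timezone(timedelta(hours=-7))   # PDT
EASTERN = timezone(timedelta(hours=-4))   # EDT
BRUSSELS = timezone(timedelta(hours=2))   # CEST

# Assumption: all three events fall on this same (assumed) calendar day.
DAY = (2009, 6, 18)

takeoff = datetime(*DAY, 9, 54, tzinfo=BRUSSELS)     # local time in Brussels
topic_seen = datetime(*DAY, 8, 29, tzinfo=PACIFIC)   # new-topic list timestamp
touchdown = datetime(*DAY, 11, 47, tzinfo=EASTERN)   # landing at Newark


def minutes(delta: timedelta) -> int:
    """Whole minutes in a time difference."""
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    print("takeoff (Eastern):", takeoff.astimezone(EASTERN).strftime("%H:%M"))
    print("topic seen (Eastern):", topic_seen.astimezone(EASTERN).strftime("%H:%M"))
    print("minutes after takeoff:", minutes(topic_seen - takeoff))
    print("minutes before touchdown:", minutes(touchdown - topic_seen))
```

Run as a script, it places the topic at 11:29 Eastern, 455 minutes (7 h 35 min) after takeoff and 18 minutes before touchdown, which matches the 3:54 a.m. Eastern takeoff quoted from the report.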

June 17, 2009

semantic technology conference 2009

Filed under: events, news, semantic search engine, web intelligence — aj @ 8:59 pm

I attended the Semantic Technology Conference again this year, the largest gathering of semantic tech companies. Several things stood out:

Google gave a session on semantics, an amazing change of attitude from the past. Google supports microformats and RDFa in documents and uses the metadata to make search results more relevant. A great turning point.

Several companies have opened their NLP/semantic analysis cores through API services, and the services are free, at least for developers. This is great news for the developer community because more tools are available for creating new semantic web applications. In addition to OpenCalais and Zemanta, which I have tried before, the new semantic APIs come from Ontos, AdaptiveBlue, Dapper, and Expert Systems.

Tom Gruber announced a virtual personal assistant from his new company, Siri. It works on mobile phones and provides a speech input interface for intelligent search. Cool product!

I moderated a panel discussion on web intelligence. The four companies on the panel (Siri, Zemanta, Expert Systems, Overtone) use different tricks to bring some level of intelligence to the web or to their customers. There were lots of questions from the audience. My opening remarks tried to convey the following messages:

1. Web intelligence is becoming the hallmark of the web as social networking brings people closer together. The impact is showing up in many application areas:

  • Smarter Social Networking
    • connect to people you like and sources you trust
    • assisted by consumer intelligent agents
    • Twine, Siri
  • Semantic Search answering your questions
    • Google, Microsoft Bing, Wolfram Alpha
  • Semantic Publishing
    • Reuters, Freebase, Zemanta
  • Web-scale Market Intelligence
    • understand customer needs in real time
    • having intelligent dialog with customers  (branding)
    • Overtone, Attensity, ScoutLabs
  • Semantic Online Advertising
    • targeted by content semantics
    • Google, Peer39, TextWise

2. Plenty of open/free tools and data are available for developers to start creating web apps with intelligence.

Lots of free/open tools available to play with:

  • semantic analysis API: Open Calais, Zemanta
  • NLP: OpenNLP, Stanford NLP tools
  • knowledge management: Protege, Jena
  • search: Lucene, Solr
  • social networking: Orkut,


Lots of data available:

  • Twitter stream
  • News feeds, blog feeds, …
  • Wikipedia, Freebase, DBPedia, …

3. Latest trend: real-time data + intelligence. The Semantic Web SIG will focus on the Twitter stream and its applications at its next event on July 1. TechCrunch will devote a full-day conference to this topic of real-time data streams on July 10.

Overall, semantic technology is gaining more steam now, and we’ll see what killer apps emerge in the next year or so.

aj chen

May 22, 2009

web2express digest update

Filed under: news, web intelligence — aj @ 12:38 am

In the past several days, the web2express digest site did not receive data from the Twitter API. This was because Twitter was still trying to fix a problem with its timeline feed for data mining. Earlier today, the Twitter API team announced the retirement of the data-mining timeline feed. The good news is that Twitter has something better for developers: the streaming API.

I implemented the streaming API for the web2express digest web site yesterday, and it has been running smoothly. The latest version of the Twitter4j library, released a couple of days ago, supports the streaming API. I just plugged it in and it works beautifully.
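The digest itself used Twitter4j in Java. As a language-neutral illustration only, here is a minimal Python sketch of the consuming side of a streaming feed. It assumes, hypothetically, a newline-delimited JSON stream in which blank keep-alive lines may appear and only objects carrying a "text" field are tweets. The function name `iter_statuses` is made up for this example and is not part of Twitter4j or any Twitter API.

```python
import json
from typing import Iterable, Iterator


def iter_statuses(lines: Iterable[str]) -> Iterator[dict]:
    """Yield tweet objects from a newline-delimited JSON stream (hypothetical format).

    Assumptions: one JSON object per line, blank keep-alive lines may appear,
    and only objects with a "text" field are treated as tweets.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # keep-alive newline, nothing to parse
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial or malformed line: skip it rather than crash
        if isinstance(obj, dict) and "text" in obj:
            yield obj  # other message types (e.g. notices) are ignored


if __name__ == "__main__":
    sample = ['{"text": "hello"}', "", '{"delete": {}}', '{"text": "bye"}']
    for status in iter_statuses(sample):
        print(status["text"])
```

Because the function accepts any iterable of lines, the same code can read from an open HTTP response, a file, or a test list, and the downstream analysis never sees keep-alives or non-tweet messages.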

New feature: visitors can now add a Twitter user as a friend while browsing the hot topics on the digest web site. I think this feature will make the free web site more useful to Twitter users.

-aj

May 13, 2009

Announcing Web2express Digest for Twitter

Filed under: news, web intelligence — aj @ 9:54 pm

After several months of development, I’m happy to announce that the Twitter digest web site is now available to the public. The web2express.org web site’s default home page has been switched to this “Digest” web app.

Thanks to Twitter’s API and the API team, the data feed for data mining is just wonderful. Web2express Digest is a real-time system that takes in the feeds as they become available and runs NLP analysis on the tweets using open tools like Open Calais and OpenNLP. The results are freely available on http://web2express.org/. Using this Twitter web app, you can spot daily hot topics and, for each hot topic, quickly find the top contributing Twitter users. I hope this real-time information will help users understand the popular topics at any given moment and easily identify whom to follow.
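The “hot topics plus top contributors” view can be sketched as a two-level count. This is a hypothetical Python illustration, not the digest’s actual code: it assumes each tweet has already been reduced to its author and a list of topic names (in the post, that step is done with Open Calais and OpenNLP), ranks topics by how many tweets mention them, and ranks users within each topic by how many of those tweets they wrote.

```python
from collections import Counter, defaultdict
from typing import Iterable


def hot_topics_with_contributors(
    tweets: Iterable[tuple[str, list[str]]],
    top_topics: int = 10,
    top_users: int = 3,
) -> list[tuple[str, int, list[tuple[str, int]]]]:
    """Rank topics by tweet count and list the top contributing users for each.

    `tweets` is an iterable of (user, topics) pairs; `topics` is assumed to be
    the output of an upstream entity/topic extraction step.
    """
    topic_counts: Counter = Counter()
    users_by_topic: defaultdict = defaultdict(Counter)
    for user, topics in tweets:
        for topic in set(topics):  # count a topic at most once per tweet
            topic_counts[topic] += 1
            users_by_topic[topic][user] += 1
    return [
        (topic, count, users_by_topic[topic].most_common(top_users))
        for topic, count in topic_counts.most_common(top_topics)
    ]


if __name__ == "__main__":
    sample = [  # hypothetical toy data
        ("alice", ["Continental Airlines", "Newark"]),
        ("bob", ["Continental Airlines"]),
        ("alice", ["Continental Airlines"]),
        ("carol", ["Newark"]),
    ]
    for topic, count, users in hot_topics_with_contributors(sample, top_topics=2, top_users=2):
        print(topic, count, users)
```

Counting each topic once per tweet keeps a single user from inflating a topic by repeating the same name inside one message.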

I have been running the system internally since the beginning of this year, and tweets from millions of users have been analyzed. Each day, large numbers of public tweets pass through the system; hot topics are identified by semantic analysis and ranked by popularity. New topics are then selected by comparing the hot topics in the current 24-hour period with those from the previous 24 hours.
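The 24-hour comparison can be sketched as a set difference between two rankings. In this hypothetical Python sketch, “hot” is assumed to mean the `top_n` topics by mention count within a window, and a topic counts as new when it is hot now but was not hot in the previous window. The real system’s thresholds and ranking details are not described in the post.

```python
from collections import Counter


def hot(counts: Counter, top_n: int) -> list:
    """Topics ranked by mention count in one window, keeping the top_n."""
    return [topic for topic, _ in counts.most_common(top_n)]


def new_topics(current: Counter, previous: Counter, top_n: int = 50) -> list:
    """Hot topics in the current 24 hours that were not hot in the previous 24.

    The result keeps the current window's popularity order.
    """
    was_hot = set(hot(previous, top_n))
    return [topic for topic in hot(current, top_n) if topic not in was_hot]


if __name__ == "__main__":
    # Hypothetical per-window mention counts from the semantic-analysis step.
    previous = Counter({"topic A": 120, "topic B": 60, "topic C": 25})
    current = Counter({"topic A": 90, "topic D": 40, "topic C": 30})
    print(new_topics(current, previous, top_n=3))
```

Comparing rankings rather than raw counts means a topic that stays popular across both days (topic A here) is not reported as new, while one that jumps into the hot list (topic D) is.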

Please let me know if you have any comments.

AJ Chen

January 9, 2008

NLP finding mass audience

Filed under: events, news — aj @ 3:08 am

I just came back from our Semantic Web SIG event, another exciting session on cool technology and potential killer applications. This time, the topic was the “old” NLP, natural language processing. Barney Pell, CEO of Powerset, gave an overview of how NLP is solving the chicken-and-egg problem facing the semantic web. Powerset’s CSO, Ron Kaplan, then showed what their not-yet-released search engine can do. I’m very impressed by their ambitious plan to build a deep semantic index of public web pages. Even more exciting to me: according to Barney, they may make their semantic data available to developers. That would enable developers to explore the power of the semantic web.

In addition, Rion Snow from Stanford University presented his Ph.D. work on expanding WordNet using NLP techniques.

It seems to me that search engines may be the vehicle that delivers NLP technology to a mass audience.

-aj

