web2express.org

June 19, 2009

case study: web2x digest picked up news event early

Filed under: news, web intelligence — aj @ 12:14 am

I noticed a perfect case study in today’s news: “Continental Airlines incident” emerged at about 8:29am (PST) on the twitter daily new topic list on this website, http://web2express.org. That was probably about the same time the major news outlets broke the story, maybe even a little bit earlier. Surprisingly, this top story did not show up on Twitter.com’s trending topics list or on Google Trends at any point during the day.

This case study shows the difference between various trending applications. I think the underlying technology is the key. Calais is the semantic analysis core of my real time trending system. It seems Calais does a pretty good job, as promised.

more digging:

I just found the flight schedule in the NYT.com report:  The flight, Flight 61, took off at 9:54 a.m. in Brussels (3:54 a.m. Eastern time), according to Continental’s Web site. It touched down at 11:47 a.m., earlier than its scheduled noon landing, at Gate C123 at Newark Liberty International Airport.

Something amazing shows up if you compare the times carefully. The incident appeared as a new hot twitter topic at 8:29am PST, which is 11:29 a.m. Eastern, about 18 minutes before the 11:47 a.m. touchdown, so the flight was still in the air. Where did the tweets come from so early? Did someone tweet from the airplane? Or did people on the ground in Europe get the information early and tweet it? Either way, news travels on twitter fast, very fast!
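
To double-check the timing, here is a small sketch that puts the three times on one clock. It assumes the incident happened on June 18, 2009 (the day before this post) and that summer time was in effect, so the "PST" time I quoted is treated as UTC-7, Eastern as UTC-4, and Brussels as UTC+2. The variable names are only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed summer-time offsets (assumption: daylight time was in effect in June 2009).
PACIFIC = timezone(timedelta(hours=-7))
EASTERN = timezone(timedelta(hours=-4))
BRUSSELS = timezone(timedelta(hours=2))

# Assumed date of the incident: June 18, 2009.
topic_seen = datetime(2009, 6, 18, 8, 29, tzinfo=PACIFIC)    # topic appears on the digest
takeoff = datetime(2009, 6, 18, 9, 54, tzinfo=BRUSSELS)      # from the NYT.com report
touchdown = datetime(2009, 6, 18, 11, 47, tzinfo=EASTERN)    # from the NYT.com report

# Convert everything to Eastern time and measure the gap before landing.
takeoff_eastern = takeoff.astimezone(EASTERN)
topic_seen_eastern = topic_seen.astimezone(EASTERN)
minutes_before_landing = int((touchdown - topic_seen).total_seconds() // 60)

print("Takeoff (Eastern):    ", takeoff_eastern.strftime("%H:%M"))
print("Topic seen (Eastern): ", topic_seen_eastern.strftime("%H:%M"))
print("Touchdown (Eastern):  ", touchdown.strftime("%H:%M"))
print("Minutes before landing:", minutes_before_landing)
```

Under those assumptions the topic shows up at 11:29 Eastern, 18 minutes before the plane landed.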

June 17, 2009

semantic technology conference 2009

Filed under: events, news, semantic search engine, web intelligence — aj @ 8:59 pm

I have attended the semantic technology conference again this year – the largest gathering for semantic tech companies. Several things stand out:

Google gave a session on semantics – an amazing change of attitude from the past. They support microformats and RDFa in documents and use metadata to make search results more relevant.  Great turning point.

Several companies have opened their NLP/semantic analysis core through api services, and they are free, at least for developers. This is great news for the developer community because more tools are available for creating new semantic web applications. In addition to OpenCalais and Zemanta, which I have tried before, the new semantic apis come from Ontos, AdaptiveBlue, Dapper, and Expert Systems.

Tom Gruber announced a virtual personal assistant from his new company, Siri. It works on mobile phones and provides a speech input interface for intelligent search. Cool product!

I moderated a panel discussion on web intelligence. Four companies on the panel (Siri, Zemanta, Expert Systems, Overtone) have different tricks to bring some levels of intelligence to the web or customers.  There were lots of questions from the audience. My opening remarks tried to convey the following messages:

1. Web intelligence is becoming the hallmark of the web as social networking connects people more closely. The impact is showing up in many application areas:

  • Smarter Social Networking
    • connect to people you like and sources you trust
    • assisted by consumer intelligent agents
    • Twine, Siri
  • Semantic Search answering your questions
    • Google, Microsoft Bing, Wolfram Alpha
  • Semantic Publishing
    • Reuters, Freebase, Zemanta
  • Web-scale Market Intelligence
    • understand customer needs in real time
    • having intelligent dialog with customers  (branding)
    • Overtone, Attensity, ScoutLabs
  • Semantic Online Advertising
    • targeted by content semantics
    • Google, Peer39, TextWise

2. Plenty of open/free tools and data are available for developers to start creating web apps with intelligence.

Lots of free/open tools available to play with:

  • semantic analysis API: Open Calais, Zemanta
  • NLP: OpenNLP, Stanford NLP tools
  • knowledge management: Protege, Jena
  • search: Lucene, Solr
  • social networking: Orkut,


Lots of data available:

  • Twitter stream
  • News feeds, blog feeds, …
  • Wikipedia, Freebase, DBPedia, …

3. Latest trend:  real time data + intelligence. Semantic Web SIG will focus on the twitter stream and its applications at the next event on July 1.  TechCrunch will devote a full-day conference to this topic of real time data streams on July 10.

Overall, semantic technology is gaining steam now, and we’ll see what killer apps emerge in the next year or so.

aj chen

May 22, 2009

web2express digest update

Filed under: news, web intelligence — aj @ 12:38 am

For the past several days, the web2express digest site did not receive data from the twitter api. This was because twitter was still trying to fix the problem with its timeline feed for data mining. Earlier today, the twitter api team announced the retirement of the data mining timeline feed. But the good news is that twitter has something better for developers: the streaming api.

I implemented the streaming api for the web2express digest web site yesterday. It has been running smoothly. The latest version of the Twitter4j library, released a couple of days ago, supports the streaming api. I just plugged it in and it works beautifully.

New feature: visitors can now add a twitter user as a friend while browsing the hot topics on the digest web site.  I think this feature will make the free web site more useful to twitter users.

-aj

May 15, 2009

twitter hot topics reflecting life of ordinary people

Filed under: web intelligence — aj @ 1:53 am

I got a comment earlier today on twitter digest topics: Though it hurts me in the opinion-of-humanity part of my brain to learn how heavily represented American Idol is on that list.

Yes, the daily hot topics may surprise many people. I could not believe what I saw when the system went online for the first time a few months ago. If you are used to reading tech news or the WSJ, you may get a shock. The daily conversations on twitter, and probably on other social networking sites, are mostly about TV shows, movies, games, and other entertainment stuff. But, on the other hand, this also makes sense. People are talking about their lives on social networking sites, and life is not all about technology and the stock market, at least for most ordinary people.

Web2express Digest does not cut or select topics. It just shows whatever comes out of the ongoing conversations of millions of people. I think we can learn a lot from this information, in addition to becoming more effective at navigating the twitter sphere.
-aj

May 13, 2009

Announcing Web2express Digest for Twitter

Filed under: news, web intelligence — aj @ 9:54 pm

After several months of development, I’m happy to announce that the twitter digest web site is now available to the public. The default home page of Web2express.org has been switched to this “Digest” web app.

Thanks to twitter’s api and the api team, the data feed for data mining is just wonderful. Web2express Digest is a real time system that takes in the feeds as they become available and does some NLP analysis on the tweets using open tools like Open Calais and openNLP.  The results are freely available on http://web2express.org/.  Using this twitter web app, you can spot daily hot topics and, for each hot topic, quickly find the top contributing twitter users. I hope this real time information will help users understand the popular topics at any given moment and easily identify who to follow.
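
As a rough illustration of the "top contributing users" idea, the sketch below counts how many tweets each user posted on a topic and returns the most active ones. It assumes the tweets have already been tagged with topics by the NLP step. The `(user, topic)` pair format, the function name `top_contributors`, and the sample data are made up for this example and are not the actual Web2express code.

```python
from collections import Counter, defaultdict

def top_contributors(tagged_tweets, n=3):
    """Return {topic: [(user, tweet_count), ...]} with the n most active users per topic."""
    per_topic = defaultdict(Counter)
    for user, topic in tagged_tweets:
        per_topic[topic][user] += 1
    return {topic: counts.most_common(n) for topic, counts in per_topic.items()}

# Made-up sample data: (user, topic) pairs, one per tweet.
sample = [
    ("alice", "American Idol"),
    ("bob", "American Idol"),
    ("alice", "American Idol"),
    ("carol", "Star Trek"),
]

result = top_contributors(sample, n=2)
for topic, users in result.items():
    print(topic, users)
```

Counting tweets per user is only one possible way to define a "top contributor"; other measures, such as follower counts, could be swapped in.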

I have run the system internally since the beginning of this year. Tweets from millions of users have been analyzed. Each day, large amounts of public tweets pass through the system, from which hot topics are identified by semantic analysis and ranked by popularity. New topics are selected by comparing the hot topics in the current 24-hour period with those from the previous 24 hours.
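
The ranking and the window comparison can be sketched in a few lines. This is a simplified illustration, not the production code: it assumes "popularity" means the raw number of mentions, that a topic counts as "new" if it is hot in the current window but was not hot in the previous one, and that each window is just a list of topic mentions. The function names, the `top_n` cutoff, and the sample counts are invented for the example.

```python
from collections import Counter

def hot_topics(topic_mentions, top_n=10):
    """Rank topics by mention count (assumed popularity measure) and keep the top_n."""
    return [topic for topic, _ in Counter(topic_mentions).most_common(top_n)]

def new_topics(current_mentions, previous_mentions, top_n=10):
    """Hot topics in the current window that were not hot in the previous window."""
    previous_hot = set(hot_topics(previous_mentions, top_n))
    return [t for t in hot_topics(current_mentions, top_n) if t not in previous_hot]

# Made-up mention lists for two consecutive 24-hour windows.
previous_24h = ["American Idol"] * 5 + ["Star Trek"] * 3
current_24h = ["American Idol"] * 4 + ["Continental Airlines"] * 6 + ["Star Trek"] * 1

fresh = new_topics(current_24h, previous_24h, top_n=2)
print(fresh)
```

With this sample data, only the topic that is hot today but was absent yesterday comes out as new.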

Please let me know if you have any comments.

AJ Chen
