A huge stream of blog data was added to the web2express digest server a couple of weeks ago, but the front end at http://web2express.org was only connected to this important new source of real-time data yesterday.
For more than half a year the system has consumed only Twitter streaming data, at about 1000 tweets per minute. New topics are auto-discovered from these Twitter conversations every minute as popular topics emerge.
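To give a feel for what per-minute topic discovery can look like, here is a minimal sketch. It is not the actual web2express algorithm: it assumes a simple burst heuristic in which a term counts as "emerging" when its count in the current minute is well above its average over the recent past. The class name `TopicDetector`, the stopword list, and the thresholds (`min_count`, `burst_ratio`) are all hypothetical choices made for illustration.

```python
import re
from collections import Counter, deque

# Hypothetical tokenizer and stopword list, for illustration only.
TOKEN_RE = re.compile(r"[#@]?\w+")
STOPWORDS = {"the", "and", "for", "you", "that", "this", "with", "are", "was"}


def tokenize(text):
    """Lowercase the text and keep tokens longer than two characters."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if len(t) > 2 and t not in STOPWORDS]


class TopicDetector:
    """Flags terms whose per-minute count bursts above their recent average."""

    def __init__(self, history_minutes=60, min_count=5, burst_ratio=3.0):
        self.history = deque(maxlen=history_minutes)  # one Counter per past minute
        self.min_count = min_count      # ignore terms seen fewer times than this
        self.burst_ratio = burst_ratio  # required ratio of current to baseline

    def process_minute(self, messages):
        """Take one minute of messages and return emerging (term, count) pairs."""
        current = Counter()
        for message in messages:
            # Count each term at most once per message.
            current.update(set(tokenize(message)))

        baseline = Counter()
        for past in self.history:
            baseline.update(past)
        minutes_seen = len(self.history)

        emerging = []
        for term, count in current.items():
            if count < self.min_count:
                continue
            average = baseline[term] / minutes_seen if minutes_seen else 0.0
            # Adding 1.0 to the baseline avoids division by zero for unseen terms.
            if count / (average + 1.0) >= self.burst_ratio:
                emerging.append((term, count))

        self.history.append(current)
        # Most frequent terms first, ties broken alphabetically.
        return sorted(emerging, key=lambda pair: (-pair[1], pair[0]))
```

In use, each minute's batch of tweets would be passed to `process_minute`, and whatever it returns would be the candidate new topics for that minute. A real system would need better tokenization, phrase detection, and spam filtering than this sketch provides.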
The blogosphere is another rich source of fresh conversations. It’s much bigger than the Twitter stream in bytes. Thanks to Kevin Burton, CEO of Spinn3r, I have permission to pull Spinn3r’s blog data stream into my real-time analysis system. We met at the event that I organized for Twitter applications 2 months ago, and Kevin generously suggested that I try their data service. Spinn3r crawls the whole web for blogs and then makes the data available through an API. It’s not exactly real time, but very close. Roughly one million blogs per 24-hour period (after filtering) are now flowing from Spinn3r to web2express.
Combining Twitter conversation data and Spinn3r blog data, the web2express digest server now analyzes 2-3 million fresh conversations and blogs on a continuous 24-hour basis. The primary analysis is still auto-discovery of new topics in real-time conversations. Immediately after new topics are auto-discovered, they are provided to the front-end web page, allowing users to follow real-time Twitter conversations and blogs by topic.
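As a rough sanity check on the combined volume, the arithmetic can be worked out from the two rates quoted in this post (about 1000 tweets per minute, and roughly one million blogs per 24 hours). The variable names below are just for illustration, and the inputs are the approximate figures above rather than measured values.

```python
TWEETS_PER_MINUTE = 1000      # approximate rate quoted in this post
BLOGS_PER_DAY = 1_000_000     # approximate rate quoted in this post, after filtering
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = TWEETS_PER_MINUTE * MINUTES_PER_DAY
total_per_day = tweets_per_day + BLOGS_PER_DAY
items_per_second = total_per_day / SECONDS_PER_DAY

print(tweets_per_day)              # 1440000
print(total_per_day)               # 2440000
print(round(items_per_second, 1))  # 28.2
```

That comes to about 2.4 million items per day, which sits inside the 2-3 million range, or roughly 28 items per second on average.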
It’s exciting to see such a huge amount of fresh web content being analyzed in real time! I’m pretty sure this real-time system can have many different applications. For example, advertisers and marketers can use it to monitor products and brands on social media, do sentiment analysis, and identify potential targets. I’m interested in hearing feedback from marketers and finding ways to make this real-time tool useful for social media marketing.