Monday, September 28, 2009

How to use comments as a measure for understanding blog posts

Most proposed research on the blogosphere focuses only on blog posts and ignores their comments. Yet comments can serve as a measure for understanding the posts they belong to.

Many applications can benefit from comments-oriented summarization, including blog search, blog representation, and reader feedback. Among blog posts that contain comments, an average of 6.3 comments per post was observed. Nevertheless, very few studies on blog comments and post summarization have been reported.

One of the described models extracts the sentences from a blog post that best represent the topics discussed in its comments. It first derives representative words from the comments and then selects the sentences that contain those words. Several word representativeness measures can be used, such as binary, comment frequency, term frequency, and the ReQuT model, which considers three aspects: Reader, Quotation and Topic.
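The second step, selecting sentences once word scores are known, can be sketched roughly as follows. This is only an illustration: the `word_scores` dictionary is a hypothetical input standing in for whichever representativeness measure is used, and ranking sentences by the sum of their words' scores is an assumption made here, not necessarily one of the selection methods from the paper.

```python
import re

def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption for this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def select_sentences(post, word_scores, k=2):
    """Return the k highest-scoring sentences of the post, in original order.

    word_scores: hypothetical mapping from word to representativeness score.
    """
    sentences = split_sentences(post)

    def score(sentence):
        # Sum the scores of the distinct words in the sentence.
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        return sum(word_scores.get(w, 0.0) for w in words)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore the order used in the post
    return [sentences[i] for i in chosen]

post = "Blog search is hard. I had coffee today. Ranking comments helps search."
word_scores = {"search": 2.0, "ranking": 1.0, "comments": 1.0}
print(select_sentences(post, word_scores, k=2))
# ['Blog search is hard.', 'Ranking comments helps search.']
```

The sentence with no representative words is dropped, while the two that share vocabulary with the comments are kept.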

Three observations are used as guidelines for the ReQuT model.
• Observation 1: many readers mention other readers' names when replying to them.
• Observation 2: a comment may contain quoted sentences.
• Observation 3: discussions in comments often branch into several topics, and a set of comments is linked together by sharing the same topic.

The simpler representativeness measures mentioned above can be defined as follows.
• Binary: whether or not the word appears in at least one comment.
• Comment frequency: the number of comments containing the word (similar to document frequency).
• Term frequency: the number of occurrences of the word across all comments of the blog post.
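As a minimal sketch, the three measures could be computed like this. The tokenizer and the function names are assumptions made for illustration; the paper may normalize or filter words differently (for example, by removing stop words).

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def word_measures(comments):
    """Return (binary, comment_frequency, term_frequency) over all comments."""
    term_frequency = Counter()
    comment_frequency = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        term_frequency.update(tokens)          # count every occurrence
        comment_frequency.update(set(tokens))  # count each word once per comment
    # Binary: 1 for any word seen in at least one comment (absent words are 0).
    binary = {word: 1 for word in comment_frequency}
    return binary, comment_frequency, term_frequency

comments = ["Great post about search", "Search is hard, search is fun"]
binary, cf, tf = word_measures(comments)
print(cf["search"], tf["search"])  # 2 3
```

Here "search" appears in both comments (comment frequency 2) but three times in total (term frequency 3), which shows how a single repetitive comment can inflate term frequency.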

However, all three of these measures can suffer from spam comments. Other signals can also be used, such as the authors of comments and the quotations among comments.

An equation based on these measures, each weighted by a corresponding ratio, is derived to calculate the word representativeness score. Using human-labeled sentences, two sentence selection methods are then used with these measures.
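The equation itself is not reproduced in this post. As a sketch only, assuming the score is a weighted sum over the three ReQuT aspects, it might take a form like the one below. The symbols RM, QM and TM (reader, quotation and topic scores of a word w) and the weights α, β and γ are names assumed here to stand in for the measures and their "corresponding ratios".

```latex
\mathrm{Rep}(w) = \alpha \,\mathrm{RM}(w) + \beta \,\mathrm{QM}(w) + \gamma \,\mathrm{TM}(w),
\qquad \alpha + \beta + \gamma = 1, \quad \alpha, \beta, \gamma \ge 0
```

Under this assumption, setting one weight to 1 and the others to 0 would reduce the score to a single aspect.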

References:

M. Hu, A. Sun and E. Lim. Comments-Oriented Blog Summarization by Sentence Extraction. CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, 2007.
