Monday, October 5, 2009

Algorithm for ranking blog posts

First, why are the original ranking algorithms for web pages not suitable for blog posts?

· Posts have unique features that web pages lack, such as trackbacks, scraps, and comments.

· Posts are not rich in hyperlinks, so ordinary link-based ranking algorithms are not enough to rank them well.

A trackback is the action of writing a new post related to someone else’s post and putting a link to the original post within the new post.

A scrap is the action of copying someone else’s post to one’s own blog.

A comment is the action of putting a short opinion on someone else’s post.

Trackbacks and scraps are considered more important than comments.

So how can this problem be solved? By modifying the original web page ranking algorithms so that they suit blog posts.

Web page ranking algorithms include InDegree, PageRank, and HITS.

First, we need the concept of the web graph: each node of the graph is a web page, and there is a directed edge from one node to another if the first page has a hyperlink to the second page.

The first and simplest algorithm is InDegree, which measures the importance of a page by the number of pages referencing it.
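To make InDegree concrete, here is a minimal Python sketch. It assumes the web graph is given as a list of (source, target) hyperlink pairs; the function name `indegree_scores`, the sample graph, and the choice to ignore self-links and repeated links are my own assumptions for illustration.

```python
from collections import defaultdict


def indegree_scores(edges):
    """Count, for every page, how many distinct pages link to it."""
    referrers = defaultdict(set)
    pages = set()
    for source, target in edges:
        pages.update((source, target))
        if source != target:  # assumption: a page linking to itself does not count
            referrers[target].add(source)
    return {page: len(referrers[page]) for page in pages}


# Hypothetical web graph: A and B link to C, C links to D (A links to C twice).
web_graph = [("A", "C"), ("B", "C"), ("C", "D"), ("A", "C")]
scores = indegree_scores(web_graph)
```

Here page C ends up with the highest score because two different pages reference it, and the repeated link from A is counted only once.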

PageRank, the algorithm used by Google, interprets a link from page A to page B as a vote by page A for page B. But Google looks at more than the sheer volume of votes, or links, a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important".
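The voting idea can be sketched as a simple iteration in Python. This is only an illustration, not Google's actual implementation: the damping factor of 0.85, the fixed number of iterations, the even redistribution of rank from pages without out-links, and the function name `pagerank` are all assumptions on my part.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Iteratively pass each page's rank to the pages it links to."""
    pages = sorted({page for edge in edges for page in edge})
    out_links = {page: set() for page in pages}
    for source, target in edges:
        out_links[source].add(target)

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Assumption: rank held by pages with no out-links is spread evenly.
        dangling = sum(rank[p] for p in pages if not out_links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for q in out_links[p]:
                # A vote from p is worth more when p itself has a high rank.
                new_rank[q] += damping * rank[p] / len(out_links[p])
        rank = new_rank
    return rank


# Hypothetical web graph: A and B vote for C, and C votes for A.
ranks = pagerank([("A", "C"), ("B", "C"), ("C", "A")])
```

In this small graph A receives only one vote, but it comes from the highly ranked page C, so A ends up well above B, which receives no votes at all.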

HITS (Hubs and Authorities): Each page gets an authority value and a hub value. The authority value of a page is computed as the sum of the scaled hub values of the pages that point to it. The hub value of a page is the sum of the scaled authority values of the pages it points to.
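A rough Python sketch of the two HITS update rules follows. The scaling step is assumed here to be division by the Euclidean norm, and the starting values, iteration count, and names `hits` and `scale` are hypothetical choices for illustration.

```python
import math


def scale(values):
    """Divide all values by their Euclidean norm (assumed scaling step)."""
    norm = math.sqrt(sum(v * v for v in values.values()))
    return {k: (v / norm if norm else 0.0) for k, v in values.items()}


def hits(edges, iterations=50):
    """Alternate the authority and hub updates on a web graph."""
    links = set(edges)  # repeated links count once
    pages = sorted({page for edge in links for page in edge})
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority: sum of the hub values of the pages pointing to the page.
        new_auth = {page: 0.0 for page in pages}
        for source, target in links:
            new_auth[target] += hub[source]
        auth = scale(new_auth)
        # Hub: sum of the authority values of the pages the page points to.
        new_hub = {page: 0.0 for page in pages}
        for source, target in links:
            new_hub[source] += auth[target]
        hub = scale(new_hub)
    return auth, hub


# Hypothetical web graph: A points to C and D, B points to C.
auth, hub = hits([("A", "C"), ("B", "C"), ("A", "D")])
```

C becomes the strongest authority because both A and B point to it, and A becomes the strongest hub because it points to both authorities.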

For blog post ranking, the change is that the web graph is replaced by a bipartite graph with two different types of nodes, posts and bloggers, called a post-blogger graph, as in the figure.




Algorithms for ranking posts

InDegree is applied without modifications.

AuthHub:

AuthHub is a version of HITS modified for post ranking in a blog environment. In the original HITS, every web page is given both an authority score and a hub score. In AuthHub, however, an authority score is given only to a post, and a hub score is given only to a blogger. Thus, the ranks of posts are decided by authority scores alone. The authority score of a post is the sum of the hub scores of all the bloggers who do trackbacks or scraps on the post. Likewise, the hub score of a blogger is the sum of the authority scores of the posts on which the blogger does trackbacks or scraps.
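The two AuthHub rules can be sketched in Python as below. This is my own reading of the description, not the code from the paper: the input is assumed to be a list of (blogger, post) pairs, one for each trackback or scrap, and the scaling of scores after every step, the iteration count, and the names `auth_hub` and `scale` are assumptions added so the example runs.

```python
import math


def scale(values):
    """Divide all values by their Euclidean norm (assumed, keeps scores bounded)."""
    norm = math.sqrt(sum(v * v for v in values.values()))
    return {k: (v / norm if norm else 0.0) for k, v in values.items()}


def auth_hub(actions, iterations=50):
    """Rank posts by authority on a post-blogger graph.

    actions: (blogger, post) pairs, one per trackback or scrap (assumed format).
    """
    links = set(actions)
    bloggers = sorted({blogger for blogger, _ in links})
    posts = sorted({post for _, post in links})
    hub = {blogger: 1.0 for blogger in bloggers}
    authority = {post: 1.0 for post in posts}
    for _ in range(iterations):
        # Authority of a post: sum of hub scores of bloggers acting on it.
        new_authority = {post: 0.0 for post in posts}
        for blogger, post in links:
            new_authority[post] += hub[blogger]
        authority = scale(new_authority)
        # Hub of a blogger: sum of authority scores of the posts acted on.
        new_hub = {blogger: 0.0 for blogger in bloggers}
        for blogger, post in links:
            new_hub[blogger] += authority[post]
        hub = scale(new_hub)
    # Posts are ranked by authority score only; ties are broken by name.
    ranking = sorted(posts, key=lambda post: (-authority[post], post))
    return ranking, authority, hub


# Hypothetical data: alice and bob scrap post1, alice also trackbacks post2.
ranking, authority, hub = auth_hub(
    [("alice", "post1"), ("bob", "post1"), ("alice", "post2")]
)
```

With this data post1 is ranked above post2, since two bloggers acted on it, and alice gets a higher hub score than bob because she acted on both posts.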

Conclusion of evaluation of modifications

In an evaluation on a large dataset that took users' opinions into account, AuthHub performed better than the original ranking algorithms and the other modified ranking algorithms.

Reference

[1] Post Ranking Algorithms in Blog Environment, Second International Conference on Future Generation Communication and Networking Symposia, 2008

