Saturday, October 10, 2009

Applying Social Network Analysis on Blogospheres

This post summarizes SUN Wen-jun and QIU Hang-ming: "A Social Network Analysis on Blogospheres", in the International Conference on Management Science & Engineering (15th), Long Beach, USA, September, 2008.

Blogging has become a way for people to interact socially. You can follow someone else's stories, comment on them, and refer to them, regardless of the distance between you and the author and without any technical knowledge of how the internet works. That is why the blogosphere is considered a manifestation of a social network: it reflects a kind of social relationship between users. A social network is defined as a collection of nodes (social actors) and links between the nodes (the interconnections). Social network analysis (SNA) is concerned only with the relationships between the actors, not with the actors themselves.

Graph theory is one of the foundations of SNA, and a social network is formulated as a social graph and a social relation matrix.
Common concepts of social network analysis:

1- Degree

The degree of a node is the number of links attached to this node. In a directed graph, there is an in-degree and an out-degree. The out-degree of node i is the number of links pointing out of this node, and the in-degree is the number of links pointing to this node. If both the in-degree and the out-degree are zero, then the node is called an isolated node.
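As a small illustration of these definitions, here is a minimal Python sketch. The function names (degrees, isolated_nodes) and the toy blog links are made up for this example; they do not come from the paper.

```python
def degrees(nodes, links):
    """Return (in_degree, out_degree) dicts for a directed graph.

    `links` is an iterable of (source, target) pairs.
    """
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for src, dst in links:
        out_deg[src] += 1  # link points out of src
        in_deg[dst] += 1   # link points to dst
    return in_deg, out_deg


def isolated_nodes(nodes, links):
    """Nodes whose in-degree and out-degree are both zero."""
    in_deg, out_deg = degrees(nodes, links)
    return [n for n in nodes if in_deg[n] == 0 and out_deg[n] == 0]


# Hypothetical toy blogosphere: four blogs, three links.
nodes = ["a", "b", "c", "d"]
links = [("a", "b"), ("c", "b"), ("b", "a")]

in_deg, out_deg = degrees(nodes, links)
print(in_deg)                        # {'a': 1, 'b': 2, 'c': 0, 'd': 0}
print(out_deg)                       # {'a': 1, 'b': 1, 'c': 1, 'd': 0}
print(isolated_nodes(nodes, links))  # ['d']
```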

2- Geodesic path
The shortest path between two nodes.

3- Geodesic distance
The length of the geodesic path between nodes i and j, written d(i, j).

4- Diameter
The length of the longest geodesic path: D = max {d(i, j)}.
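A rough sketch of how geodesic distance and diameter could be computed, assuming an unweighted graph stored as an adjacency dict and ignoring pairs of nodes that cannot reach each other (both are assumptions of this example, not statements from the paper). Breadth-first search gives the shortest path length from one node to every reachable node.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Shortest path lengths d(source, j) for every reachable node j (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(adj):
    """Largest geodesic distance over all reachable pairs of nodes."""
    return max(max(geodesic_distances(adj, n).values()) for n in adj)


# Hypothetical undirected chain a - b - c - d, each link stored in both directions.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

print(geodesic_distances(adj, "a")["d"])  # 3
print(diameter(adj))                      # 3
```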

5- Density
A measure of how closely connected a network is. For a fixed number of nodes, the more links there are, the higher the density of the network.


ρ = l / (n * (n − 1)) for a directed graph

ρ = 2l / (n * (n − 1)) for an undirected graph

where ρ is the density, l is the number of links, and n is the number of the nodes.
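To make the density measure concrete, here is a minimal sketch. It assumes the usual convention that a directed graph on n nodes has n * (n − 1) possible links while an undirected graph has half as many; the function name and the sample numbers are illustrative only.

```python
def density(num_links, num_nodes, directed=True):
    """Ratio of existing links to the maximum possible number of links."""
    if num_nodes < 2:
        return 0.0  # no links are possible with fewer than two nodes
    possible = num_nodes * (num_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible /= 2  # each unordered pair counted once
    return num_links / possible


print(density(3, 4))                  # 0.25
print(density(3, 4, directed=False))  # 0.5
print(density(6, 4, directed=False))  # 1.0
```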

6- Power and centrality
Social scientists measure power from the perspective of "relationship".
Centrality is used to express the concept of power: it tells how central a role a node plays in a social network.

Link categories
In a blogosphere, the individual blogs are represented as nodes in the graph and the inter-links between them are directed links in the graph.
There are four categories of links:
1- Links placed by the BSP (Blog Service Provider); these links appear on all blogs hosted on the same blog site.
2- Citations of other blogs that the owner of the blog intentionally puts in the page.
3- Links to other blogs in a special fixed location in a blog's page layout.
4- Links left by visitors of a blog in reviews and comments.

Monday, October 5, 2009

Algorithm for ranking blog posts

First, why are the original ranking algorithms for web pages not suitable for blog posts?

· Posts have features that web pages do not have, such as trackbacks, scraps, and comments.

· Posts are not rich in hyperlinks, so the usual link-based ranking algorithms are not enough to rank them well.

A trackback is the action of writing a new post related to someone else's post and putting a link to the original post within the new post.

A scrap is the action of copying someone else's post to one's own blog.

A comment is the action of putting a short opinion on someone else's post.

Trackbacks and scraps are considered more important than comments.

So how can this problem be solved? By modifying the original web page ranking algorithms so that they are suitable for blog posts.

Examples of web page ranking algorithms are InDegree, PageRank, and HITS.

First we have to define a concept called the web graph: each node of this graph is a web page, and there is an edge from one node to another if the first page has a hyperlink referencing the second page.

The first and simplest algorithm is InDegree, which measures the importance of a page by the number of pages referencing it.
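InDegree can be sketched in a few lines of Python. This is only an illustration: the function name indegree_rank and the page names are invented, and ties are broken alphabetically just to make the output deterministic.

```python
from collections import Counter


def indegree_rank(links):
    """Rank pages by the number of distinct pages that link to them.

    `links` is an iterable of (source_page, target_page) pairs.
    """
    counts = Counter()
    for src, dst in set(links):  # a repeated link from the same page counts once
        counts[src] += 0         # make sure pages with no inbound links still appear
        counts[dst] += 1
    # Highest in-degree first; ties broken by page name.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical web graph with three pages.
links = [("p1", "p3"), ("p2", "p3"), ("p3", "p1")]
print(indegree_rank(links))  # [('p3', 2), ('p1', 1), ('p2', 0)]
```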

PageRank, an algorithm used by Google, interprets a link from page A to page B as a vote by page A for page B. But Google looks at more than the sheer volume of votes, or links, that a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important".
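The voting idea can be sketched as a simple power iteration. Note the assumptions: the damping factor of 0.85, the fixed number of iterations, and the way pages without outgoing links are handled are common textbook choices, not details taken from the paper or from Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over (source, target) links."""
    nodes = sorted({page for link in links for page in link})
    out_links = {page: [] for page in nodes}
    for src, dst in links:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Every page gets a small base score, then collects votes.
        new_rank = {page: (1.0 - damping) / n for page in nodes}
        for page in nodes:
            targets = out_links[page]
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web graph: a and b both vote for c, c votes for a.
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph, c ends up first because it receives two votes, and a comes second because its single vote is cast by the important page c.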

HITS (Authorities and Hubs): the authority value of a page is computed as the sum of the scaled hub values of the pages that point to it. The hub value of a page is the sum of the scaled authority values of the pages it points to.
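A minimal sketch of the HITS update, assuming the scores are scaled to unit Euclidean length after every round and that a fixed number of rounds is enough (both are assumptions of this example). The helper names are invented.

```python
import math


def _normalize(scores):
    """Scale scores so that their Euclidean norm is 1 (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: (v / norm if norm else 0.0) for k, v in scores.items()}


def hits(links, iterations=50):
    """Return (authority, hub) score dicts for (source, target) links."""
    nodes = sorted({page for link in links for page in link})
    auth = {page: 1.0 for page in nodes}
    hub = {page: 1.0 for page in nodes}
    for _ in range(iterations):
        # Authority: sum of hub values of the pages pointing to this page.
        new_auth = {page: 0.0 for page in nodes}
        for src, dst in links:
            new_auth[dst] += hub[src]
        auth = _normalize(new_auth)
        # Hub: sum of authority values of the pages this page points to.
        new_hub = {page: 0.0 for page in nodes}
        for src, dst in links:
            new_hub[src] += auth[dst]
        hub = _normalize(new_hub)
    return auth, hub


# Hypothetical graph: h1 links to a1 and a2, h2 links only to a1.
auth, hub = hits([("h1", "a1"), ("h2", "a1"), ("h1", "a2")])
print(max(auth, key=auth.get))  # a1
print(max(hub, key=hub.get))    # h1
```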

The change for blog post ranking is that, instead of the web graph, a bipartite graph is built. It consists of two different types of nodes, posts and bloggers, and is called the post-blogger graph.

[Figure: the post-blogger graph]

Algorithms for ranking posts

InDegree is applied without modifications.

AuthHub:

AuthHub is a version of HITS modified for post ranking in a blog environment. In the original HITS, both an authority score and a hub score are given to each web page. In AuthHub, however, an authority score is given only to a post, and a hub score is given only to a blogger. Thus, the ranks of posts are decided only by authority scores. The authority score of a post is the sum of the hub scores of all the bloggers who do trackbacks or scraps on the post. Likewise, the hub score of a blogger is the sum of the authority scores of the posts on which the blogger does trackbacks or scraps.
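The AuthHub description above could be sketched as follows. This is a rough illustration under stated assumptions, not the paper's implementation: trackbacks and scraps are treated alike as (blogger, post) pairs, scores are rescaled to unit length each round, and the number of rounds is fixed. All names and the sample data are made up.

```python
import math


def _normalize(scores):
    """Scale scores so that their Euclidean norm is 1 (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: (v / norm if norm else 0.0) for k, v in scores.items()}


def auth_hub(actions, iterations=50):
    """Rank posts on a post-blogger graph.

    `actions` is a list of (blogger, post) pairs, one per trackback or scrap.
    Posts get only authority scores; bloggers get only hub scores.
    """
    bloggers = sorted({b for b, _ in actions})
    posts = sorted({p for _, p in actions})
    hub = {b: 1.0 for b in bloggers}
    auth = {p: 1.0 for p in posts}
    for _ in range(iterations):
        # Authority of a post: sum of hub scores of bloggers acting on it.
        auth = {p: 0.0 for p in posts}
        for b, p in actions:
            auth[p] += hub[b]
        auth = _normalize(auth)
        # Hub of a blogger: sum of authority scores of posts they acted on.
        hub = {b: 0.0 for b in bloggers}
        for b, p in actions:
            hub[b] += auth[p]
        hub = _normalize(hub)
    ranking = sorted(posts, key=lambda p: (-auth[p], p))
    return ranking, auth, hub


# Hypothetical data: alice and bob both trackback/scrap post1, alice also post2.
ranking, auth, hub = auth_hub([("alice", "post1"), ("bob", "post1"), ("alice", "post2")])
print(ranking)  # ['post1', 'post2']
```

Only the authority scores are used for the final ranking; the blogger hub scores exist just to weight whose trackbacks and scraps count for more.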

Conclusion of the evaluation of the modifications

After an evaluation on a large dataset that took users' opinions into account, AuthHub performed better than the original ranking algorithms and the other modified ranking algorithms.

Reference

[1] "Post Ranking Algorithms in Blog Environment", Second International Conference on Future Generation Communication and Networking Symposia, 2008.