Methods for Ranking User-Generated Streams - A Case Study in Blog Feed Retrieval

Dean's Office - Faculty of Informatics

Start date: 14 September 2012

End date: 15 September 2012

You are cordially invited to attend the PhD Dissertation Defense of Mostafa KEIKHA on Friday, September 14th, 2012, at 09h00 in room A33 (Red building).

User-generated content is one of the main sources of information on the Web today. With the huge volume of such data generated every day, an efficient and effective retrieval system is essential. The goal of such a system is to enable users to search this data and retrieve documents relevant to their information needs.

Among the retrieval tasks for user-generated content, retrieving and ranking streams is an important one with various applications. The goal of this task is to rank streams, that is, chronologically ordered collections of documents, in response to a user query. This differs from traditional retrieval tasks, where the goal is to rank single documents and temporal properties matter less in the ranking.

In this thesis we investigate the problem of ranking user-generated streams, with a case study in blog feed retrieval. Blogs, like all other user-generated streams, have specific properties that call for new considerations in retrieval methods. Blog feed retrieval can be defined as retrieving blogs with a recurrent interest in the topic of a given query. We define three properties of blog feed retrieval, each of which introduces new challenges for the ranking task: 1) term mismatch in blog retrieval, 2) evolution of topics in blogs, and 3) diversity of blog posts. For each property, we investigate the corresponding challenges and propose solutions to overcome them. We further analyze the effect of our solutions on the performance of a retrieval system. We show that taking these properties into account when developing the retrieval system can help improve on state-of-the-art retrieval methods. In all the proposed methods, we pay particular attention to temporal properties, which we believe carry important information in any type of stream. We show that, when combined with content-based information, temporal information can be useful in different situations.

Although we apply our methods to blog feed retrieval, they are mostly general and are applicable to similar stream ranking problems, such as ranking experts or ranking Twitter users.

Dissertation Committee:

  • Prof. Fabio Crestani, Università della Svizzera italiana, Switzerland (Research Advisor)
  • Prof. Antonio Carzaniga, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Kai Hormann, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. ChengXiang Zhai, University of Illinois at Urbana-Champaign, USA (External Member)
  • Prof. Ricardo Baeza-Yates, Yahoo Research Barcelona, Spain (External Member)
  • Prof. Fabrizio Silvestri, CNR Pisa, Italy (External Member)