Proximity-based Approaches to Blog Opinion Retrieval

Dean's Office - Faculty of Informatics

Start date: 13 September 2012

End date: 14 September 2012

You are cordially invited to attend the PhD Dissertation Defense of Shima GERANI on Thursday, September 13th 2012 at 09h30 in room A34 (Red building).

Recent years have seen the rapid growth of social media platforms that enable people to express their thoughts and perceptions on the web and share them with other users. Many people write their opinions about products, movies, people or events on blogs, forums or review sites. This so-called user-generated content is a good source of users' opinions, and mining it can be very useful for a wide variety of applications that require an understanding of public opinion about a concept.

Blogs are one of the most popular and influential social media. The rapid growth in the popularity of blogs, the ability of bloggers to write about different topics and the possibility of getting feedback from other users make the blogosphere a valuable source of opinions on many topics. To facilitate access to such opinionated content, new retrieval models, called opinion retrieval models, are necessary. Opinion retrieval models aim at finding documents that are relevant to the topic of a query and express an opinion about it.

However, opinion retrieval in blogs is challenging for a number of reasons. First, blogs are not limited to a single topic; they can be about anything that interests the author. Therefore, a large number of blog posts may not be relevant to the topic of the query. Second, a blog post relevant to a query can also be relevant to a number of other topics and may express an opinion about one of those non-query topics. Therefore, an opinion retrieval system should first locate the documents relevant to a query and then score each relevant document by the opinion it expresses about the query topic specifically. Finally, since blogs are not limited to a single domain, an opinion retrieval model should be general enough to retrieve posts on different topics across different domains.

In this thesis, we focus on the opinion retrieval task in blogs, with the aim of proposing methods that improve blog post opinion retrieval performance. To this end, we view an opinion retrieval model as consisting of three components: relevance scoring, opinion scoring and score combination. We focus on the opinion scoring and combination components and propose methods for better handling these two important steps. We evaluate the proposed methods on the standard TREC collection and provide evidence that they are indeed helpful and improve on the performance of state-of-the-art techniques.
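The abstract does not specify the scoring functions, so the following is only an illustrative sketch of the three-component view, not the method proposed in the thesis. It assumes a hypothetical opinion lexicon (`OPINION_WORDS`), a Gaussian proximity kernel with width `sigma` (so that opinion words close to query terms count more, in the spirit of the "proximity-based" title), relevance scores already computed and normalized to [0, 1], and a simple linear interpolation with weight `lam` as the combination step. All names, weights and parameters are assumptions.

```python
import math

# Hypothetical opinion lexicon (assumption): a real system would use a
# subjectivity lexicon or weights learned from training data.
OPINION_WORDS = {"great": 1.0, "terrible": 1.0, "love": 0.8,
                 "hate": 0.8, "good": 0.5, "bad": 0.5}


def opinion_score(tokens, query_terms, sigma=5.0):
    """Proximity-weighted opinion score (illustrative assumption).

    Each opinion word contributes its lexicon weight multiplied by a
    Gaussian kernel of its distance to the nearest query-term occurrence,
    so opinions expressed near the query topic count more. The sum is
    divided by document length. Returns 0.0 if no query term occurs.
    """
    query_positions = [i for i, t in enumerate(tokens) if t in query_terms]
    if not query_positions:
        return 0.0
    score = 0.0
    for i, token in enumerate(tokens):
        weight = OPINION_WORDS.get(token)
        if weight is None:
            continue
        d = min(abs(i - q) for q in query_positions)
        score += weight * math.exp(-(d * d) / (2 * sigma * sigma))
    return score / len(tokens)


def combine(relevance, opinion, lam=0.5):
    """Linear interpolation of relevance and opinion scores (assumed
    to be on comparable scales)."""
    return (1 - lam) * relevance + lam * opinion


def rank(docs, query_terms, lam=0.5):
    """Rank docs given as (doc_id, tokens, relevance) tuples, where the
    relevance score comes from a separate relevance-scoring component."""
    scored = [(doc_id, combine(rel, opinion_score(tokens, query_terms), lam))
              for doc_id, tokens, rel in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, a post that praises a phone near the word "phone" can outrank a slightly more relevant but purely factual post:

```python
docs = [
    ("d1", "the new phone is great and i love it".split(), 0.8),
    ("d2", "the new phone was released today".split(), 0.9),
]
print(rank(docs, {"phone"})[0][0])  # d1
```

Separating the opinion score from the relevance score, as in the three-component view, lets either part be replaced independently, for instance swapping the linear interpolation for a learned combination.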

Dissertation Committee:

  • Prof. Fabio Crestani, Università della Svizzera italiana, Switzerland (Research Advisor)
  • Prof. Fernando Pedone, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Marc Langheinrich, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Mohand Boughanem, Paul Sabatier University Toulouse, France (External Member)
  • Prof. Maarten de Rijke, University of Amsterdam, The Netherlands (External Member)