Effective Unsupervised Information Retrieval Models based on Information Theory

Staff - Faculty of Informatics

Start date: 6 May 2013

End date: 7 May 2013

The Faculty of Informatics is pleased to announce a seminar given by Gianni Amati

DATE: Monday, May 6th 2013
PLACE: USI Lugano Campus, room 253, Main building (Via G. Buffi 13)
TIME: 15.30

The Vector Space Model and Ponte and Croft's language model are two examples of unsupervised IR models. However, the most widely used IR models, such as the language model with Dirichlet smoothing (LM), the Divergence From Randomness (DFR) models, and even the BM25 formula, are parametric. Such supervised models, whose parameters must be tuned, are preferred to unsupervised ones for reasons of effectiveness and efficiency. We show that parameters are necessary in these IR models to cope with the variance of document length in the collection and of query length under the term independence assumption.
The challenge in IR modeling is thus to provide a simple yet effective parameter-free model that performs statistically significantly better than, or not statistically differently from, LM, DFR or BM25. We introduce a new class of models based on information theory that perform either better than or not statistically differently from standard IR models over many test collections, such as all the ad hoc TREC collections (Terabyte, Microblog, WT10g, Tipster, etc.).
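
As a rough illustration of what "parametric" means here, the sketch below scores a single query term with BM25 and with a Dirichlet-smoothed language model. It is not taken from the seminar: the function names are hypothetical, the IDF form is one common variant, and the default values (k1 = 1.2, b = 0.75, mu = 2000) are conventional choices assumed for the example. The point is only that each formula carries free parameters controlling how document length affects the score.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 contribution of one query term (hypothetical helper).

    k1 and b are the free parameters; b sets how strongly the
    document length is normalized against the collection average.
    """
    # One common IDF variant (assumption); other forms exist.
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)


def dirichlet_lm_term_score(tf, doc_len, coll_prob, mu=2000.0):
    """Log-probability of one query term under Dirichlet smoothing.

    mu is the free parameter; it weights the collection model
    against the document's own term frequencies.
    """
    return math.log((tf + mu * coll_prob) / (doc_len + mu))


if __name__ == "__main__":
    # Same term frequency in a short and a long document (made-up numbers).
    for doc_len in (100, 1000):
        bm25 = bm25_term_score(tf=5, doc_len=doc_len, avg_doc_len=300,
                               n_docs=10_000, doc_freq=50)
        lm = dirichlet_lm_term_score(tf=5, doc_len=doc_len, coll_prob=0.001)
        print(doc_len, bm25, lm)
```

Setting b = 0 in this sketch removes length normalization from BM25 altogether, and setting mu = 0 reduces the language model to the unsmoothed maximum-likelihood estimate, which shows how much of each model's behaviour depends on values that have to be chosen or tuned.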

Giambattista Amati graduated "cum laude" in Mathematics from the University of Rome "La Sapienza" in 1983 and obtained a PhD in Computing Science from the University of Glasgow in 2003. In 2002 he designed the Divergence From Randomness (DFR) model for Information Retrieval, founding the open-source search engine Terrier. Terrier enables the rapid development of effective systems and scalable applications of Information Retrieval.
His current interests include sentiment analysis; web, enterprise, microblog, blog, and vertical search; and information extraction.
Among his most recent activities, he has served as General Chair of the BCS International Conference on Theory of Information Retrieval (ICTIR 2011), Poster Chair of the ACM Conference on Information and Knowledge Management (CIKM 2011), Workshop Chair of the ACM International Conference on Information Retrieval (SIGIR 2010), General Chair of the European Conference of Information Retrieval (ECIR 2007), and Poster Chair of the ACM International Conference on Information Retrieval (SIGIR 2007).
He was Editor for Information Retrieval Models in the Encyclopedia of Database Systems, published by Springer. Since 2006 he has been an Adjunct Professor for the "Information Retrieval" course at the University of Rome "Tor Vergata".

HOST: Prof. Fabio Crestani