Effective Diameter Estimation for Very Large Graphs

Dean's Office - Faculty of Informatics

Date: 22 June 2018 / 11:30 - 12:00

USI Lugano Campus, room SI-013, Informatics building (Via G. Buffi 13)

Speaker: Gianni Amati
  Fondazione Ugo Bordoni, Italy
Date: Friday, June 22, 2018
Place: USI Lugano Campus, room SI-013, Informatics building (Via G. Buffi 13)
Time: 14:30-15:30

 

Abstract:

We show how to efficiently estimate the effective diameter and other distance metrics on very large graphs. We exploit the MinHashing approach, first introduced in Information Retrieval for near-duplicate detection, which derives compressed representations of large, sparse datasets that preserve similarity. We use these representations to efficiently obtain a good approximation of the size of the neighborhood of a node. We also compare the MinHashing approach with state-of-the-art methods, paying particular attention to scalability. To this end, we introduce a simple distributed version of the MinHashing approach based on the Apache Spark framework.
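
To make the idea in the abstract concrete, the following is a minimal single-machine sketch, not the speaker's implementation. It assumes an undirected graph given as an adjacency dictionary in which every node appears as a key, uses k random values per node in place of real hash functions, and estimates the size of a set from the average of its k minima (a standard bottom-value estimate chosen here for illustration). The function names minhash_neighborhood_function and effective_diameter, and the 0.9 quantile default, are assumptions of this sketch. The per-step merge is an elementwise minimum over edges, which is the kind of aggregation a framework such as Spark can distribute; that part is not shown.

```python
import random


def minhash_neighborhood_function(adj, k=128, seed=0, max_steps=None):
    """Estimate N(t), the number of ordered pairs (u, v) with dist(u, v) <= t.

    adj: dict mapping each node to an iterable of its neighbors
         (assumed undirected, every node present as a key).
    Returns a list nf where nf[t] is the estimate of N(t).
    """
    rng = random.Random(seed)
    # k random values in [0, 1) per node play the role of k hash functions.
    sig = {u: [rng.random() for _ in range(k)] for u in adj}

    def estimate_size(signature):
        # The minimum of n uniform values has mean 1 / (n + 1),
        # so n is roughly 1 / mean(minima) - 1 = k / sum(minima) - 1.
        return k / sum(signature) - 1.0

    nf = [sum(estimate_size(s) for s in sig.values())]
    steps = 0
    while max_steps is None or steps < max_steps:
        new_sig = {}
        changed = False
        for u, neighbors in adj.items():
            # Signature of the ball of radius t+1 around u: elementwise
            # minimum over the radius-t signatures of u and its neighbors.
            merged = list(sig[u])
            for v in neighbors:
                merged = [min(a, b) for a, b in zip(merged, sig[v])]
            if merged != sig[u]:
                changed = True
            new_sig[u] = merged
        if not changed:
            break  # every ball has stopped growing
        sig = new_sig
        nf.append(sum(estimate_size(s) for s in sig.values()))
        steps += 1
    return nf


def effective_diameter(nf, quantile=0.9):
    """Smallest t such that nf[t] reaches the given fraction of nf[-1]."""
    threshold = quantile * nf[-1]
    for t, value in enumerate(nf):
        if value >= threshold:
            return t
    return len(nf) - 1


def path_graph(n):
    """Adjacency dict of a path 0 - 1 - ... - (n - 1), for demonstration."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    nf = minhash_neighborhood_function(path_graph(20), k=256)
    print("estimated effective diameter:", effective_diameter(nf))
```

Each node stores only k numbers regardless of how large its neighborhood grows, which is what keeps memory bounded on large graphs; a larger k reduces the variance of the estimate at a proportional cost in space and time.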

 

Biography:

Gianni Amati leads the Laboratory on Big Data at the Fondazione Ugo Bordoni in Rome, Italy. He is also Adjunct Professor of "Information Retrieval" for the advanced course on Information Science at the University of Roma Due, Tor Vergata. Gianni was the initial developer of Terrier, a high-performance, scalable open-source search engine, and is involved in research and industry projects on search, sentiment analysis, massive clustering, and visualization of social networks. He has been involved in the organization and management of the main annual conferences (e.g., SIGIR, CIKM, ECIR, ICTIR).

 

Host: Prof. Fabio Crestani