"Big" Data Preparation for the Data Science Journey

Dean's Office - Faculty of Informatics

Date: 21 September 2017 / 10:30 - 11:30

USI Lugano Campus, room SI-004, Informatics building (Via G. Buffi 13)

Speaker: Giuseppe Polese
  University of Salerno, Italy
Date: Thursday, September 21, 2017

 

Abstract:

With the advent of social networks and other modern applications, “Big Data” and its many related challenges are becoming a main concern for organizations and solution providers, given the tremendous growth in the Volume of data (an estimated 4300% from 2009 to 2020), together with its Variety and the Velocity at which it is generated. In this scenario, it is vital to devise suitable technologies to effectively pre-process data from multiple sources, so that data scientists and business analysts can derive valuable insights from them and create value for their organization (enterprise, public organization, or other). This calls for technologies capable of turning IT-oriented data processing operations (Data Integration) into user-oriented ones (Data Preparation).

I will introduce the main "Big Data" issues in data-driven enterprises and organizations, and will discuss the evolution from Data Integration to Data Preparation technologies. This will let me introduce some research contributions that simplify several data pre-processing activities, in an attempt to turn them into user-oriented ones. I will first describe a conceptual-level data integration methodology and tool, and will then discuss the potential contribution of Relaxed Functional Dependencies, and of the algorithms that discover them from data, to relevant pre-processing activities such as data cleansing and query refactoring upon source schema evolution.

 

Biography:

Giuseppe Polese is Professor of Computer Science at the University of Salerno, Italy, and Director of the Data Science and Technologies Laboratory. His research interests concern the areas of Data Science, Multimedia Databases, and Web Engineering, with several interdisciplinary contributions to medical informatics and construction engineering. He has published about 100 papers, some of them in top scientific journals. He is an ACM and IEEE member, and a member of the editorial boards of several international scientific journals, such as Information Systems (Elsevier), Data and Information Quality (ACM), International Journal of Software and Knowledge Engineering (World Scientific), area editor for Database and Decision Support Systems, and the Journal of Data Science and Engineering (Springer). He received several teaching and research contracts from the Computer Science Department at the University of Pittsburgh (USA) during a three-year visiting period, and has directed several publicly funded research projects as well as projects with industry grants. Previously, he was a project manager and knowledge engineer at the Italian aerospace company Alenia, and a consultant for several software firms, including Siemens, Olivetti, Ericsson, and Telecom Italia.

 

Host: Prof. Rolf Krause