Code Review Automation: Strengths and Weaknesses of the State of the Art

Staff - Faculty of Informatics

Date: 28 September 2023 / 16:30 - 17:30

USI East Campus, Room D1.15

Speaker: Rosalia Tufano

Abstract: The automation of code review has been tackled by several researchers with the goal of reducing its cost. The adoption of deep learning in software engineering has pushed this automation to new boundaries, with techniques imitating developers in generative tasks, such as commenting on a code change as a reviewer would or addressing a reviewer’s comment by modifying the code. The performance of these techniques is usually assessed through quantitative metrics, e.g., the percentage of instances in the test set for which correct predictions are generated, leaving many open questions about the techniques’ capabilities. For example, knowing that an approach is able to correctly address a reviewer’s comment in 10% of cases is of little value without knowing what the reviewer asked for: what if, in all successful cases, the code change required to address the comment was just the removal of an empty line? In this study we aim to characterize the cases in which three code review automation techniques tend to succeed or fail in the two tasks described above. The study has a strong qualitative focus, with ∼105 man-hours invested in manually inspecting correct and wrong predictions generated by the three techniques, for a total of 2,296 inspected predictions. The output of this analysis is a pair of taxonomies reporting, for each of the two tasks, the types of code changes on which the studied techniques tend to succeed or fail, pointing to areas for future work.

Biography: Rosalia Tufano is a Ph.D. student in the Faculty of Informatics at the Università della Svizzera italiana (USI), Switzerland, and part of the Software Analytics Research Team (SEART). She received her MSc in Applied Mathematics from Università degli Studi di Napoli Federico II, Italy, in March 2019. Her research interests mainly include the study and application of machine learning techniques to support code-related tasks. More information available at: