Learning Structured Neural Representations for Visual Reasoning Tasks

Dean's Office - Faculty of Informatics

You are cordially invited to attend the PhD Dissertation Defense of Simon van Steenkiste on Wednesday, November 4th, 2020 at 17:00.
Please note that, given the updated Covid-19 restrictions, the Dissertation Defense will be held online.

You can join here

Deep neural networks learn representations of data to facilitate problem-solving in their respective domains. However, they struggle to acquire a structured representation based on more symbolic entities, which are commonly understood as core abstractions central to the human capacity for generalization. This dissertation studies this issue for visual reasoning tasks. Inspired by how humans solve these tasks, we propose to learn structured neural representations that distinguish objects: abstract visual building blocks that can separately be composed and reasoned with. We investigate the limitations of current deep neural networks at effectively discovering, representing, and relating these more symbolic entities, and present several improvements.

To address the problem of discovering and representing objects, we propose two novel approaches. In one case, we formalize this problem as a pixel-level clustering problem and formulate a neural differentiable clustering algorithm that solves it. We demonstrate how, unlike standard representation learning techniques, it can be trained to learn about objects in an unsupervised manner and acquire corresponding representations that can be treated as symbols for reasoning. In the other case, we adopt a purely generative approach and demonstrate how a neural network equipped with the right inductive bias can learn about objects in the process of synthesizing images, even in complex visual settings.

Concerning the problem of relating symbolic entities with neural networks, we investigate how object representations can help facilitate building structured models for common-sense physical reasoning that generalize more systematically. We extend our previous representation learning approach to facilitate model building in this way and demonstrate how it can learn about general relations between objects to reason about their (future) physical interactions.

Finally, we investigate the utility of a representational format that isolates independent sources of information for encoding the features of individual objects. We conduct a large-scale study of such 'disentangled' representations that includes various methods and metrics on two new abstract visual reasoning tasks. Our results indicate that better disentanglement enables quicker learning using fewer samples.

Dissertation Committee:
- Prof. Jürgen Schmidhuber, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Cesare Alippi, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Natasha Sharygina, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Leslie Kaelbling, MIT, USA (External Member)
- Prof. Michael Mozer, Google Brain & University of Colorado, Boulder, USA (External Member)
- Prof. Bernhard Schölkopf, MPI, Germany (External Member)