
Hosted by Remco Duits

Speaker
Ivan Sosnovik, Amazon Generative AI Innovation Center

Title
Scale Symmetry in Computer Vision and Beyond

Abstract

When training a model for image analysis, object transformations are usually treated as nuisance factors: the more variation we expect, the larger the training dataset we must provide. Translation, rotation, and scaling are the three main transformations that naturally arise from a changing camera position while the object itself stays unchanged. Convolutional Neural Networks are translation-equivariant: a translation of the input image leads to a proportional translation of the intermediate representation. Equipped with the notion of translation by design, they can use their parameters more efficiently to learn useful features. As a result, they have dominated the field of computer vision.
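
As a minimal numerical illustration of this property (a hypothetical sketch, not material from the talk; the layer configuration and shift amount are arbitrary choices), one can check in PyTorch that shifting the input of a convolutional layer shifts its output by the same amount. Circular padding is used so the equality holds exactly rather than only up to boundary effects:

    import torch
    import torch.nn as nn

    # One convolutional layer; circular padding makes translation
    # equivariance exact under torch.roll (no boundary effects).
    conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3,
                     padding=1, padding_mode='circular', bias=False)

    x = torch.randn(1, 1, 32, 32)                 # random input image
    x_shifted = torch.roll(x, shifts=5, dims=-1)  # translate 5 pixels in width

    # Equivariance: conv(shift(x)) == shift(conv(x)).
    shift_then_conv = conv(x_shifted)
    conv_then_shift = torch.roll(conv(x), shifts=5, dims=-1)
    print(torch.allclose(shift_then_conv, conv_then_shift, atol=1e-5))  # True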

Group Equivariant CNNs extend this property to other groups. In our research, we focus on the scale transformation, which attracted significant interest in the pre-deep-learning era but has remained relatively untouched in the realm of neural networks. We develop the theory of scale-equivariant CNNs, demonstrate how to implement them efficiently in modern deep-learning frameworks, and show how they pay off in terms of model performance. Finally, we discuss how the challenges in scale-equivariant models connect us to other fields, such as language modeling and drug discovery.
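
The exact construction is developed in the talk; as a rough, hypothetical sketch of the underlying idea (the class name, dilation set, and initialization below are illustrative choices, not the speaker's actual method), one can share a single filter across several dilations and stack the responses along a new scale axis:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NaiveScaleConv(nn.Module):
        # Apply one shared filter at several dilations and stack the
        # responses along a new scale axis -- a simplified stand-in for
        # a scale-equivariant convolution.
        def __init__(self, in_ch, out_ch, kernel_size=3, dilations=(1, 2, 4)):
            super().__init__()
            self.weight = nn.Parameter(
                0.1 * torch.randn(out_ch, in_ch, kernel_size, kernel_size))
            self.dilations = dilations

        def forward(self, x):                           # x: (B, C_in, H, W)
            outs = []
            for d in self.dilations:
                pad = d * (self.weight.shape[-1] // 2)  # keep spatial size
                outs.append(F.conv2d(x, self.weight, padding=pad, dilation=d))
            return torch.stack(outs, dim=2)             # (B, C_out, scales, H, W)

    x = torch.randn(1, 3, 64, 64)
    y = NaiveScaleConv(3, 8)(x)
    print(y.shape)  # torch.Size([1, 8, 3, 64, 64])

Under this construction, rescaling the input approximately shifts activations along the new scale axis, which is the analogue of translation equivariance for the scale group.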
