Posts by Collection

portfolio

publications

Benchmarking robustness in object detection: Autonomous driving when winter is coming.

Published in The Machine Learning for Autonomous Driving Workshop, NeurIPS, 2019

The ability to detect objects regardless of image distortions or weather conditions is crucial for real-world applications of deep learning like autonomous driving. Here we provide an easy-to-use benchmark to assess how object detection models perform when image quality degrades. The three resulting benchmark datasets, termed Pascal-C, Coco-C and Cityscapes-C, contain a large variety of image corruptions. We show that a range of standard object detection models suffer a severe performance loss on corrupted images (down to 30–60% of the original performance). However, a simple data augmentation trick, stylizing the training images, leads to a substantial increase in robustness across corruption types, severities and datasets. We envision our comprehensive benchmark being used to track future progress towards building robust object detection models. Benchmark, code and data are publicly available.
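The "30–60% of the original performance" figure is a ratio of corrupted to clean performance. The sketch below shows one way to compute it. It averages a detection metric over severities, then over corruption types, and divides the result by the clean score. The names mPC (mean performance under corruption) and rPC (relative performance under corruption) reflect our understanding of the benchmark's terminology and should be checked against the released code. All scores in the example are hypothetical.

```python
from statistics import mean


def mean_performance_under_corruption(scores):
    """mPC: average a metric over severities, then over corruption types.

    `scores` maps each corruption name to its list of per-severity scores.
    """
    return mean(mean(per_severity) for per_severity in scores.values())


def relative_performance_under_corruption(scores, clean_score):
    """rPC: fraction of the clean-data score that survives corruption."""
    return mean_performance_under_corruption(scores) / clean_score


# Hypothetical scores for illustration only; these are not results from the paper.
clean_ap = 40.0
corrupted_ap = {
    "gaussian_noise": [30.0, 24.0, 18.0, 12.0, 6.0],
    "fog": [36.0, 34.0, 32.0, 30.0, 28.0],
}
print(mean_performance_under_corruption(corrupted_ap))  # 25.0
print(relative_performance_under_corruption(corrupted_ap, clean_ap))  # 0.625
```

Averaging per corruption first means that every corruption type counts equally, even if types were evaluated at different numbers of severities.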

A simple way to make neural networks robust against diverse image corruptions

Published in ECCV 2020: 16th European Conference, Glasgow, UK, Proceedings, Part III, 2020

The human visual system is remarkably robust against a wide range of naturally occurring variations and corruptions like rain or snow. In contrast, the performance of modern image recognition models strongly degrades when evaluated on previously unseen corruptions. Here, we demonstrate that simple but properly tuned training with additive Gaussian and speckle noise generalizes surprisingly well to unseen corruptions, easily reaching the previous state of the art on the corruption benchmarks ImageNet-C (with ResNet-50) and MNIST-C. We build on these strong baseline results and show that adversarially training the recognition model against uncorrelated worst-case noise distributions further increases performance. This regularization can be combined with previously proposed defense methods for further improvement.
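The two noise types mentioned above can be sketched on a flat list of pixel intensities in [0, 1]. Additive Gaussian noise adds n ~ N(0, σ²) to each pixel. Speckle noise is multiplicative: it adds x·n. This is a minimal illustration only. The probability of applying noise (`p_noise`), the 50/50 choice between the two noise types and the σ value are hypothetical placeholders, not the tuned settings from the paper. The adversarial noise-distribution training is not shown.

```python
import random


def add_gaussian_noise(pixels, sigma, rng):
    """Additive Gaussian noise: x + n, n ~ N(0, sigma^2), clipped to [0, 1]."""
    return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in pixels]


def add_speckle_noise(pixels, sigma, rng):
    """Speckle (multiplicative) noise: x + x * n, n ~ N(0, sigma^2), clipped."""
    return [min(1.0, max(0.0, x + x * rng.gauss(0.0, sigma))) for x in pixels]


def augment(pixels, sigma, rng, p_noise=0.5):
    """Apply one noise type to a random subset of training images.

    p_noise and the even split between noise types are illustrative
    assumptions, not values taken from the paper.
    """
    if rng.random() >= p_noise:
        return list(pixels)  # keep some training images clean
    if rng.random() < 0.5:
        return add_gaussian_noise(pixels, sigma, rng)
    return add_speckle_noise(pixels, sigma, rng)


rng = random.Random(0)
image = [0.0, 0.25, 0.5, 0.75, 1.0]
noisy = augment(image, sigma=0.1, rng=rng)
```

Speckle noise leaves black pixels untouched because it scales with intensity. This is one way in which it differs from the additive Gaussian variant.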

Improving robustness against common corruptions by covariate shift adaptation

Published in Advances in Neural Information Processing Systems, 2020

Today’s state-of-the-art machine vision models are vulnerable to image corruptions like blurring or compression artefacts, limiting their performance in many real-world applications. Here we argue that popular benchmarks to measure model robustness against common corruptions (like ImageNet-C) underestimate model robustness in many (but not all) application scenarios. The key insight is that in many scenarios, multiple unlabeled examples of the corruptions are available and can be used for unsupervised online adaptation. Replacing the activation statistics estimated by batch normalization on the training set with the statistics of the corrupted images consistently improves the robustness across 25 different popular computer vision models. Using the corrected statistics, ResNet-50 reaches 62.2% mCE on ImageNet-C compared to 76.7% without adaptation. With the more robust DeepAugment+AugMix model, we improve the previous state of the art for a ResNet-50 model from 53.6% mCE to 45.4% mCE. Even adapting to a single sample improves robustness for the ResNet-50 and AugMix models, and 32 samples are sufficient to improve the current state of the art for a ResNet-50 architecture. We argue that results with adapted statistics should be included whenever reporting scores in corruption benchmarks and other out-of-distribution generalization settings.
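The core idea, replacing training-set batch-norm statistics with statistics of unlabeled corrupted data, can be sketched for a single feature channel. In a real network this is done per channel, over the batch and spatial dimensions. This is a minimal sketch, not the authors' code. The `prior` argument is a hypothetical knob that blends source and test statistics, as if the source estimate had come from `prior` extra samples. Setting `prior=0` gives full replacement.

```python
from statistics import fmean, pvariance


def batch_norm(values, mean, var, eps=1e-5):
    """Normalize one feature channel with the given statistics (no affine params)."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def adapt_statistics(source_mean, source_var, test_batch, prior=0.0):
    """Re-estimate BN statistics on an unlabeled (e.g. corrupted) test batch.

    prior=0 replaces the training-set statistics entirely; prior=N > 0 is a
    hypothetical blend that weights the source statistics like N samples.
    """
    n = len(test_batch)
    w = prior / (prior + n)
    mean = w * source_mean + (1 - w) * fmean(test_batch)
    var = w * source_var + (1 - w) * pvariance(test_batch)
    return mean, var


# Training statistics were estimated on clean data (mean 0, variance 1),
# but the corruption has shifted this channel's activations.
source_mean, source_var = 0.0, 1.0
batch = [2.0, 3.0, 4.0, 5.0]

unadapted = batch_norm(batch, source_mean, source_var)  # mean stays near 3.5
m, v = adapt_statistics(source_mean, source_var, batch)
adapted = batch_norm(batch, m, v)  # mean is centred at 0 again
```

With unadapted statistics, the downstream layers see activations that are far from the distribution they were trained on. After adaptation, the channel is re-centred and re-scaled without any labels.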

If your data distribution shifts, use self-learning

Published in TMLR, 2022

In this paper, we demonstrate that self-learning techniques like entropy minimization or pseudo-labeling are simple yet effective techniques for increasing test performance under domain shifts. Our results show that self-learning consistently increases performance under distribution shifts, irrespective of the model architecture, the pre-training technique or the type of distribution shift. At the same time, self-learning is simple to use in practice because it does not require knowledge of or access to the original training data or scheme, is robust to hyperparameter choices, is straightforward to implement and requires only a few training epochs. This makes self-learning techniques highly attractive for any practitioner who applies machine learning algorithms in the real world. We present state-of-the-art adaptation results on CIFAR10-C (8.5% error), ImageNet-C (22.0% mCE), ImageNet-R (17.4% error) and ImageNet-A (14.8% error), theoretically study the dynamics of self-supervised adaptation methods and propose a new classification dataset (ImageNet-D) which is challenging even with adaptation.
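Entropy minimization, one of the self-learning objectives named above, pushes the model towards confident predictions on unlabeled test data. The sketch below takes one gradient step on the mean prediction entropy. It uses the closed-form gradient dH/dz_j = -p_j (log p_j + H) with respect to the logits z. As a hypothetical stand-in for real model parameters, only a per-class bias vector shared by all test samples is adapted. The learning rate and logits are illustrative. Pseudo-labeling would instead replace the entropy with the cross-entropy to each sample's argmax class.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits):
    """Shannon entropy (in nats) of the softmax prediction."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


def entropy_grad(logits):
    """Gradient of entropy(softmax(z)) with respect to z: -p_j * (log p_j + H)."""
    probs = softmax(logits)
    h = entropy(logits)
    return [-p * (math.log(p) + h) if p > 0 else 0.0 for p in probs]


def mean_entropy(batch_logits, bias):
    return sum(entropy([z + b for z, b in zip(logits, bias)])
               for logits in batch_logits) / len(batch_logits)


def self_learning_step(batch_logits, bias, lr=0.1):
    """One entropy-minimization step on a hypothetical shared bias vector."""
    grad = [0.0] * len(bias)
    for logits in batch_logits:
        shifted = [z + b for z, b in zip(logits, bias)]
        for j, g in enumerate(entropy_grad(shifted)):
            grad[j] += g / len(batch_logits)
    return [b - lr * g for b, g in zip(bias, grad)]


# Unlabeled test-time logits from some pre-trained classifier (made up).
test_logits = [[2.0, 1.0, 0.0], [1.5, 1.2, 0.3], [0.2, 2.2, 0.1]]
bias = [0.0, 0.0, 0.0]
before = mean_entropy(test_logits, bias)
bias = self_learning_step(test_logits, bias)
after = mean_entropy(test_logits, bias)
```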

Content suppresses style: dimensionality collapse in contrastive learning

Published in NeurIPS 2022 Workshop: Self-Supervised Learning - Theory and Practice, 2022

Contrastive learning is a highly successful yet simple self-supervised learning technique that minimizes the representational distance between similar (positive) samples while maximizing it between dissimilar (negative) ones. Despite its success, our theoretical understanding of contrastive learning is still incomplete. Most importantly, it is unclear why the inferred representation undergoes a dimensionality collapse after SimCLR training and why downstream performance improves when the feature encoder’s last layers (the projector) are removed. We show that collapse might be induced by an inductive bias of the InfoNCE loss that favors features that vary little within a positive pair (content) while suppressing more strongly varying features (style). When at least one content variable is present, we prove that a low-rank projector reduces downstream task performance while simultaneously minimizing the InfoNCE objective. This result suggests a potential reason why removing the projector could lead to better downstream performance. Subsequently, we propose a simple strategy that uses adaptive temperature factors in the loss to equalize content and style latents, mitigating dimensionality collapse. Finally, we validate our theoretical findings on controlled synthetic data and natural images.
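For reference, here is a minimal one-directional InfoNCE loss with a temperature τ. For each anchor `view1[i]`, the matching `view2[i]` is the positive and every other `view2[j]` is a negative. This is a simplified form, not SimCLR's full NT-Xent, which also uses negatives from the same view. It is only meant to show where the temperature enters the loss. The adaptive temperature factors proposed in the paper are not reproduced here.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def info_nce(view1, view2, temperature=0.1):
    """Mean over anchors of -log softmax(similarities / temperature)[positive]."""
    total = 0.0
    for i, anchor in enumerate(view1):
        logits = [cosine(anchor, other) / temperature for other in view2]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(view1)


z = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = info_nce(z, z)                      # positives match: near-zero loss
shuffled = info_nce(z, [z[1], z[2], z[0]])    # positives mismatched: large loss
```

A smaller τ sharpens the softmax over similarities. This is why reweighting the temperature can change which directions of variation the loss rewards.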

Robust deep learning object recognition models rely on low frequency information in natural images

Published in PLOS Computational Biology, 2023

Machine learning models have difficulty generalizing to data outside of the distribution they were trained on. In particular, vision models are usually vulnerable to adversarial attacks or common corruptions, to which the human visual system is robust. Recent studies have found that regularizing machine learning models to favor brain-like representations can improve model robustness, but it is unclear why. We hypothesize that the increased model robustness is partly due to the low spatial frequency preference inherited from the neural representation. We tested this simple hypothesis with several frequency-oriented analyses, including the design and use of hybrid images to probe model frequency sensitivity directly. We also examined many other publicly available robust models that were trained on adversarial images or with data augmentation, and found that all of these robust models showed a greater preference for low spatial frequency information. We show that preprocessing by blurring can serve as a defense mechanism against both adversarial attacks and common corruptions, further confirming our hypothesis and demonstrating the utility of low spatial frequency information in robust object recognition.
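Blurring as a preprocessing step is a low-pass filter: it attenuates high spatial frequencies before the image reaches the model. The sketch below implements a separable Gaussian blur on a grayscale image stored as a list of rows. It uses border clamping and a kernel radius of about 3σ, both of which are our own illustrative choices. A checkerboard, which consists almost entirely of high-frequency content, is nearly flattened by the blur. σ and the image size are arbitrary.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering roughly +/- 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with `kernel`, clamping indices at the borders."""
    r = len(kernel) // 2
    width = len(image[0])
    return [
        [
            sum(kernel[k + r] * row[min(width - 1, max(0, x + k))] for k in range(-r, r + 1))
            for x in range(width)
        ]
        for row in image
    ]


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian low-pass filter: blur the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def pixel_variance(image):
    values = [v for row in image for v in row]
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


# A 6x6 checkerboard is almost pure high-frequency content.
checker = [[float((x + y) % 2) for x in range(6)] for y in range(6)]
blurred = gaussian_blur(checker, sigma=1.0)
print(pixel_variance(checker), pixel_variance(blurred))
```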

Removing High Frequency Information Improves DNN Behavioral Alignment

Published in ICLR 2024 Workshop Re-Align, 2024

Despite the increasingly impressive performance and capabilities of Deep Neural Networks (DNNs), a significant misalignment between DNNs and human behavior still exists. A large body of research identifies these misalignments and explores where they arise from, with some work attributing them to the fact that humans and DNNs use the frequency spectrum of images differently. In this paper, we show that removing high-frequency information by applying blur and resize transformations to images before they are fed to a DNN dramatically improves its alignment with humans, as measured by shape bias and error consistency. Specifically, a ViT-H-14 OpenCLIP model tested on blurred images achieves an error consistency with humans of 0.37, halving the current gap between DNN-human and human-human error consistency. While these operations do affect a model’s accuracy, we present preliminary evidence for an alignment-accuracy tradeoff and note that, moving forward, practitioners may have to choose between a model with superhuman accuracy and one that behaves like a human.
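Error consistency, as we understand the metric, is Cohen's kappa computed on trial-by-trial correctness. It measures how often two observers (for example, a human and a DNN) are right or wrong on the same images, beyond what their accuracies alone would predict by chance. The sketch below follows that definition. The observer data are made up. A value of 0 means no more overlap in errors than chance, and 1 means identical errors.

```python
def error_consistency(correct_a, correct_b):
    """Cohen's kappa on the per-trial correctness of two observers."""
    n = len(correct_a)
    # Observed agreement: both right or both wrong on the same trial.
    observed = sum(a == b for a, b in zip(correct_a, correct_b)) / n
    p_a = sum(correct_a) / n  # accuracy of observer A
    p_b = sum(correct_b) / n  # accuracy of observer B
    # Agreement expected from two independent observers with these accuracies.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when both observers are always right or always wrong")
    return (observed - expected) / (1 - expected)


human = [True, True, False, False, True, False]
model = [True, False, False, True, True, False]
print(error_consistency(human, model))
```

Dividing by 1 - expected normalizes for accuracy. Two highly accurate observers agree on most trials purely by chance, so raw agreement alone would overstate their similarity.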

Effective pruning of web-scale datasets based on complexity of concept clusters

Published in ICLR, 2024

Utilizing massive web-scale datasets has led to unprecedented performance gains in machine learning models, but it also imposes outlandish compute requirements for their training. In order to improve training and data efficiency, we here push the limits of pruning large-scale multimodal datasets for training CLIP-style models. Today’s most effective pruning method on ImageNet clusters data samples into separate concepts according to their embedding and prunes away the most prototypical samples. We scale this approach to LAION and improve it by noting that the pruning rate should be concept-specific and adapted to the complexity of the concept. Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training. By filtering the LAION dataset, we find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs. More specifically, we are able to outperform the LAION-trained OpenCLIP-ViT-B/32 model on ImageNet zero-shot accuracy by 1.1 percentage points while only using 27.7% of the data and training compute. Despite a strong reduction in training cost, we also see improvements on ImageNet distribution shifts, retrieval tasks and VTAB. On the DataComp Medium benchmark, we achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks.
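The sketch below illustrates concept-specific pruning, assuming the samples have already been clustered into concepts (for example with k-means on their embeddings). The abstract does not spell out the complexity measure, so this sketch uses a hypothetical stand-in: the mean distance of a concept's members to its centroid. The keep budget is split across concepts in proportion to that complexity, and within each concept the most prototypical samples (those closest to the centroid) are pruned first. Both choices are assumptions for illustration, not the paper's exact method.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def prune_by_concept_complexity(embeddings, cluster_ids, keep_fraction):
    """Return the sorted indices of samples to keep.

    Hypothetical choices: complexity = mean distance to the concept centroid;
    the keep budget is split in proportion to complexity; the most
    prototypical samples (closest to the centroid) are pruned first.
    """
    clusters = {}
    for idx, cid in enumerate(cluster_ids):
        clusters.setdefault(cid, []).append(idx)
    centers = {c: centroid([embeddings[i] for i in idxs]) for c, idxs in clusters.items()}
    complexity = {
        c: sum(distance(embeddings[i], centers[c]) for i in idxs) / len(idxs)
        for c, idxs in clusters.items()
    }
    budget = round(keep_fraction * len(embeddings))
    total = sum(complexity.values()) or 1.0  # avoid dividing by zero if every concept is a single point
    keep = []
    for c, idxs in clusters.items():
        center = centers[c]
        quota = min(len(idxs), round(budget * complexity[c] / total))
        # Rank from least to most prototypical, i.e. farthest from the centroid first.
        ranked = sorted(idxs, key=lambda i: distance(embeddings[i], center), reverse=True)
        keep.extend(ranked[:quota])
    return sorted(keep)


embeddings = [
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2],              # a tight, simple concept
    [5.0, 5.0], [9.0, 5.0], [5.0, 9.0], [9.0, 9.0], [7.0, 7.0],  # a diverse concept
]
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1, 1]
kept = prune_by_concept_complexity(embeddings, cluster_ids, keep_fraction=0.45)
print(kept)  # [4, 5, 6, 7]
```

In this toy case, the whole tight concept and the diverse concept's centroid-like sample (index 8) are pruned. Per-concept rounding means the total kept is only approximately equal to the budget.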

Does CLIP’s generalization performance mainly stem from high train-test similarity?

Published in ICLR, 2024

Foundation models like CLIP are trained on hundreds of millions of samples and effortlessly generalize to new tasks and inputs. Out of the box, CLIP shows stellar zero-shot and few-shot capabilities on a wide range of out-of-distribution (OOD) benchmarks, which prior works attribute mainly to today’s large and comprehensive training datasets (like LAION). However, it is questionable how meaningful terms like out-of-distribution generalization are for CLIP, as it seems likely that web-scale datasets like LAION simply contain many samples that are similar to common OOD benchmarks originally designed for ImageNet. To test this hypothesis, we retrain CLIP on pruned LAION splits that replicate ImageNet’s train-test similarity with respect to common OOD benchmarks. While we observe a performance drop on some benchmarks, surprisingly, CLIP’s overall performance remains high. This shows that high train-test similarity is insufficient to explain CLIP’s OOD performance, and other properties of the training data must drive CLIP to learn more generalizable representations. Additionally, by pruning data points that are dissimilar to the OOD benchmarks, we uncover a 100M split of LAION (a quarter of its original size) on which CLIP can be trained to match its original OOD performance.
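The quantity being controlled is train-test similarity: for each training sample, how close it is to its nearest neighbour in a test benchmark. The sketch below computes that similarity with cosine similarity on embeddings. It then drops training samples above a fixed threshold. This is a simplified stand-in. The fixed `threshold` is a hypothetical knob, whereas the abstract describes matching ImageNet's train-test similarity profile rather than applying a single cutoff. The embeddings are made up.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_test_similarity(train_embeddings, test_embeddings):
    """For each training sample, cosine similarity to its nearest test sample."""
    return [max(cosine(t, q) for q in test_embeddings) for t in train_embeddings]


def prune_near_test_samples(train_embeddings, test_embeddings, threshold):
    """Keep indices of training samples whose nearest-test similarity is at most
    `threshold` (a hypothetical fixed cutoff used only for illustration)."""
    sims = nearest_test_similarity(train_embeddings, test_embeddings)
    return [i for i, s in enumerate(sims) if s <= threshold]


train = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [-1.0, 0.2]]
test = [[1.0, 0.0]]
print(prune_near_test_samples(train, test, threshold=0.95))  # [2, 3]
```

The nearest-neighbour search here is brute force. At LAION scale, an approximate nearest-neighbour index would be needed instead.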

publications_physics

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.

workshops

1st Workshop on Test-Time Adaptation: Model, Adapt Thyself! (MAT), CVPR (2024)

Published:

In the MAT Workshop, we aim to unite researchers on adaptation and robustness to push the boundaries between training and testing. Our focus is on updating during deployment to maintain or improve accuracy, calibration, and fairness on changing data in diverse settings. Our scope encompasses data, evaluation, algorithms, and unresolved challenges for test-time updates while emphasizing unsupervised adaptation with minimal computational overhead. Special attention will be given to inventive approaches for adapting foundation models to new data, tasks, and deployments.