Dataset distillation is the task of synthesizing a small dataset such that a model trained on the synthetic set will
match the test accuracy of the model trained on the full dataset.
In this paper, we propose a new formulation that optimizes our distilled data to guide networks to a similar state
as those trained on real data across many training steps.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with
respect to the distance between the synthetically trained parameters and the parameters trained on real data. To
efficiently obtain the initial and target network parameters for large-scale datasets, we pre-compute and store
training trajectories of expert networks trained on the real dataset.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
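Below is a minimal sketch of the trajectory-matching objective described above, assuming PyTorch 2.x (`torch.func`) and hypothetical names: `student_net` is a small network, `syn_images` / `syn_labels` are the learnable distilled data, and `theta_start` / `theta_target` are two checkpoints loaded from a pre-computed expert trajectory (an earlier and a later point along training on the real dataset). The inner-loop length, learning rate, and loss normalization here are illustrative, not the paper's verbatim implementation.

```python
# Hedged sketch: unroll a few training steps on the distilled data starting
# from an expert checkpoint, then measure how close the resulting parameters
# are to a later checkpoint from the same expert trajectory.
import torch
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0


def trajectory_matching_loss(student_net, syn_images, syn_labels,
                             theta_start, theta_target,
                             num_student_steps=20, syn_lr=0.01):
    # Start the student from the expert's earlier checkpoint.
    # (Buffers, e.g. for normalization layers, are omitted for simplicity.)
    params = {k: v.clone().detach().requires_grad_(True)
              for k, v in theta_start.items()}

    # Differentiable inner loop: gradient steps on the synthetic data only.
    for _ in range(num_student_steps):
        logits = functional_call(student_net, params, (syn_images,))
        inner_loss = F.cross_entropy(logits, syn_labels)
        grads = torch.autograd.grad(inner_loss, list(params.values()),
                                    create_graph=True)
        params = {k: p - syn_lr * g
                  for (k, p), g in zip(params.items(), grads)}

    # Flatten parameters (in a fixed key order) to compare checkpoints.
    flat = lambda d: torch.cat([d[k].reshape(-1) for k in theta_start])
    student_final = flat(params)
    expert_start = flat(theta_start)
    expert_target = flat(theta_target)

    # Distance from the student's final parameters to the expert's later
    # checkpoint, normalized by how far the expert itself moved over the
    # same trajectory segment.
    num = (student_final - expert_target).pow(2).sum()
    den = (expert_start - expert_target).pow(2).sum() + 1e-12
    return num / den
```

In an outer loop, this loss would be backpropagated through the unrolled updates to `syn_images` (and, optionally, to a learnable synthetic step size) with a standard optimizer, sampling a fresh expert checkpoint pair each iteration.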
Distilled datasets, by resolution and images per class:

32x32 images: CIFAR-10, CIFAR-100, CIFAR-10 (ZCA), CIFAR-100 (ZCA) at 1, 10, and 50 images/class
64x64 images: Tiny ImageNet at 1, 10, and 50 images/class
128x128 images: ImageFruit, ImageSquawk, ImageWoof, ImageMeow, ImageNette, ImageBlub, ImageYellow at 1 and 10 images/class