Dataset Distillation by Matching Training Trajectories
George Cazenavette 1
Tongzhou Wang 2
Antonio Torralba 2
Alexei A. Efros 3
Jun-Yan Zhu 1
1 Carnegie Mellon University
2 Massachusetts Institute of Technology
3 UC Berkeley

CVPR 2022 (Oral)



Dataset distillation is the task of synthesizing a small dataset such that a model trained on the synthetic set will match the test accuracy of the model trained on the full dataset. In this paper, we propose a new formulation that optimizes our distilled data to guide networks to a similar state as those trained on real data across many training steps. Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data. To efficiently obtain the initial and target network parameters for large-scale datasets, we pre-compute and store training trajectories of expert networks trained on the real dataset. Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.


The task of "Dataset Distillation" is to synthesize a low-support set of artificial data such that a model trained on this synthetic data alone will have test performance similar to that of a model trained on the full real dataset.

Our method significantly outperforms all previous methods (including DC [2], DSA [3], and DM [6]) on CIFAR and allows us to extend to higher-resolution datasets as well.


Our method distills the synthetic dataset by directly optimizing the fake images to induce network training dynamics similar to those of the full, real dataset. We train "student" networks for many iterations on the synthetic data, measure the error in parameter space between the "student" and "expert" networks, and back-propagate through all the student network updates to optimize the synthetic pixels.
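
The loop described above can be illustrated on a toy problem. The following is a minimal sketch under several assumptions, not the released implementation: the "network" is a linear regressor, the names `sgd_steps`, `matching_loss`, and `numerical_grad` are hypothetical, the loss is normalized by the expert's own displacement as an illustrative choice, and central finite differences stand in for back-propagating through the unrolled student updates.

```python
import numpy as np

def sgd_steps(w, X, y, lr, n_steps):
    """Run n_steps of full-batch gradient descent on mean squared error."""
    for _ in range(n_steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def matching_loss(X_syn, y_syn, w_start, w_target, lr, n_steps):
    """Squared parameter distance between student and expert,
    divided by how far the expert itself moved (assumed normalization)."""
    w_student = sgd_steps(w_start, X_syn, y_syn, lr, n_steps)
    return np.sum((w_student - w_target) ** 2) / np.sum((w_start - w_target) ** 2)

def numerical_grad(f, X, eps=1e-5):
    """Central finite differences; a toy stand-in for backprop through the unroll."""
    g = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        g[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)

# Toy "real" dataset: 200 points from a noiseless linear model.
X_real = rng.normal(size=(200, 3))
y_real = X_real @ np.array([1.0, -2.0, 0.5])

# Pre-compute and store an expert trajectory (one checkpoint per "epoch").
expert = [np.zeros(3)]
for _ in range(5):
    expert.append(sgd_steps(expert[-1], X_real, y_real, lr=0.05, n_steps=10))

# The student starts at one expert checkpoint and targets a later one.
w_start, w_target = expert[0], expert[2]

# Synthetic set: two learnable inputs with fixed labels.
X_syn = rng.normal(size=(2, 3))
y_syn = np.array([1.0, -1.0])

def loss_fn(X):
    return matching_loss(X, y_syn, w_start, w_target, lr=0.05, n_steps=5)

initial_loss = loss_fn(X_syn)
step = 0.1
for _ in range(100):
    candidate = X_syn - step * numerical_grad(loss_fn, X_syn)
    if loss_fn(candidate) < loss_fn(X_syn):
        X_syn = candidate   # accept the update to the synthetic inputs
    else:
        step *= 0.5         # otherwise shrink the step size and retry
final_loss = loss_fn(X_syn)
```

Comparing `final_loss` with `initial_loss` shows how much closer the student, trained only on the two synthetic points, lands to the expert checkpoint. In a real setting the finite-difference gradient would be replaced by automatic differentiation through the student updates, as the text describes.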

Distillation Animation

First 1000 distillation iterations of CIFAR-100, 1 image per class.

Distilled ImageNet

For the first time, our method distills higher-resolution ImageNet classes.


Distilled Image Browsing

Explore all of our Distilled Images here

32x32 Images

CIFAR-10: 1, 10, 50 images/class
CIFAR-100: 1, 10, 50 images/class
CIFAR-10 ZCA: 1, 10, 50 images/class
CIFAR-100 ZCA: 1, 10, 50 images/class

64x64 Images

Tiny ImageNet: 1, 10, 50 images/class

128x128 Images

ImageFruit: 1, 10 images/class
ImageSquawk: 1, 10 images/class
ImageWoof: 1, 10 images/class
ImageMeow: 1, 10 images/class
ImageNette: 1, 10 images/class
ImageBlub: 1, 10 images/class
ImageYellow: 1, 10 images/class

Dataset Downloads

You can find torch tensors containing all of our synthetic datasets here.


G. Cazenavette, T. Wang, A. Torralba, A.A. Efros, J.Y. Zhu
Dataset Distillation by Matching Training Trajectories
CVPR, 2022.


Wearable ImageNet: Synthesizing Tileable Textures via Dataset Distillation

Follow-up work in the Workshop on Computer Vision for Fashion, Art, and Design at CVPR 2022


Rather than treating our synthetic data as individual images, we can encourage every random crop (with circular padding) on a larger canvas of pixels to induce a good training trajectory. This results in class-based textures that are continuous around their edges.
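
As an illustration of what a crop with circular padding means, here is a minimal NumPy sketch. The function names `circular_crop` and `random_circular_crop` and the height-width-channel canvas layout are assumptions made for this example and are not taken from the released code.

```python
import numpy as np

def circular_crop(canvas, top, left, size):
    """Return a size x size window starting at (top, left).
    Indices wrap around the canvas edges, so a window that runs off
    the bottom or right border continues from the top or left."""
    h, w = canvas.shape[:2]
    rows = (top + np.arange(size)) % h
    cols = (left + np.arange(size)) % w
    return canvas[np.ix_(rows, cols)]

def random_circular_crop(canvas, size, rng):
    """Sample a uniformly random top-left corner, then crop with wrap-around."""
    h, w = canvas.shape[:2]
    return circular_crop(canvas, rng.integers(0, h), rng.integers(0, w), size)

# Hypothetical 6x6 RGB canvas standing in for a learnable texture.
canvas = np.arange(6 * 6 * 3).reshape(6, 6, 3)
rng = np.random.default_rng(0)
crop = random_circular_crop(canvas, 4, rng)

# A wrapped crop equals a plain crop taken from the tiled canvas,
# which is why a texture trained this way tiles without visible seams.
tiled = np.tile(canvas, (2, 2, 1))
wrapped = circular_crop(canvas, 5, 5, 4)
same_as_tiled = np.array_equal(wrapped, tiled[5:9, 5:9])
```

Because every window, including those that straddle a border, is treated as a training image, the optimization has no reason to favor any particular edge location on the canvas.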

Given these tileable textures, we can apply them wherever seamless tiling is needed, such as clothing patterns.

Visualizations made using FAB3D


Distilled Texture Browsing

ImageSquawk: 1 Tile, 3x3 Tiles
ImageFruit: 1 Tile, 3x3 Tiles

Workshop Paper

G. Cazenavette, T. Wang, A. Torralba, A.A. Efros, J.Y. Zhu
Wearable ImageNet: Synthesizing Tileable Textures via Dataset Distillation
CVPR Workshop, 2022.


Related Work

  1. Tongzhou Wang et al. "Dataset Distillation", in arXiv preprint 2018
  2. Bo Zhao et al. "Dataset Condensation with Gradient Matching", in ICLR 2021
  3. Bo Zhao and Hakan Bilen. "Dataset Condensation with Differentiable Siamese Augmentation", in ICML 2021
  4. Timothy Nguyen et al. "Dataset Meta-Learning from Kernel Ridge-Regression", in ICLR 2021
  5. Timothy Nguyen et al. "Dataset Distillation with Infinitely Wide Convolutional Networks", in NeurIPS 2021
  6. Bo Zhao and Hakan Bilen. "Dataset Condensation with Distribution Matching", in arXiv preprint 2021
  7. Kai Wang et al. "CAFE: Learning to Condense Dataset by Aligning Features", in CVPR 2022


We would like to thank Alexander Li, Assaf Shocher, and Gokul Swamy for proofreading our paper. Kangle Deng, Ruihan Gao, Nupur Kumari, Muyang Li, Gaurav Parmar, Chonghyuk Song, Sheng-Yu Wang, and Bingliang Zhang, as well as Simon Lucey's Vision Group at the University of Adelaide, also provided valuable feedback. This work is supported, in part, by the NSF Graduate Research Fellowship under Grant No. DGE1745016 and a grant from SAP. This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.