GLaD:

Generalizing Dataset Distillation via Deep Generative Prior

CVPR 2023

Massachusetts Institute of Technology, UC Berkeley, Carnegie Mellon University

Dataset Distillation concerns the synthesis of small synthetic datasets that still lead to models with good test performance.

Generative Latent Distillation (GLaD) distills images into the latent space of a generative model rather than directly into pixels.
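The core idea can be sketched in toy form. Everything below is an illustrative stand-in, not the paper's actual setup: a frozen random linear map `W` plays the role of the deep generative prior, and matching the mean synthetic pixel to the mean real pixel plays the role of the full distillation objective. The distilled dataset is the set of latent codes `Z`, optimized by gradient descent through the generator.

```python
# Toy sketch of latent-space dataset distillation (not the paper's code).
# Assumed stand-ins: linear generator G(z) = W z, mean-matching loss.
import numpy as np

rng = np.random.default_rng(0)

d_latent, d_pixel = 32, 16   # toy sizes; real latents and images are far larger
n_syn, n_real = 4, 100       # tiny distilled set vs. larger "real" set

W = rng.standard_normal((d_pixel, d_latent)) / np.sqrt(d_latent)  # frozen generator
x_real = rng.standard_normal((n_real, d_pixel)) + 1.0             # stand-in "real" images
target = x_real.mean(axis=0)

Z = rng.standard_normal((n_syn, d_latent))  # the latents ARE the distilled dataset

def loss(Z):
    syn = Z @ W.T                        # decode: G(z) = W z
    diff = syn.mean(axis=0) - target
    return float(diff @ diff)

init_loss = loss(Z)
lr = 0.3
for _ in range(2000):
    diff = (Z @ W.T).mean(axis=0) - target
    grad = (2.0 / n_syn) * diff @ W      # analytic dL/dz_i (same for every latent here)
    Z -= lr * grad                       # gradient step in latent space, not pixel space
final_loss = loss(Z)
```

After optimization, the synthetic images are recovered by decoding (`Z @ W.T`); the decoded images, not the latents themselves, are what a downstream model would train on. Distilling through the generator constrains the result to its output manifold, which is why the latent-space images below look far more natural than their pixel-space counterparts.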

Distilled Images

Images distilled into latent space (top) rather than pixel space (bottom) are far less noisy, with much clearer visual structure. Visualized below are results from the Trajectory Matching, Gradient Matching, and Distribution Matching methods, distilled into latent space (ours) versus pixel space.

Trajectory Matching [1]

Generative Latent Distillation (Ours)
Pixel Distillation

Gradient Matching [2]

Generative Latent Distillation (Ours)
Pixel Distillation

Distribution Matching [3]

Generative Latent Distillation (Ours)
Pixel Distillation

Out-of-Distribution Generators

Our method remains effective even with generators that were never trained on the corresponding dataset. Visualized below are videos showing the process of distilling ImageNet subsets into generators trained on FFHQ, on Pokémon, and on nothing at all (randomly initialized).

Extra-High Resolution

The failures of pixel distillation become even more apparent at extra-high resolutions. Attempting to distill directly into a high-resolution pixel space produces images dominated almost entirely by high-frequency noise.


@inproceedings{cazenavette2023glad,
  author    = {Cazenavette, George and Wang, Tongzhou and Torralba, Antonio and Efros, Alexei A. and Zhu, Jun-Yan},
  title     = {Generalizing Dataset Distillation via Deep Generative Prior},
  booktitle = {CVPR},
  year      = {2023},
}


[1] Cazenavette et al., Dataset Distillation by Matching Training Trajectories (CVPR 2022)
[2] Zhao et al., Dataset Condensation by Gradient Matching (ICLR 2021)
[3] Zhao and Bilen, Dataset Condensation by Distribution Matching (NeurIPS Workshop 2022)