This repository contains the code for the paper "Control-oriented Clustering of Visual Latent Representation". It provides training, evaluation, and analysis pipelines for studying Neural Collapse (NC) in vision-based control policies, as well as control-oriented pretraining strategies.
======================================================================
Create the conda environment using the provided configuration file:
conda env create -f environment.yml
The code is developed and tested with:
- Python: 3.12.5
- CUDA: 12.4
Make sure your system configuration is compatible.
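To quickly confirm that the installed PyTorch build matches your CUDA setup and can see the GPU, you can run a short check (a minimal snippet; the exact PyTorch version is pinned in environment.yml):
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"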
======================================================================
- Train a Vision-Based Control Policy
We provide an example implementation using the following components (a minimal sketch of this architecture is given after the notes below):
- ResNet-18 as the vision encoder
- Diffusion model as the action decoder
The corresponding dataset used in the paper is also included.
Run the following command in the training folder:
python train_model.py
Notes:
- The default setting trains the model for 300 epochs, which matches the configuration used in the paper.
- Model checkpoints are automatically saved and later used for Neural Collapse evaluation.
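For orientation, the overall architecture looks roughly like the sketch below. This is a minimal illustration, not the repo's train_model.py: the class and layer names are hypothetical, and the real diffusion decoder is more involved than the small noise-prediction MLP shown here.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class VisionPolicy(nn.Module):
    """Sketch: ResNet-18 latents condition a diffusion-style action decoder."""
    def __init__(self, latent_dim=512, action_dim=2):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()  # expose the 512-d visual latent instead of class logits
        self.encoder = backbone
        # Hypothetical noise-prediction head standing in for the full diffusion decoder.
        self.eps_net = nn.Sequential(
            nn.Linear(latent_dim + action_dim + 1, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image, noisy_action, t):
        z = self.encoder(image)                      # visual latent representation
        x = torch.cat([z, noisy_action, t], dim=-1)  # condition on latent + diffusion timestep
        return self.eps_net(x)                       # predicted noise for the denoising step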
- Test the Trained Model
Run the following command in the evaluation_test_score folder:
python test_domain_18_model.py
Notes:
- By default, the script evaluates checkpoints from 20 different epochs across the full 300-epoch training process.
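The evaluation loop amounts to loading each saved checkpoint and scoring the restored policy on test rollouts. A hedged sketch of that pattern (the checkpoints/ directory and the epoch_*.pt naming are hypothetical; match them to the training script's actual output):

import glob
import torch

# Hypothetical checkpoint layout; adapt the glob pattern to the training script's output.
for ckpt_path in sorted(glob.glob("checkpoints/epoch_*.pt")):
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # policy.load_state_dict(state_dict)  # restore the policy at this epoch
    # score = run_test_rollouts(policy)   # hypothetical success-rate evaluation
    print(ckpt_path)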
- Evaluate Neural Collapse (NC)
Two classification (labeling) strategies are provided, as described in the paper.
a. Goal-Based Classification (Input Space)
Run the following command in the observe_NC_metric_input_space_labeling folder:
python domain_18_observe_NC_metric_input_space_labeling.py
This script computes NC metrics for saved checkpoints across different training epochs.
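For reference, a common NC1-style metric measures within-class variability of the latents relative to between-class variability. A minimal NumPy sketch of one such variant (the repo script may compute additional NC metrics):

import numpy as np

def nc1(features, labels):
    """tr(Sigma_W @ pinv(Sigma_B)) / K: small values indicate within-class collapse."""
    classes = np.unique(labels)
    mu_g = features.mean(axis=0)
    d = features.shape[1]
    sigma_w = np.zeros((d, d))
    sigma_b = np.zeros((d, d))
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        diff = fc - mu_c
        sigma_w += diff.T @ diff / len(features)   # within-class scatter
        m = (mu_c - mu_g)[:, None]
        sigma_b += (m @ m.T) / len(classes)        # between-class scatter
    return np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes)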
b. Action-Based Classification (Action Space)
Run the following command in the observe_NC_metric_action_intention_labeling folder:
python domain_18_observe_NC_metric_action_intention_labeling.py
This script evaluates NC metrics for saved checkpoints at different epochs.
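As an illustration of action-space labeling, one simple scheme bins each action by its planar push direction. This is a hypothetical example, not necessarily the exact labeling used by the script:

import numpy as np

def action_intention_labels(actions, num_bins=8):
    """Hypothetical labeling: discretize the planar push direction into num_bins classes."""
    angles = np.arctan2(actions[:, 1], actions[:, 0])           # direction in [-pi, pi]
    bins = np.floor((angles + np.pi) / (2 * np.pi) * num_bins)
    return bins.astype(int) % num_bins                          # guard the angle == pi edge case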
======================================================================
Control-oriented pretraining code is provided in the NC_pretraining folder.
Step 1: Pretrain the vision encoder
python NC_pretrain.py
Step 2: End-to-end training of the vision encoder and diffusion model
python NC_together.py
This two-stage training strategy follows the procedure described in the paper.
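Conceptually, Step 1 supervises the encoder's latents with control-oriented labels so they cluster class-wise, and Step 2 then trains the full policy end to end. A minimal sketch of the Step 1 idea (the linear head and hyperparameters here are hypothetical; see NC_pretrain.py for the actual objective):

import torch
import torch.nn as nn

def pretrain_encoder(encoder, loader, num_classes, latent_dim=512, epochs=50):
    """Step 1 sketch: pull latents toward class-wise clusters via a supervised loss."""
    head = nn.Linear(latent_dim, num_classes)  # hypothetical linear classifier on the latents
    params = list(encoder.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            loss = ce(head(encoder(images)), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder  # hand the pretrained encoder to Step 2 for end-to-end training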
======================================================================
We release the Letters Planar Pushing dataset used in this paper. The dataset is publicly available at: https://github.com/han20192019/Letters-Planar-Pushing-Dataset
======================================================================
If you find this work useful, please cite:
@inproceedings{qi25iclr-control,
title={Control-oriented Clustering of Visual Latent Representation},
author={Qi, Han and Yin, Haocheng and Yang, Heng},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025},
note={\url{https://arxiv.org/abs/2410.05063}, \url{https://computationalrobotics.seas.harvard.edu/ControlOriented_NC/}}
}