1. Setup the environment and install packages

```
python3 -m venv my_env           # conda create --name my_env (if you use conda)
source my_env/bin/activate       # conda activate my_env
pip install -r requirements.txt
```
We also recommend installing wandb for logging; see https://docs.wandb.ai/quickstart for details. We use wandb to log the training process and to store the test statistics.
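For orientation, here is a minimal wandb usage sketch; the project name and the logged keys are illustrative placeholders, not the repo's actual ones:

```python
import wandb

# Hypothetical example: log a metric during training and a final statistic.
run = wandb.init(project="Test")      # project name is just a placeholder
wandb.log({"loss": 0.42})             # per-step logging during training
run.summary["test_statistic"] = 1.7   # final values land in the run summary
run.finish()
```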
2. Datasets
Blob Dataset: The Blob dataset is a two-dimensional Gaussian mixture model with nine modes arranged on a 3 x 3 grid. The two distributions differ in their variance, as visualized in the figure below.
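For intuition, a minimal NumPy sketch of such a mixture follows; the exact parametrization of the repo's BlobDataGen (e.g. how rho enters the variance) may differ:

```python
import numpy as np

def sample_blob(n, var=0.03, grid=3, seed=None):
    """Sample n points from a 2D mixture of grid x grid unit-spaced Gaussians."""
    rng = np.random.default_rng(seed)
    centers = np.array([(i, j) for i in range(grid) for j in range(grid)], float)
    modes = rng.integers(len(centers), size=n)  # pick one of the 9 modes per point
    return centers[modes] + np.sqrt(var) * rng.standard_normal((n, 2))

x = sample_blob(1000, var=0.03)  # first sample
y = sample_blob(1000, var=0.09)  # second sample: same modes, larger variance
```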
MNIST: The MNIST dataset consists of hand-written digits.
CIFAR10: The CIFAR10 dataset consists of 10 classes of images. We fine-tuned a ResNet50 model on CIFAR10 and saved it under data/cifar10/best_model.pth; it is used for generating adversarial samples with FGSM (the fast gradient sign method).
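FGSM itself is standard; here is a minimal PyTorch sketch (the repo's attack code and the epsilon value may differ):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Perturb x by eps in the direction of the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()  # keep valid pixel range
```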
Gaussian CIT: The Gaussian CIT dataset is a 20-dimensional hierarchical Gaussian model used for the conditional independence test.
3. Structure
train.py contains the training pipeline for each DAVT experiment. All objects are initialized from Hydra's config files (see the configs folder), and then the training is performed. The training pipeline consists of the following steps:
- Initialize the data generator (e.g. the Blob dataset). This class yields samples from two distributions; the output object is of class torch.utils.data.Dataset. Independently of the dataset, two parameters must be specified:
- samples: number of samples to be generated
- type: type of the experiment: "type2" (the alternative holds), "type11" (the null holds and the data comes from the first class), "type12" (the null holds and the data comes from the second class)
The configuration files for each data generator are in the configs/data folder; e.g. the file blob.yaml contains the following (a sketch of how Hydra instantiates such a config follows the block):
_target_: "data.blob.BlobDataGen" # the class to be initialized samples: 1000 # mandatory parameter type: "type2" # mandatory parameter r: 3 # dataset specific parameter d: 2 # dataset specific parameter rho: 0.03 # dataset specific parameter with_labels: false # dataset specific parameter data_seed: 0 # dataset specific seed -
- Initialize the operators. An operator is an object of class Operator (operators/base/Operator). An example is the SwapOperator (operators/swap/SwapOperator), which swaps the last feature of the input:
  - swap operator: $\tau(x,y) = (y,x)$ and $\tau((x_1,y_1),(x_2, y_2)) = ((x_1,y_2),(x_2, y_1))$.

As for the data generator, the default initialization parameters for the SwapOperator are stored in a Hydra config file, configs/operator/swap.yaml (an illustrative sketch of the swap itself follows the config):

```
_target_: "operators.swap.SwapOperator"
p: 3 # tau input dimension (e.g. the number of features in either x or y in the two-sample test, or the total number of features in x_1 and y_1 (or x_2 and y_2))
d: 2 # the starting index for swapping (e.g. here we swap the last feature of the 3D input)
```
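A minimal PyTorch sketch of the swap operation itself; the repo's SwapOperator may differ in interface, but with p=3 and d=2 this exchanges the last feature between the two paired inputs:

```python
import torch

def swap(z1: torch.Tensor, z2: torch.Tensor, d: int):
    """tau((x1,y1),(x2,y2)) = ((x1,y2),(x2,y1)): exchange the features z[..., d:]."""
    out1 = torch.cat([z1[..., :d], z2[..., d:]], dim=-1)
    out2 = torch.cat([z2[..., :d], z1[..., d:]], dim=-1)
    return out1, out2
```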
- Initialize the model. The first model is a simple MLP with the number and dimensions of hidden layers specified by the user. See the corresponding config file model/mlp.yaml for the default parameters. To build an MLP with four hidden layers of size 40, specify hidden_layer_size: [40, 40, 40, 40] (a sketch of the resulting architecture is given below).
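A sketch of the architecture this config describes, assuming a standard Linear+ReLU stack; the repo's model class may differ in activation and output head, and output_size here is an assumption:

```python
import torch.nn as nn

def make_mlp(input_size, hidden_layer_size, output_size=1, bias=True):
    """hidden_layer_size=[40, 40, 40, 40] yields four hidden layers of width 40."""
    sizes = [input_size, *hidden_layer_size]
    layers = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(d_in, d_out, bias=bias), nn.ReLU()]
    layers.append(nn.Linear(sizes[-1], output_size, bias=bias))
    return nn.Sequential(*layers)
```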
- Initialize and perform the training. This step initializes a trainer object of class trainer.trainer.Trainer with a configuration that can be specified in configs/config.yaml.
Important: The user should specify all needed parameters and store them in the configs/experiment/experiment.yaml file! An example (blob-swap-projection.yaml) is given below:
```
# @package _global_
defaults:
  - /data: blob        # load the default parameters for the blob generator
  - /model: mlp        # load the default parameters for the DNN model, here we use mlp
  - /operator@tau1:    # load the default parameters for the first operator
      - projection
  - /operator@tau2:    # load the default parameters for the second operator, which is the composition \Tau_{swap}\circ\Tau_{proj}
      - swap
      - projection

project: "Test"        # project name for wandb

tau1:                  # here you can overwrite the defaults for the first operator
  projection:
    input_dim: 0
tau2:                  # here you can overwrite the defaults for the second operator
  swap:
    p: 2
    d: 0
  projection:
    input_dim: 0
model:                 # here you can overwrite the defaults for the model
  input_size: 2
  hidden_layer_size: [30, 30]
  bias: true
data:
  data_seed: 10
  samples: 90
  type: "type2"
  with_labels: false
train:                 # you always need to specify the training parameters
  name: "davt"         # this is important for the baseline training; see the examples in the Experiments section
  seed: 0              # seed for the training
  lr: 0.0005           # learning rate
  earlystopping:       # early stopping parameters
    patience: 10
    delta: 0.0
  epochs: 500          # max number of epochs
  seqs: 30             # number of mini-batches
  T: 0                 # warm start: number of mini-batches used for training only
  alpha: 0.05          # significance level
  batch_size: 90       # batch size
  save: false          # save the model (not implemented yet)
  save_dir: ""
  l1_lambda: 0.0       # L1 regularization parameter
  l2_lambda: 0.0       # L2 regularization parameter
```
4. Experiments
With the following lines you can run each experiment for a single seed. For example, the power experiments can be run with the following commands:
```
# Blob dataset, type 2: DAVT swap+projection and DNN baselines
python train.py experiment=blob-swap-projection data.type="type2"
python train_baselines.py experiment=blob-two-sample-baselines data.type="type2" train.name="deep"

# Blob dataset, type 1: DAVT swap and DNN baselines
python train.py experiment=blob-swap data.type="type11"  # or type12
python train_baselines.py experiment=blob-two-sample-baselines data.type="type11" train.name="deep"

# CIFAR10 dataset, type 2 (please specify the path to the model in the config files cifar10-aa-davt.yaml and cifar10-aa-baselines.yaml)
python train.py experiment=cifar10-aa-davt data.type="type2"
python train.py experiment=cifar10-aa-baselines train.name="deep" data.type="type2"

# CIT experiment, type 1
python train.py experiment=gaussian-cit-davt data.type="type1"
python train.py experiment=gaussian-cit-baseline train.name="ecrt"  # baseline ECRT

# Rotated MNIST experiment, p=0.5
python train.py experiment=mnist-rotation-davt data.p=0.5
python train.py experiment=mnist-rotation-baselines train.name="rand" data.p=0.5  # baseline SC2ST with multiple testing corrections
```
If you want to run an experiment for several seeds, use the run.ssh file.
To retrieve the results from wandb, use the prepared notebooks in the notebooks folder (a minimal sketch of the underlying wandb API is given below).
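If you want to query wandb directly instead, here is a minimal sketch using the public wandb API; "entity/Test" is a placeholder for your own entity and the project name set via project: above:

```python
import wandb

api = wandb.Api()                     # requires a logged-in wandb account
for run in api.runs("entity/Test"):   # "<entity>/<project>" placeholder path
    print(run.name, run.config, run.summary)
```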
5. Figures
The figures are generated with the notebook Plot-DAVT.ipynb in the notebooks folder, using the data stored in the figures/data folder. The generated figures are stored in the figures folder.
