Self-supervised autoencoder for structured clinical data that handles missing values via input masking and a type-aware loss. Embeddings feed a downstream ANN to predict cardiovascular death within 8 years on an IHD cohort.
The primary objective of this project is to develop an efficient method for representing complex clinical data in a lower-dimensional space. By leveraging autoencoders, we aim to generate embeddings that capture the underlying structure of the data while preserving important information about the patients' health status. Our ultimate goal is to handle datasets with missing values: the model should learn to impute them, producing consistent embeddings even for patients with incomplete records.
Our poster for MIEO was presented at the 20th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2025), held at Politecnico di Milano, Milan, Italy, on 10–12 September 2025.
View the poster (PDF) · CIBB 2025 website / program
The main dataset used in this project is `OrmoniTiroidei3Aprile2024.xlsx`, which contains real clinical data related to thyroid disorders. We augment it with records from other datasets pertaining to the same patients, enhancing the richness and diversity of the data.
We use an autoencoder architecture to encode the clinical data into a lower-dimensional space. The autoencoder comprises an encoder network that compresses the input data into a latent-space representation and a decoder network that reconstructs the original input from that representation. By training the autoencoder on the augmented dataset, we aim to learn meaningful embeddings that capture the essence of the patients' health data. In particular, we focus on handling missing data: missing entries are masked at the input, and the model learns to impute them, so that patients with incomplete records still receive consistent embeddings.
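As an illustration of this masking idea, here is a minimal PyTorch sketch. The class and helper names are ours, and the zero-fill-plus-mask-concatenation scheme and the MSE/BCE split are assumptions for exposition, not necessarily the repository's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedAutoencoder(nn.Module):
    """Sketch: missing entries are zero-filled and a binary observed-mask is
    concatenated to the input, so the encoder can tell 'missing' apart from
    a genuine zero."""

    def __init__(self, n_features: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2 * n_features, 64), nn.ReLU(),  # features + mask
            nn.Linear(64, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, n_features),
        )

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        x = torch.nan_to_num(x) * mask             # zero out missing entries
        z = self.encoder(torch.cat([x, mask], dim=1))
        return self.decoder(z)

def type_aware_loss(recon, target, mask, is_binary):
    """Reconstruction loss restricted to observed entries: squared error for
    continuous columns, BCE-with-logits for binary ones."""
    target = torch.nan_to_num(target)              # missing targets are masked out below
    per_entry = torch.zeros_like(recon)
    per_entry[:, ~is_binary] = (recon - target)[:, ~is_binary] ** 2
    per_entry[:, is_binary] = F.binary_cross_entropy_with_logits(
        recon[:, is_binary], target[:, is_binary], reduction="none"
    )
    return (per_entry * mask).sum() / mask.sum().clamp(min=1)
```

Because the loss is averaged only over observed entries, patients with many missing values still contribute a well-defined training signal.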
To evaluate the effectiveness of our embeddings, we employ them as features to train a neural network for a classification task. The targets for the classification task are specified in the `Cleaning_Data.ipynb` notebook, where we also perform data cleaning and preprocessing.
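A hedged sketch of this downstream step, assuming the embeddings are available as a NumPy array and using skorch (already a project dependency); `EmbeddingClassifier` and all hyperparameters here are illustrative:

```python
import numpy as np
import torch.nn as nn
from skorch import NeuralNetClassifier

class EmbeddingClassifier(nn.Module):
    """Small MLP head trained on the autoencoder embeddings."""

    def __init__(self, latent_dim: int = 16, n_hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, 2),                # two classes: event / no event
        )

    def forward(self, X):
        return self.net(X)

clf = NeuralNetClassifier(
    EmbeddingClassifier,
    criterion=nn.CrossEntropyLoss,                 # module outputs raw logits
    max_epochs=50,
    lr=1e-3,
)
# embeddings: (n_patients, latent_dim) array produced by the encoder;
# y: binary target defined in Cleaning_Data.ipynb
# clf.fit(embeddings.astype(np.float32), y.astype(np.int64))
```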
- Angelo Nardone
- Davide Borghini
- Davide Marchi
- Giordano Scerra
To get started with the project, follow these steps:
- Clone the repository to your local machine.
- Install the necessary dependencies listed in the `requirements.txt` file.
- Explore the codebase and experiment with different configurations and parameters.
- Run the provided scripts to train the autoencoder on the augmented dataset and generate embeddings.
- Use the embeddings as features to train a neural network for the classification task specified in the `Cleaning_Data.ipynb` notebook.
To avoid the following `FutureWarning`, raised when skorch loads saved parameters via `torch.load`:

```
.venv/lib/python3.8/site-packages/skorch/net.py:2231: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True.
```

we changed the following in `skorch/net.py`:

From:

```python
load_kwargs = {'map_location': map_location}
```

To:

```python
load_kwargs = {'map_location': map_location, 'weights_only': True}
```
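Note that editing an installed copy of skorch is lost whenever the environment is rebuilt. If patching `site-packages` is not an option, a sketch of an alternative (ours, not what the repository does) that merely silences this specific warning at runtime, without changing loading behavior:

```python
import warnings

# Suppress only the torch.load FutureWarning surfaced through skorch;
# the message argument is a regex matched against the warning text.
warnings.filterwarnings(
    "ignore",
    message=r"You are using torch.load with weights_only=False.*",
    category=FutureWarning,
)
```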
