@lucie271 (Contributor) commented Nov 3, 2025

Overview

This PR introduces the BIDS conversion pipeline for the SAILS dataset.
It automates video processing, stabilization, and conversion from raw videos into a standardized BIDS-compatible structure.

The main script for the BIDS conversion is located at:

src/sailsprep/BIDS_convertor.py

and corresponding tests are available in:

src/tests/test_BIDS_convertor.py


Requirements

Before running the pipeline, please ensure:

  • Poetry is installed (see instructions in the project’s README.md)
  • FFmpeg ≥ 6.0, built with vidstab support, is available in your environment
  • All Python dependencies are installed via Poetry (poetry install)

For detailed setup instructions, refer to the README.
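Stabilization depends on FFmpeg's libvidstab filters, so it can help to fail fast when the installed FFmpeg is too old or was built without them. The following is a minimal sketch that assumes `ffmpeg` is on PATH. The function names are hypothetical and are not part of sailsprep:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_FFMPEG: Tuple[int, int] = (6, 0)  # minimum version stated in the requirements above


def parse_ffmpeg_version(version_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from `ffmpeg -version` output, or None if absent.

    Release builds print e.g. "ffmpeg version 6.0" or "ffmpeg version n6.1.1";
    git snapshot builds ("ffmpeg version N-112233-g...") carry no release
    number, so they yield None.
    """
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_vidstab(filters_output: str) -> bool:
    """True if `ffmpeg -filters` lists both libvidstab filters."""
    return "vidstabdetect" in filters_output and "vidstabtransform" in filters_output


def check_ffmpeg() -> None:
    """Raise RuntimeError if ffmpeg is missing, older than 6.0, or lacks vidstab."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    version_out = subprocess.run(["ffmpeg", "-version"], capture_output=True,
                                 text=True, check=True).stdout
    version = parse_ffmpeg_version(version_out)
    # Unknown (snapshot) versions are rejected here; relax this if you rely on them.
    if version is None or version < MIN_FFMPEG:
        first_line = version_out.splitlines()[0] if version_out else "no output"
        raise RuntimeError(f"FFmpeg >= 6.0 required, found: {first_line}")
    filters_out = subprocess.run(["ffmpeg", "-hide_banner", "-filters"],
                                 capture_output=True, text=True, check=True).stdout
    if not has_vidstab(filters_out):
        raise RuntimeError("FFmpeg was built without libvidstab (vidstab filters missing)")
```

If you call `check_ffmpeg()` at the start of each Slurm task, a missing vidstab build is reported before any video is processed.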


Usage

To launch the BIDS conversion, submit the Slurm scripts in the jobs folder. Chain them with a job dependency so that the merge/cleanup step starts only after the conversion job has completed successfully:

jid=$(sbatch --parsable jobs/run_bids_convertor.sh)
sbatch --dependency=afterok:$jid jobs/merge_cleanup.sh
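The same two-step submission can also be driven from Python, for example from a small launcher script. The following is a minimal sketch that assumes `sbatch` is on PATH. The helper names are hypothetical and are not part of sailsprep:

```python
import subprocess
from typing import List


def parse_parsable_job_id(sbatch_stdout: str) -> str:
    """`sbatch --parsable` prints "jobid" or "jobid;cluster"; keep only the job ID."""
    return sbatch_stdout.strip().split(";")[0]


def sbatch_command(script: str, after_ok: str = "") -> List[str]:
    """Build an sbatch call, optionally gated on another job finishing successfully."""
    cmd = ["sbatch", "--parsable"]
    if after_ok:
        # afterok: start only once the given job has exited with code 0
        cmd.append(f"--dependency=afterok:{after_ok}")
    cmd.append(script)
    return cmd


def submit_pipeline() -> str:
    """Submit the conversion job, then merge/cleanup; return the merge job ID."""
    convert = subprocess.run(sbatch_command("jobs/run_bids_convertor.sh"),
                             capture_output=True, text=True, check=True)
    jid = parse_parsable_job_id(convert.stdout)
    merge = subprocess.run(sbatch_command("jobs/merge_cleanup.sh", after_ok=jid),
                           capture_output=True, text=True, check=True)
    return parse_parsable_job_id(merge.stdout)
```

If the conversion script runs as a job array, as the review summary below suggests, a dependency on the array's job ID waits for every task. If any task fails, the dependency can never be satisfied. Depending on site configuration, Slurm then either leaves the merge job pending or cancels it.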

Input and Output

  • Input videos:

    /orcd/data/satra/002/datasets/SAILS/Phase_III_Videos/Videos_from_external_standardized

  • Output (final BIDS dataset):

    /orcd/scratch/bcs/001/sensein/sails/BIDS_data
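The PR does not spell out the folder layout under the output root. The following BIDS-style path builder is for illustration only. Video is not a first-class BIDS datatype, so the `beh` folder and suffix used here are placeholder assumptions. The rule that labels contain only letters and digits comes from the BIDS specification:

```python
import re
from pathlib import Path
from typing import Optional


def bids_label(raw: str) -> str:
    """BIDS labels may contain only letters and digits; drop everything else."""
    label = re.sub(r"[^A-Za-z0-9]", "", raw)
    if not label:
        raise ValueError(f"cannot derive a BIDS label from {raw!r}")
    return label


def bids_video_path(root: Path, subject: str, session: str, task: str,
                    run: Optional[int] = None, datatype: str = "beh",
                    suffix: str = "beh", ext: str = ".mp4") -> Path:
    """Build sub-<label>/ses-<label>/<datatype>/sub-..._ses-..._task-...[_run-NN]_<suffix><ext>.

    The datatype/suffix defaults are hypothetical placeholders, not taken from the PR.
    """
    sub, ses, tsk = bids_label(subject), bids_label(session), bids_label(task)
    entities = [f"sub-{sub}", f"ses-{ses}", f"task-{tsk}"]
    if run is not None:
        entities.append(f"run-{run:02d}")
    filename = "_".join(entities) + f"_{suffix}{ext}"
    return root / f"sub-{sub}" / f"ses-{ses}" / datatype / filename
```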

The pipeline automatically:

  • Extracts and stabilizes raw videos
  • Converts metadata into BIDS format
  • Generates standardized folder structure and derivative files
  • Produces per-task processing logs and merged summaries
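Video stabilization with vidstab is typically done in two FFmpeg passes. First, `vidstabdetect` analyses camera motion and writes a transforms file. Then `vidstabtransform` applies those transforms. The sketch below only builds the command lines. The filter parameters, the `hqdn3d` denoiser, the codecs, and the 16 kHz mono audio settings are illustrative assumptions, not the settings used in BIDS_convertor.py:

```python
from pathlib import Path
from typing import List


def vidstab_commands(src: Path, dst: Path, trf: Path,
                     shakiness: int = 5, smoothing: int = 10) -> List[List[str]]:
    """Two-pass stabilization with FFmpeg's libvidstab filters.

    Pass 1 (vidstabdetect) writes motion transforms to `trf` and discards output;
    pass 2 (vidstabtransform) applies them, then denoises lightly with hqdn3d.
    Assumes `trf` contains no ':' ',' or quote characters, which would need
    escaping inside a filtergraph.
    """
    detect = ["ffmpeg", "-y", "-i", str(src),
              "-vf", f"vidstabdetect=shakiness={shakiness}:result={trf}",
              "-f", "null", "-"]
    transform = ["ffmpeg", "-y", "-i", str(src),
                 "-vf", f"vidstabtransform=input={trf}:smoothing={smoothing},hqdn3d",
                 "-c:v", "libx264", "-c:a", "aac", str(dst)]
    return [detect, transform]


def audio_command(src: Path, wav: Path, sample_rate: int = 16000) -> List[str]:
    """Extract a mono 16-bit PCM WAV track (sample rate is an assumption)."""
    return ["ffmpeg", "-y", "-i", str(src), "-vn", "-ac", "1",
            "-ar", str(sample_rate), "-c:a", "pcm_s16le", str(wav)]
```

Each command can be run with `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the FFmpeg flags easy to unit-test without any video files.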

@lucie271 requested a review from fabiocat93 on November 3, 2025, 15:35
@gemini-code-assist (Contributor)

Summary of Changes

Hello @lucie271, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a robust and automated BIDS conversion pipeline for the SAILS dataset. It streamlines the entire process from raw video ingestion to the generation of a fully BIDS-compliant dataset, complete with preprocessed videos, extracted audio, and rich metadata. The new architecture supports parallel execution via Slurm, making it highly scalable for large-scale video data processing and ensuring data standardization for future research.

Highlights

  • BIDS Conversion Pipeline: Introduced a new, comprehensive pipeline to automate the conversion of raw SAILS dataset videos into a standardized BIDS-compatible structure. This includes video processing, stabilization, and metadata conversion.
  • Parallel Processing with Slurm: Implemented Slurm array job scripts (jobs/run_bids_convertor.sh and jobs/merge_cleanup.sh) to enable efficient, parallel processing of video chunks, significantly speeding up the conversion for large datasets.
  • Enhanced Video Processing: The core BIDS_convertor.py script has been extensively refactored to include robust video stabilization (using vidstab), denoising, and standardization, along with improved error handling for FFmpeg operations.
  • Dynamic Metadata and Event File Generation: Improved metadata handling, including dynamic task information in JSON sidecar files and a more comprehensive events.tsv file that incorporates all available behavioral coding data from Excel, or generates dummy data if not available.
  • Dependency Updates and Configuration: Updated project dependencies (poetry.lock, pyproject.toml) to include pandas, opencv-python, openpyxl, and types-pyyaml. A new configuration file (configs/config_bids_convertor.yaml) centralizes pipeline settings.
  • Documentation and Usage Instructions: The README.md has been updated with detailed instructions on setting up the Poetry environment, installing FFmpeg with vidstab, and launching the BIDS conversion process using the provided Slurm scripts.
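In BIDS, an events.tsv file requires `onset` and `duration` columns (in seconds), conventionally placed first, and missing values are written as `n/a`. The following minimal sketch implements the "coded data, or dummy data as a fallback" behaviour described above, using only the standard library. The single placeholder row is a hypothetical stand-in for whatever dummy data the PR actually generates:

```python
import csv
import io
from pathlib import Path
from typing import Dict, List, Optional


def events_tsv_text(events: Optional[List[Dict[str, object]]], video_duration_s: float) -> str:
    """Render events as BIDS TSV text: onset, duration, then other columns sorted.

    With no coded events, emit one placeholder row spanning the whole recording.
    """
    if not events:
        events = [{"onset": 0.0, "duration": video_duration_s, "trial_type": "n/a"}]
    extra = sorted({key for event in events for key in event} - {"onset", "duration"})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["onset", "duration"] + extra,
                            delimiter="\t", restval="n/a", lineterminator="\n")
    writer.writeheader()
    writer.writerows(events)  # keys absent from a row are filled with "n/a"
    return buf.getvalue()


def write_events_tsv(path: Path, events: Optional[List[Dict[str, object]]],
                     video_duration_s: float) -> None:
    path.write_text(events_tsv_text(events, video_duration_s))
```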

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a comprehensive BIDS conversion pipeline. The changes represent a significant refactoring, moving towards a more modular, robust, and parallelizable architecture suitable for a cluster environment. The code quality is generally high, with substantial improvements in error handling and data processing logic. However, there are a few critical and high-severity issues, primarily a hardcoded path in a shell script that severely impacts portability, and some unsafe file operations. I've also included several medium-severity suggestions to improve maintainability, robustness, and test correctness.

Comment on lines 273 to 274
if participant_id.endswith(" 2"):
    participant_id = participant_id[:-2].strip()
medium

This hardcoded logic to handle participant IDs ending in " 2" is very specific to the current dataset's quirks. This is fragile and may not be obvious to future maintainers. Consider making this logic more generic or configurable, or at least add a comment explaining why this specific transformation is necessary.
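One way to act on this suggestion is to move the suffix into a configurable pattern. The following is a minimal sketch with hypothetical names. It generalizes the hardcoded `endswith(" 2")` check and also strips surrounding whitespace. The PR does not state why some SAILS IDs carry the suffix, so that reason should be documented wherever the pattern is configured:

```python
import re

# Hypothetical default: one or more whitespace characters followed by "2" at the
# end of the ID, a generalization of the hardcoded `endswith(" 2")` check.
DEFAULT_SUFFIX_PATTERN = r"\s+2$"


def normalize_participant_id(raw_id: str, suffix_pattern: str = DEFAULT_SUFFIX_PATTERN) -> str:
    """Strip a dataset-specific suffix, e.g. "ABC123 2" -> "ABC123"."""
    return re.sub(suffix_pattern, "", raw_id).strip()
```

The pattern could then live in configs/config_bids_convertor.yaml. The key name would be up to the authors.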

lucie271 and others added 4 commits November 3, 2025 10:41
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@lucie271 linked an issue on Nov 3, 2025 that may be closed by this pull request
@fabiocat93 (Collaborator) left a comment

Great work, @lucie271! I have left some comments here and there.

dataset_desc = {
    "Name": "SAILS Phase III Home Videos",
    "BIDSVersion": "1.9.0",
    "DatasetType": "domestic videos with audio",
Collaborator:

This should simply be "raw" to match the BIDS rules.

@wilke0818:

This might actually be considered preprocessed if the standardized folder is used, since those videos were already put through FFmpeg and converted to MP4s to standardize all of the file types.

Contributor Author:

True. Would it be better to work with the raw (non-standardized) videos folder, then? The BIDS conversion already takes care of the MP4 conversion through FFmpeg.

Contributor Author:

Done :)
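Following the outcome of this thread, here is a minimal sketch of building dataset_description.json with DatasetType set to "raw". BIDS accepts only "raw" or "derivative" for this field. The helper names are hypothetical:

```python
import json
from pathlib import Path


def dataset_description(dataset_type: str = "raw") -> dict:
    """Minimal dataset_description.json content for the SAILS dataset."""
    # BIDS only allows "raw" or "derivative" here.
    if dataset_type not in ("raw", "derivative"):
        raise ValueError("DatasetType must be 'raw' or 'derivative'")
    return {
        "Name": "SAILS Phase III Home Videos",
        "BIDSVersion": "1.9.0",
        "DatasetType": dataset_type,
    }


def write_dataset_description(bids_root: Path, dataset_type: str = "raw") -> Path:
    """Write dataset_description.json at the dataset root and return its path."""
    path = bids_root / "dataset_description.json"
    path.write_text(json.dumps(dataset_description(dataset_type), indent=2) + "\n")
    return path
```

If the stabilized videos were instead published as a derivative dataset, BIDS also expects a GeneratedBy entry describing the pipeline that produced them.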

README.md Outdated
- Things
- These may include a wonderful CLI interface.

## Installation
Collaborator:

I would add your setup info here (Poetry, FFmpeg, ...).

Contributor Author:

Done :)

```
pip install git+https://github.com/sensein/sailsprep.git
```

## Quick start
Collaborator:

I would add a list/table here of the things you can do with sailsprep. For now, that list will mostly (or only) include your bids_convertor, and you can link to a second .md file that explains the details: what it is, how to use it, ...
This way, you don't make the README.md too crowded, but you still promote your contribution and make it findable.

Contributor Author:

Done :)

@wilke0818 left a comment

Overall looks good; mostly just address what Fabio has mentioned. If you want to clean up the code a little by factoring out some of the BIDS file creation, you could look here for an example of how it was done for Bridge2AI with the audio files and such:
https://github.com/sensein/b2aiprep/tree/5ece6c33bd744a3a2b8e578f50c33af8d9e913d0/src/b2aiprep/prepare/resources/b2ai-data-bids-like-template
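One way to factor out static BIDS files is to keep them as plain files in a template directory and copy them into the output root, in the spirit of the b2aiprep template folder linked above. That folder's exact contents are not reproduced here. The following is a minimal sketch with a hypothetical function name. `dirs_exist_ok` requires Python 3.8 or later:

```python
import shutil
from pathlib import Path
from typing import List


def copy_bids_template(template_dir: Path, bids_root: Path) -> List[str]:
    """Copy static BIDS files (README, JSON sidecars, ...) into bids_root.

    Existing files with the same names are overwritten. Returns the copied
    files as sorted POSIX-style paths relative to the template directory.
    """
    shutil.copytree(template_dir, bids_root, dirs_exist_ok=True)
    return sorted(p.relative_to(template_dir).as_posix()
                  for p in template_dir.rglob("*") if p.is_file())
```

Files whose content depends on the data, such as participants.tsv or events.tsv, would still be generated in code.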


@fabiocat93 (Collaborator)

@lucie271, I have updated some of the dev dependencies as Dependabot was suggesting (to fix some vulnerabilities and more). This shouldn't create any big conflicts in pyproject.toml, and any that arise should be easy to resolve, but I wanted to let you know. If you need any help, please feel free to ask.

@codecov-commenter commented Nov 4, 2025

Codecov Report

❌ Patch coverage is 78.94737% with 200 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.03%. Comparing base (9eea44a) to head (78e611e).

File with missing lines             Patch %   Lines
src/sailsprep/BIDS_convertor.py     66.89%    194 missing ⚠️
src/tests/test_BIDS_convertor.py    98.35%    6 missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #29       +/-   ##
===========================================
+ Coverage   65.59%   80.03%   +14.44%     
===========================================
  Files           5        5               
  Lines         529     1052      +523     
===========================================
+ Hits          347      842      +495     
- Misses        182      210       +28     
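The headline percentages in the Coverage Diff are hits divided by lines. The following small check reproduces them, assuming Codecov's default of rounding down to two decimal places. The function name is hypothetical:

```python
from decimal import ROUND_DOWN, Decimal


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Line coverage as a percentage, truncated (rounded down) to 2 decimals."""
    pct = Decimal(hits) * 100 / Decimal(lines)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(347, 529)   # main
head = coverage_pct(842, 1052)  # this PR's head commit
print(base, head, head - base)  # prints: 65.59 80.03 14.44
```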


@lucie271 removed the request for review from yibeichan on November 4, 2025, 23:09
@fabiocat93 (Collaborator) left a comment

good job!

@fabiocat93 merged commit f098f57 into main on Nov 6, 2025 (6 checks passed)
@lucie271 deleted the BIDS-conversion branch on November 24, 2025, 14:52
@lucie271 pushed a commit that referenced this pull request on Dec 15, 2025
Development

Successfully merging this pull request may close these issues.

PR for BIDS-conversion

7 participants