-
Notifications
You must be signed in to change notification settings - Fork 0
add BIDS Conversion pipeline #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @lucie271, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers a robust and automated BIDS conversion pipeline for the SAILS dataset. It streamlines the entire process from raw video ingestion to the generation of a fully BIDS-compliant dataset, complete with preprocessed videos, extracted audio, and rich metadata. The new architecture supports parallel execution via Slurm, making it highly scalable for large-scale video data processing and ensuring data standardization for future research. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a comprehensive BIDS conversion pipeline. The changes represent a significant refactoring, moving towards a more modular, robust, and parallelizable architecture suitable for a cluster environment. The code quality is generally high, with substantial improvements in error handling and data processing logic. However, there are a few critical and high-severity issues, primarily a hardcoded path in a shell script that severely impacts portability, and some unsafe file operations. I've also included several medium-severity suggestions to improve maintainability, robustness, and test correctness.
| if participant_id.endswith(" 2"): | ||
| participant_id = participant_id[:-2].strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hardcoded logic to handle participant IDs ending in " 2" is very specific to the current dataset's quirks. This is fragile and may not be obvious to future maintainers. Consider making this logic more generic or configurable, or at least add a comment explaining why this specific transformation is necessary.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
fabiocat93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work @lucie271 ! I have left some comments here and there
src/BIDS_convertor.py
Outdated
| dataset_desc = { | ||
| "Name": "SAILS Phase III Home Videos", | ||
| "BIDSVersion": "1.9.0", | ||
| "DatasetType": "domestic videos with audio", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should simply be "raw" to match with the BIDS rules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this actually might be considered preprocessed if using the standardized folder since those were put through ffmpeg and converted to mp4s to try and standardize all of the filetypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True.. Would it be better to work with the raw videos folder (not standardized) then ? As the BIDS conversion is also taking care of the mp4 conversion and put through ffmpeg ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
README.md
Outdated
| - Things | ||
| - These may include a wonderful CLI interface. | ||
|
|
||
| ## Installation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add your setup info in here (poetry, ffmpeg, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
| pip install git+https://github.com/sensein/sailsprep.git | ||
| ``` | ||
|
|
||
| ## Quick start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add in here a list/table of things you can do with sailsprep. that list will include mostly/only your bids_convertor for now. and you can create a link to a second .md file where you explain the details of what that is, how to use it, ...
This way, you don't make the readme.md too crowded but you still promote your contribution and make it findable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
wilke0818
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good. mostly just address what fabio has mentioned. if you wanted to clean up teh code a little bit by factoring out some of the BIDs file creation you could look here as an example of how it was done for Bridge2AI with the audio files and such:
https://github.com/sensein/b2aiprep/tree/5ece6c33bd744a3a2b8e578f50c33af8d9e913d0/src/b2aiprep/prepare/resources/b2ai-data-bids-like-template
src/BIDS_convertor.py
Outdated
| dataset_desc = { | ||
| "Name": "SAILS Phase III Home Videos", | ||
| "BIDSVersion": "1.9.0", | ||
| "DatasetType": "domestic videos with audio", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this actually might be considered preprocessed if using the standardized folder since those were put through ffmpeg and converted to mp4s to try and standardize all of the filetypes
|
@lucie271 I have updated some of the dev dependencies as dependabot was suggesting (to fix some vulnerabilities and more). This shouldn't create any big conflict in pyproject.toml and should be easily solvable, but wanted to let you know so that you are aware. if you need any help, please feel free to ask |
…into BIDS-conversion Merge pyproject.toml to changes for unit tests
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #29 +/- ##
===========================================
+ Coverage 65.59% 80.03% +14.44%
===========================================
Files 5 5
Lines 529 1052 +523
===========================================
+ Hits 347 842 +495
- Misses 182 210 +28 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
fabiocat93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good job!
add BIDS Conversion pipeline
Overview
This PR introduces the BIDS conversion pipeline for the SAILS dataset.
It automates video processing, stabilization, and conversion from raw videos into a standardized BIDS-compatible structure.
The main script for the BIDS conversion is located at:
and corresponding tests are available in:
Requirements
Before running the pipeline, please ensure:
README.md)poetry install)For detailed setup instructions, refer to the
README.Usage
To launch the BIDS conversion process, use the submission scripts (located in the folder
jobs), with dependency as follows :Input and Output
Input videos:
/orcd/data/satra/002/datasets/SAILS/Phase_III_Videos/Videos_from_external_standardizedOutput (final BIDS dataset):
/orcd/scratch/bcs/001/sensein/sails/BIDS_dataThe pipeline automatically: