Bombcell integration #4306
base: main
…s and add more template metrics
…verlay and histograms
…uration, add amplitude_median, bombcell_snr and fix non-somatic classification rules
for more information, see https://pre-commit.ci
… for name changes
…ve template and quality metrics (this way it is clear what to input)
Hi Julie, I will come back for a more careful read. But here are some main points:
```python
import numpy as np
import warnings
from copy import deepcopy
from scipy.signal import find_peaks
```
Can you move this import into the function?
The core module has minimal dependencies, and all additional imports should be local :)
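For illustration, a function-local import could look like the sketch below. `detect_trough_and_peaks` is a hypothetical helper, not the PR's `get_trough_and_peak_idx()`. It assumes a 1D template, treats troughs as peaks of the negated waveform found with `scipy.signal.find_peaks`, takes the most prominent one as the main trough, and can optionally smooth first with `scipy.signal.savgol_filter`; the default window length and polynomial order are arbitrary.

```python
import numpy as np


def detect_trough_and_peaks(template, smooth=False, window_length=7, polyorder=3):
    """Locate the main trough and the peaks before/after it in a 1D template."""
    # Local imports: scipy is only needed when this function is called,
    # so importing the enclosing module does not pull it in.
    from scipy.signal import find_peaks, savgol_filter

    template = np.asarray(template, dtype=float)
    if smooth:
        # Optional Savitzky-Golay smoothing for noisy raw-data templates.
        template = savgol_filter(template, window_length, polyorder)

    # Troughs are peaks of the negated waveform. prominence=0 and width=0
    # keep every peak but make find_peaks report prominences and widths.
    troughs, trough_props = find_peaks(-template, prominence=0, width=0)
    if troughs.size == 0:
        return None
    main = int(np.argmax(trough_props["prominences"]))
    trough_idx = int(troughs[main])

    peaks, _ = find_peaks(template, prominence=0, width=0)
    return {
        "trough_idx": trough_idx,
        "trough_value": template[trough_idx],
        "trough_width": trough_props["widths"][main],  # in samples
        "peaks_before_idx": peaks[peaks < trough_idx],
        "peaks_after_idx": peaks[peaks > trough_idx],
    }
```

Because scipy is imported inside the function body, importing the module that defines it only requires numpy.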
@@ -0,0 +1,430 @@
```python
"""
Unit labelling based on quality metrics (Bombcell).
```
```diff
- Unit labelling based on quality metrics (Bombcell).
+ Unit labeling based on quality metrics (Bombcell).
```
In general, we have adopted American English (@chrishalcrow is not happy about it!).
Could you rename this, and the files, to use "labeling"?
The file could be called `bombcell_curation` (similar to `model_based_curation`).
@@ -0,0 +1,74 @@
```json
{
```
We can remove this file
alejoe91 left a comment
@Julie-Fabre massive effort! Thanks!
I did a first round of review and I'm happy to discuss some details and also to work on it :)
```python
from typing import Optional
```
```python
WAVEFORM_METRICS = [
```
```diff
- WAVEFORM_METRICS = [
+ NOISE_METRICS = [
```
?
```python
# bombcell
return {
    # Waveform quality (failures -> NOISE)
    "num_positive_peaks": {"min": np.nan, "max": 2},
```
```diff
-    "num_positive_peaks": {"min": np.nan, "max": 2},
+    "num_positive_peaks": {"min": None, "max": 2},
```
I would just keep `None` and handle it in the function instead of NaN, so the thresholds can be saved to and loaded from JSON without any custom fields.
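A small stdlib-only sketch of why `None` is easier to persist than NaN. The threshold dict is illustrative, shaped like the default above, not the PR's full set of defaults.

```python
import json
import math

# Illustrative thresholds in the style of the defaults above.
thresholds_nan = {"num_positive_peaks": {"min": math.nan, "max": 2}}
thresholds_none = {"num_positive_peaks": {"min": None, "max": 2}}

# Python's json module writes NaN as the bare token NaN, which is not valid
# JSON per RFC 8259; with allow_nan=False it refuses to serialize it at all.
text_nan = json.dumps(thresholds_nan)
try:
    json.dumps(thresholds_nan, allow_nan=False)
    nan_is_strict_json = True
except ValueError:
    nan_is_strict_json = False

# None becomes JSON null and round-trips back to None unchanged.
text_none = json.dumps(thresholds_none)
restored = json.loads(text_none)
```

Strict JSON consumers (other languages, schema validators) can then read the thresholds file without special handling.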
```python
quality_metrics=None,
template_metrics=None,
```
Use `sorting_analyzer` instead.
```python
unit_type_string : np.ndarray
    String labels.
"""
combined_metrics = _combine_metrics(quality_metrics, template_metrics)
```
```diff
- combined_metrics = _combine_metrics(quality_metrics, template_metrics)
+ combined_metrics = sorting_analyzer.get_metrics_extension_data()
```
;)
```python
values = np.abs(values)
thresh = thresholds[metric_name]
noise_mask |= np.isnan(values)
if not np.isnan(thresh["min"]):
```
```diff
- if not np.isnan(thresh["min"]):
+ if thresh["min"] is not None:
```
And so on.
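A minimal sketch of a bound check where `None` means "no bound". `outside_bounds` is a hypothetical helper, and it assumes strict inequalities (below `min` or above `max`) count as failures; the PR's comparison semantics may differ.

```python
import numpy as np


def outside_bounds(values, thresh):
    """Flag units whose metric is missing or outside [min, max].

    A bound set to None is simply skipped.
    """
    values = np.asarray(values, dtype=float)
    mask = np.isnan(values)  # missing values fail, as in the loop above
    if thresh.get("min") is not None:
        mask |= values < thresh["min"]
    if thresh.get("max") is not None:
        mask |= values > thresh["max"]
    return mask
```

With `{"min": None, "max": 2}`, only the upper bound is applied, so no NaN sentinel is needed anywhere.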
```diff
- class PeakToValley(BaseMetric):
-     metric_name = "peak_to_valley"
+ class PeakToTroughDuration(BaseMetric):
+     metric_name = "peak_to_trough_duration"
```
I think it could be useful to add a `deprecated_column_names` attribute, so we could automate backward compatibility :)
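One possible shape for the suggested `deprecated_column_names`, sketched under the assumption that it is a class-level dict mapping old column names to new ones. `BaseMetric` below is a minimal stand-in, not SpikeInterface's class, and `rename_deprecated_columns` is a hypothetical helper.

```python
import warnings


class BaseMetric:
    """Minimal stand-in for a metric class; only the fields used here."""

    metric_name = ""
    # Hypothetical attribute: old column name -> current column name.
    deprecated_column_names = {}


class PeakToTroughDuration(BaseMetric):
    metric_name = "peak_to_trough_duration"
    deprecated_column_names = {"peak_to_valley": "peak_to_trough_duration"}


def rename_deprecated_columns(columns, metric_classes):
    """Return column names with deprecated ones replaced by current names."""
    mapping = {}
    for cls in metric_classes:
        mapping.update(cls.deprecated_column_names)
    renamed = []
    for col in columns:
        if col in mapping:
            warnings.warn(
                f"Column '{col}' is deprecated; use '{mapping[col]}'",
                DeprecationWarning,
            )
        renamed.append(mapping.get(col, col))
    return renamed
```

Declaring the mapping on each metric class keeps the compatibility logic in one generic function, so loading old saved metrics does not need per-metric special cases.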
```diff
  num_positive_peaks_dict = {}
  num_negative_peaks_dict = {}
- sampling_frequency = sorting_analyzer.sampling_frequency
+ sampling_frequency = tmp_data["sampling_frequency"]
```
goooooood catch @Julie-Fabre !!!!!!
```python
class WaveformDuration(BaseMetric):
    metric_name = "waveform_duration"
```
I think the name doesn't convey the actual computation.
```diff
- class WaveformDuration(BaseMetric):
-     metric_name = "waveform_duration"
+ class MainToNextPeakDuration(BaseMetric):
+     metric_name = "main_to_next_peak_duration"
```
?
```python
"trough_width": "Width of the main trough in microseconds",
"peak_before_width": "Width of the main peak before trough in microseconds",
"peak_after_width": "Width of the main peak after trough in microseconds",
```
I would be consistent and output everything in the same unit. So far we have been using seconds for durations. The bombcell curation could still accept thresholds in µs and do the conversion on the fly.
Alternatively, we could add a `unit` field to `BaseMetric` to specify the unit of each column. I think I would go with this, but it requires additional refactoring. @chrishalcrow what do you think?
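A sketch of the conversion-on-the-fly idea, assuming a hypothetical `unit` class attribute and a small table of scale factors. Neither `TO_SECONDS`, `TroughWidth.unit`, nor `convert_threshold` exists in SpikeInterface; they only illustrate how thresholds given in µs could be matched to metrics stored in seconds.

```python
# Hypothetical conversion factors from each unit to seconds.
TO_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}


class TroughWidth:
    """Stand-in metric class carrying a hypothetical per-column unit."""

    metric_name = "trough_width"
    unit = "s"


def convert_threshold(value, threshold_unit, metric_unit):
    """Express a threshold given in threshold_unit in the metric's unit."""
    if value is None:  # no bound stays no bound
        return None
    return value * TO_SECONDS[threshold_unit] / TO_SECONDS[metric_unit]
```

For example, `convert_threshold(500, "us", TroughWidth.unit)` gives 0.0005, so a bombcell-style threshold in µs can be compared directly with a width stored in seconds.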
```python
quality_metrics=None,
template_metrics=None,
```
```diff
- quality_metrics=None,
- template_metrics=None,
+ sorting_analyzer
```
Same reasons as in the curation module.
This PR ports bombcell-style unit classification to SpikeInterface.
Template metrics

- `get_trough_and_peak_idx()`: a function that uses `scipy.signal.find_peaks()`. Since SpikeInterface stores templates computed from raw data rather than the heavily smoothed templates used in template matching, the waveforms can be noisy, so you can optionally apply Savitzky-Golay smoothing before detection. The function returns dicts for troughs, peaks before, and peaks after, each containing indices, values, prominences, and widths.
- New metrics: `peak_before_to_trough_ratio`, `peak_after_to_trough_ratio`, `waveform_baseline_flatness`, `peak_before_width`, `trough_width`, `main_peak_to_trough_ratio`.
- Renamed `peak_to_valley` to `peak_to_trough_duration`.

Quality metrics
- `snr_bombcell`: peak amplitude over baseline MAD.
- `amplitude_cutoff` now has parameters for controlling the histogram fitting:

Unit classification

`spikeinterface.curation`:

Units are classified as NOISE → MUA → GOOD based on successive threshold checks. An optional NON_SOMA category covers non-somatic waveforms.
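A rough sketch of an SNR defined as peak amplitude over baseline MAD. The baseline definition is an assumption: here it is the first `baseline_samples` samples of the waveform, which may not match how the PR's `snr_bombcell` chooses its baseline. `snr_bombcell_sketch` is a hypothetical name.

```python
import numpy as np


def snr_bombcell_sketch(waveform, baseline_samples):
    """Peak absolute amplitude divided by the median absolute deviation
    (MAD) of an assumed baseline window at the start of the waveform."""
    waveform = np.asarray(waveform, dtype=float)
    baseline = waveform[:baseline_samples]
    mad = np.median(np.abs(baseline - np.median(baseline)))
    if mad == 0:
        return np.inf  # perfectly flat baseline: SNR is unbounded
    return np.max(np.abs(waveform)) / mad
```

MAD is used rather than the standard deviation because it is robust to occasional large excursions in the baseline.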
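A sketch of the NOISE → MUA → GOOD cascade over per-unit metric arrays, with `None` meaning "no bound". `classify_units` is hypothetical, the metric names in the usage are illustrative, and treating NON_SOMA as an override for any non-noise unit is an assumption rather than the PR's confirmed rule.

```python
import numpy as np


def classify_units(metrics, noise_thresholds, mua_thresholds, non_soma_mask=None):
    """Label units NOISE, then MUA, else GOOD (optionally NON_SOMA)."""
    n_units = len(next(iter(metrics.values())))
    labels = np.full(n_units, "GOOD", dtype=object)

    def fails(thresholds):
        # A unit fails if any metric is NaN or outside its [min, max] bounds.
        mask = np.zeros(n_units, dtype=bool)
        for name, bounds in thresholds.items():
            values = np.asarray(metrics[name], dtype=float)
            mask |= np.isnan(values)
            if bounds.get("min") is not None:
                mask |= values < bounds["min"]
            if bounds.get("max") is not None:
                mask |= values > bounds["max"]
        return mask

    noise = fails(noise_thresholds)
    mua = ~noise & fails(mua_thresholds)  # only checked for non-noise units
    labels[mua] = "MUA"
    labels[noise] = "NOISE"
    if non_soma_mask is not None:
        labels[~noise & np.asarray(non_soma_mask, dtype=bool)] = "NON_SOMA"
    return labels
```

The checks are successive: a unit that already failed the noise checks never reaches the MUA checks, and anything passing both stays GOOD.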
Plots
or a wrapper for all plots: