Hi,
Why does modkit sample-probs report a different filtering threshold than modkit summary or modkit pileup?
Because of these inconsistent thresholds, modkit pileup filters out many more methylated cytosines from my bedMethyl output in the Human Variation workflow. Thank you for looking into this.
Example (modkit v0.5.1):
modkit sample-probs --only-mapped --region chr1 my.sorted.bam
sampling 10042 reads from BAM
base percentile threshold
C 10 0.9980469
C 50 1
C 90 1
modkit summary --mapped-only --region chr1 my.sorted.bam
sampling 10042 reads from BAM
calculating threshold at 10(th) percentile
calculated thresholds: C: 0.94921875
modkit pileup --region chr1 my.sorted.bam my.sorted_chr.bed
Using filter threshold 0.94921875 for C.
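To show why the gap between 0.9980469 and 0.94921875 matters, here is a minimal Python sketch of a percentile-based filter threshold. It is not modkit's implementation: the function names percentile_threshold and apply_filter are hypothetical, the nearest-rank percentile method is an assumption, and the confidence values are made up for illustration only.

```python
import math


def percentile_threshold(confidences, percentile):
    """Return the nearest-rank percentile of per-call confidences.

    Hypothetical helper: assumes the threshold is the value below which
    roughly `percentile` percent of the sampled calls fall.
    """
    if not confidences:
        raise ValueError("no confidences sampled")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(confidences)
    # Nearest-rank method: 1-based rank of the percentile value.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


def apply_filter(confidences, threshold):
    """Keep only the calls whose confidence reaches the threshold."""
    return [c for c in confidences if c >= threshold]


# Made-up confidences for ten base modification calls (illustration only).
sampled = [0.55, 0.70, 0.90, 0.95, 0.97, 0.98, 0.99, 0.995, 0.998, 1.0]

low = percentile_threshold(sampled, 10)
kept_at_095 = apply_filter(sampled, 0.95)
kept_at_0998 = apply_filter(sampled, 0.998)
print(low, len(kept_at_095), len(kept_at_0998))
```

In this toy data, raising the threshold from 0.95 to 0.998 drops most of the calls that would otherwise pass, which is the kind of effect I see in the bedMethyl output.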
Update:
I realised that the Human Variation workflow still uses modkit v0.3.3, and in that version both modkit sample-probs and modkit summary overestimate the filtering threshold. The issue appears to be resolved for modkit summary in newer releases (at least in v0.5.1), but modkit sample-probs still reports the overestimated threshold.