
Discrepancy between threshold estimated by modkit summary and modkit pileup and modkit sample-probs #550

@guanhomer

Hi,
Why does modkit sample-probs report a different filtering threshold than modkit summary and modkit pileup?
Because of this inconsistency, modkit pileup filters out many more methylated cytosines from my bedMethyl output in the Human Variation workflow. Thank you for looking into this.

Example (modkit v0.5.1):

modkit sample-probs --only-mapped --region chr1 my.sorted.bam

sampling 10042 reads from BAM
base percentile threshold
C 10 0.9980469
C 50 1
C 90 1

modkit summary --mapped-only --region chr1 my.sorted.bam

sampling 10042 reads from BAM
calculating threshold at 10(th) percentile
calculated thresholds: C: 0.94921875

modkit pileup --region chr1 my.sorted.bam my.sorted_chr.bed

Using filter threshold 0.94921875 for C.

Update:
I realised that the Human Variation workflow still uses modkit v0.3.3, and in that version both modkit sample-probs and modkit summary overestimate the filtering threshold. The issue appears to be resolved for modkit summary in newer releases (at least in v0.5.1), but modkit sample-probs still shows the same behaviour.
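
To illustrate why the choice of threshold matters, here is a minimal Python sketch. It is not modkit's implementation. It assumes call probabilities come from 8-bit ML tag values, where value q covers the interval [q/256, (q+1)/256) as described in the SAM tags specification, and that each value is represented by its bin midpoint. That assumption is at least consistent with the reported 0.9980469, which is the midpoint of bin 255. The nearest-rank percentile rule, the function names, and the `ml_values` data are all hypothetical and chosen only for illustration.

```python
import math


def ml_to_prob(q: int) -> float:
    """Midpoint of the probability bin for an 8-bit ML value (assumed mapping)."""
    if not 0 <= q <= 255:
        raise ValueError("ML values must be in 0..255")
    return (q + 0.5) / 256


def percentile_threshold(probs, percentile: int) -> float:
    """Nearest-rank percentile of the sampled probabilities (illustrative rule)."""
    if not probs:
        raise ValueError("no probabilities sampled")
    ordered = sorted(probs)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def fraction_passing(probs, threshold: float) -> float:
    """Fraction of calls kept when calls below the threshold are filtered out."""
    return sum(p >= threshold for p in probs) / len(probs)


# Hypothetical sample of 100 calls; NOT taken from the BAM in this issue.
ml_values = [200] * 5 + [243] * 5 + [255] * 90
probs = [ml_to_prob(q) for q in ml_values]

low = percentile_threshold(probs, 10)   # threshold at the 10th percentile
high = ml_to_prob(255)                  # a stricter threshold, for comparison

print(f"10th percentile threshold: {low}")
print(f"kept at {low}: {fraction_passing(probs, low):.2f}")
print(f"kept at {high}: {fraction_passing(probs, high):.2f}")
```

On this made-up sample the stricter threshold discards every call outside the top ML bin, so even a small difference in the estimated threshold changes how many calls survive filtering.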

Labels: enhancement (New feature or request), question (Looking for clarification on inputs and/or outputs)
