Skip to content

Discrepancy between threshold estimated by modkit summary and modkit pileup #548

@gsukrit

Description

@gsukrit

Dear Developers,

While running modkit (v0.6.0) on my madbam, I used two commands to summarize modification calls for my dataset. When i used modkit summary:
./modkit summary --no-sampling -t 8 /mod.bam > ./modbase_summary.tsv

the output for that sample is:

bases A,C

total_reads_used 1195280

count_reads_A 1195273

count_reads_C 1035131

pass_threshold_C 0.80859375

pass_threshold_A 0.8300781

base code pass_count pass_frac all_count all_frac
C - 6738209 0.8199451 7268915 0.7968498
C h 80783 0.009830154 260751 0.02858465
C m 1398886 0.17022473 1592398 0.17456554
A - 361980899 0.98579 390231615 0.9570983
A a 5217869 0.01420993 17492023 0.042901665

when I am using modkit pileup to get bedmethyl output (for base C):
./modkit pileup --cpg --combine-strands --modified-bases 5mC 5hmC --sampling-frac 1.0 --ref ./Genome.fasta -t 8 /mod.bam /Cpg.pileup.bed

the default thresholds are getting changed (and is much higher) even though I have disabled sampling reads option in both the commands.

discarded 0 contigs with zero aligned reads
parsed 2 base modification(s). Base modifictions other than 'C:h,C:m' will be counted as 'N_other'.
sampling 100% of reads
Using filter threshold 0.91015625 for C.
Using filter threshold 0.9433594 for A.
using optimized genome workers, combining strands at CpG motifs
Done, processed 333946 rows.

This is happening for nearly all samples. Can you please explain this discrepancy here ?
Also, to override this discrepancy, does it make sense to manually provide the threshold estimated from modkit summary for each base to modkit pileup command for each sample?

Looking forward to your response.

Thankyou!

Metadata

Metadata

Assignees

No one assigned

    Labels

    troubleshootingworkflow and data preparation questions

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions