-
Notifications
You must be signed in to change notification settings - Fork 20
Description
Dear Developers,
While running modkit (v0.6.0) on my madbam, I used two commands to summarize modification calls for my dataset. When i used modkit summary:
./modkit summary --no-sampling -t 8 /mod.bam > ./modbase_summary.tsv
the output for that sample is:
bases A,C
total_reads_used 1195280
count_reads_A 1195273
count_reads_C 1035131
pass_threshold_C 0.80859375
pass_threshold_A 0.8300781
base code pass_count pass_frac all_count all_frac
C - 6738209 0.8199451 7268915 0.7968498
C h 80783 0.009830154 260751 0.02858465
C m 1398886 0.17022473 1592398 0.17456554
A - 361980899 0.98579 390231615 0.9570983
A a 5217869 0.01420993 17492023 0.042901665
when I am using modkit pileup to get bedmethyl output (for base C):
./modkit pileup --cpg --combine-strands --modified-bases 5mC 5hmC --sampling-frac 1.0 --ref ./Genome.fasta -t 8 /mod.bam /Cpg.pileup.bed
the default thresholds are getting changed (and is much higher) even though I have disabled sampling reads option in both the commands.
discarded 0 contigs with zero aligned reads
parsed 2 base modification(s). Base modifictions other than 'C:h,C:m' will be counted as 'N_other'.
sampling 100% of reads
Using filter threshold 0.91015625 for C.
Using filter threshold 0.9433594 for A.
using optimized genome workers, combining strands at CpG motifs
Done, processed 333946 rows.
This is happening for nearly all samples. Can you please explain this discrepancy here ?
Also, to override this discrepancy, does it make sense to manually provide the threshold estimated from modkit summary for each base to modkit pileup command for each sample?
Looking forward to your response.
Thankyou!