amrfinder - Compute allelically methylated regions (AMRs)
Synopsis
$ dnmtools amrfinder [OPTIONS] <input.epiread>
Description
The program amrfinder scans the genome using a sliding window to
identify AMRs. For a genomic interval, two statistical models are
fitted to the reads mapped, respectively. One model (single-allele
model) assumes the two alleles have the same methylation state, and
the other (two-allele model) represents different methylation states
for the two alleles. Comparing the likelihood of the two models, the
interrogated genomic interval may be classified as an AMR. The
following command shows an example to run the program amrfinder and
takes as input an epireads file generated from
stats.
$ dnmtools amrfinder -c /path/to/genome.fa -o output.amr input.epiread
There are several options for running amrfinder.
-
The
-bswitches from using a likelihood ratio test to BIC as the criterion for calling an AMR. -
The
-ioption changes the number of iterations used in the EM procedure when fitting the models. -
The
-woption changes the size of the sliding window, which is in terms of CpGs. The default of 10 CpGs per window has worked well for us. -
The
-mindicates the minimum coverage per CpG site required for a window to be tested as an AMR. The default requires 4 reads on average, and any lower will probably lead to unreliable results. -
The
-gparameter is used to indicate the maximum distance between any two identified AMRS; AMRs are often fragmented, as coverage fluctuates, and spacing between CpGs means their linkage cannot be captured by the model. if two are any closer than this value, they are merged. The default is 1000, and it seems to work well in practice, not joining things that appear as though they should be distinct. In the current version of the program, at the end of the procedure, any AMRs whose size in terms of base-pairs is less than half the "gap" size are eliminated. This is a hack that has produced excellent results, but will eventually be eliminated (hopefully soon). -
The
-Cparameter specifies the critical value for keeping windows as AMRs, and is only useful when the likelihood ratio test is the used; for BIC windows are retained if the BIC for the two-allele model is less than that for the single-allele model. amrfinder calculates a false discovery rate to correct for multiple testing, and therefore most p-values that pass the test will be significantly below the critical value. -
The
-hoption produces FDR-adjusted p-values according to a step-up procedure and then compares them directly to the given critical value, which allows further use of the p-values without multiple testing correction. -
The
-fomits multiple testing correction entirely by not applying a correction to the p-values or using a false discovery rate cutoff to select AMRs.
Options
-o, -output
The name of the output file. If no file name is provided, the output will be written to standard output. Due to the size of this output, a file should be specified unless the output will be piped to another command or program. The output file contains genomic intervals in BED format.
-c, -chrom
FASTA file or directory of chromosomes containing FASTA files. This parameter is required.
-i, -itr
The maximum number of iterations when training (default: 10).
-w, -window
Size of sliding window (default: 10 CpG sites).
-m, -min-cov
Minimum coverage per CpG to test in each window (default: 4).
-g, -gap
Minimum allowed gap, in bp, between AMRs (default: 1000).
-C, -crit
Critical p-value cutoff (default: 0.01).
-f, -nofdr
Omits the FDR multiple testing correction.
-h, -pvals
Adjusts p-values using Hochberg step-up.
-b, -bic
Use Bayesian Information Criterion (BIC) to compare models.
-v, -verbose
Print more information while the command is running.
-P, -progress
Print progress info while the command is running.