dmr - differentially methylated regions

Synopsis

$ dnmtools dmr [OPTIONS] <diffs> <hmrs-a> <hmrs-b> <dmr-a-lt-b> <dmr-b-lt-a>

Description

When differential methylation scores and HMRs for two samples/methylomes available, differentially methylated regions (DMRs) can be calculated using the dmr command. This command uses the HMR fragments as candidate intervals, and indicates the number of sites within each HMR fragment show significant differential methylation in the appropriate direction (low vs. high, in both directions). The "fragments" here are obtained by using the symmetric difference of the given HMRs. The output gives this symmetric difference in two separate files, and these may all be called DMRs. However, the user should decide whether the fraction of sites within each interval that show significant differential methylation is sufficient to call any of these an DMR for a particular analysis. So we expect users will filter these intervals (more on this below).

Note 1: the dmr command will work -- meaning it should not crash -- if the two sets of provided intervals (the input-a.hmr and input-b.hmr in the synopsis above) are not actually HMRs. As long as they are in BED format and non-overlapping within each file, the dmr command should run. However, the intended use was with HMRs, and the interpretation may be difficult of some other intervals are used.

Note 2: we refer to "CpG sites" here, but in some settings, for example when studying Arabidopsis, non-CpG sites might be of interest. The dmr command can also work in these settings.

The dmr command requires 5 files to be specified in a fixed order. These include (1) the file of methylation differences as produced by the diff command, (2) the HMRs for the first methylome, (3) the HMRs for the second methylome, (4) the output file for DMRs lower in the first methylome, and (5) the output file for DMRs lower in the second methylome. The output file format is 6-column BED.

In the following example, the letters lt in the two output files (the two BED files as the final arguments) indicate the direction of differential methylation:

$ dnmtools dmr input.diff input-a.hmr input-b.hmr dmr-a-lt-b.bed dmr-b-lt-a.bed

The DMRs are output to files dmr-a-lt-b.bed and dmr-b-lt-a.bed. The former file contains regions with lower methylation in the original counts output file for sample/methylome "a", which might have been named input-a.meth. The latter has regions with lower methylation in the counts output file for methylome "b". One of these files, say dmr-a-lt-b.bed might look as follows:

chr1    3539447 3540231 X:12  0 +
chr1    4384880 4385117 X:6   1 +
chr1    4488269 4488541 X:3   2 +
chr1    4603985 4604344 X:10  2 +
chr1    4760070 4760445 X:8   1 +

The first three columns give genomic coordinates of the "fragments". In this case, these would be intervals covered by input-a.hmr and not bey input-b.hmr. The 4th column contains the number of CpG sites that this DMR spans, preceded by the symbols "X:" since this 4th column is a "name" in the bed format; we avoid simply giving a number for this column. The 5th column contains the number of significantly differentially methylated CpGs in this DMR where the direction of the difference is lower in methylome "a" than in "b". So, the first DMR spans 12 CpG sites, but contains no significantly differentially methylated sites; the second DMR spans 6 CpGs and contains just one significantly differentially methylated CpG site.

We recommend filtering DMRs so that each one contains a sufficient number of CpG sites and meets some threshold for the fraction of CpG sites that have significant differential methylation. This can be easily done with awk, available on virtually all Linux and macOS systems. For example, the following command filters to keep DMRs spanning at least 10 CpGs and having at least 5 significantly differentially methylated CpGs, storing them in a file named dmr-a-lt-b-filtered.bed.

$ awk -F "[:\t]" '$5 >= 10 && $6 >= 5' dmr-a-lt-b.bed > dmr-a-lt-b-filtered.bed

Above, the -F argument indicates possible field separator characters, either a tab or the colon. If, for some reason, the tabs in the file dmr-a-lt-b.bed have been converted to spaces, this would break. If the fraction of significant CpGs is deemed more important than their absolute number, for example at least 50% showing significant differential methylation, the following command can be used:

$ awk -F "[:\t]" '$5 >= 10 && $6/$5 >= 0.5' dmr-a-lt-b.bed > dmr-a-lt-b-filtered.bed

Warning: the first input file, which is output from the diffs command, is directional. The direction determines the order that the two HMR files must be specified. In the synopsis above, the order of "hmrs-a" and "hmrs-b" must match the order that methylomes/samples "a" and "b" were specified when running diffs to obtain the first of the input files. In a typical application, if these are swapped we expect virtually no significant sites in the DMRs (and the 5th column in the outputs would always be 0 or very close). So diffs would have been run as follows:

$ dnmtools diff -o a-before-b.diff input-a.meth input-b.meth

Comparing two small groups of methylomes

To compare two small groups of methylomes, one should combine the methylomes (that is, the output from counts) within each group and then compute DMRs for the resulting pair of methylomes using the approach described above. The counts output files can be combined using the program merge. However, to take advantage of replicates or experimental design, use the radmeth command instead.

Parameters

 -v, -verbose

Print more information while the command is running.

 -c -cutoff

Cutoff on p-values to define significant differences for individual sites (default: 0.05).