entropy - Computing methylation entropy

Synopsis

$ dnmtools entropy [OPTIONS] <genome.fa> <input.epiread>

Description

The concept of methylation entropy was introduced into epigenetics to characterize the randomness of methylation patterns over several consecutive CpG sites (Xie et al., 2011). The entropy command processes epireads and calculates the methylation entropy in sliding windows of a specified number of CpGs. Two input files are required.

  • (1) either a genome in FASTA format or a directory containing FASTA chromosome files

  • (2) an epiread file as produced by the states program. The input epiread file must be sorted, first by chromosome, then by position. This can be done with the following command:

$ LC_ALL=C sort -k1,1 -k2,2g input.epiread -o input-sorted.epiread
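Before running a long job, it can be useful to confirm that a file is already in this order. The following is a minimal sketch that reuses the same sort keys with the `-c` (check only) option of sort; the function name is_sorted_epiread is a hypothetical helper, not part of dnmtools.

```shell
# Hypothetical helper (not part of dnmtools): succeeds if the file given as $1
# is already in the order produced by the sort command above.
# "sort -c" only checks the order; it does not write any output file.
is_sorted_epiread() {
    LC_ALL=C sort -c -k1,1 -k2,2g "$1" 2>/dev/null
}

# Example use (file name taken from the sort command above):
#   is_sorted_epiread input-sorted.epiread && echo "sorted"
```

The check uses exactly the keys from the sort command, so a file produced by that command passes it.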

Use the -w option to specify the number of CpGs in the sliding window; the default is 4. If symmetric patterns should be treated as the same, specify the -F option. This forces the majority state in each epiread to "methylated" and the minority state to "unmethylated"; the processed epireads are then used for the entropy calculation. To run the program, use a command such as:

$ dnmtools entropy -w 5 -v -o output.meth /path/to/genome.fa input-sorted.epiread
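To make the effect of -F on an entropy value concrete, here is a minimal sketch, not the program's implementation. It assumes the patterns for one window have already been extracted, one per line, with C for a methylated state and T for an unmethylated state, and it computes the plain Shannon entropy in bits of the pattern frequencies. Whether dnmtools applies any further normalization, and how it treats ties when flipping, is not stated here; the sketch leaves tied patterns unchanged. The function name pattern_entropy is hypothetical.

```shell
# Hypothetical helper (not part of dnmtools): read one CpG-state pattern per
# line on stdin and print the Shannon entropy, in bits, of the pattern
# frequencies. Assumes C = methylated, T = unmethylated.
# Pass "flip" as the first argument to mimic the effect described for -F.
pattern_entropy() {
    awk -v mode="${1:-}" '
    {
        p = $1
        if (mode == "flip") {
            c = gsub(/C/, "C", p)   # number of methylated states
            t = gsub(/T/, "T", p)   # number of unmethylated states
            if (t > c) {            # majority is unmethylated: swap C and T
                gsub(/C/, "x", p); gsub(/T/, "C", p); gsub(/x/, "T", p)
            }
        }
        count[p]++                  # tally each distinct pattern
        n++
    }
    END {
        h = 0
        for (p in count) {
            f = count[p] / n        # relative frequency of this pattern
            h -= f * log(f) / log(2)
        }
        printf "%.4f\n", h
    }'
}
```

For example, `printf 'CCCC\nTTTT\nCCCC\nTTTT\n' | pattern_entropy` prints 1.0000, because two patterns occur with equal frequency, while the same input piped to `pattern_entropy flip` prints 0.0000, because the two symmetric patterns collapse into one.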

The output format is the same as that of the counts output. The first 3 columns indicate the genomic location of the center CpG in each sliding window, the 5th column contains the entropy value, and the 6th column shows the number of reads used for that window. Below is an output example.

chr1    483     +       CpG     2.33914 27
chr1    488     +       CpG     2.05298 23
chr1    492     +       CpG     1.4622  24
chr1    496     +       CpG     1.8784  35
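Because the 6th column holds the number of reads behind each entropy value, windows with few reads can be dropped before downstream analysis. The following is a minimal sketch; the function name filter_by_reads and the threshold of 25 in the usage line are illustrative assumptions, not recommendations from the tool.

```shell
# Hypothetical helper (not part of dnmtools): read entropy output on stdin
# and keep only windows whose read count (6th column) is at least $1.
filter_by_reads() {
    awk -v min="$1" '$6 >= min'
}

# Example use (output.meth is the file written by the command above):
#   filter_by_reads 25 < output.meth
```

Applied to the four example lines above with a threshold of 25, this keeps the windows at positions 483 and 496.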

Options

 -w, -window

number of CpGs in sliding window (default: 4)

 -F, -flip

force the majority state in each read to methylated

 -o, -output

name of output file (default: STDOUT)

 -v, -verbose

print more run info to STDERR while the program is running.