levels - global methylation summary statistics
Synopsis
$ dnmtools levels [OPTIONS] <input.meth>
Description
The levels command computes global summary statistics for the output
of the counts command. Example output appears later in this section.
The statistics describe both the quantity of data (e.g., coverage of
sites) and methylation (e.g., global average methylation). They are
reported separately for each sequence context. The contexts are
explained elsewhere in the documentation. They are not mutually
exclusive categories, and include:
- cytosines, all of them, on either strand
- CpG sites, on either strand
- symmetric CpG sites (strands combined)
- the CHH context
- the CCG context
- the CXG context (we "invented" this one)
The summary statistics computed include:
- total_sites: the total number of sites counted for this context
- sites_covered: among the total above, those with at least one read
- total_c: among the observations in reads, how many are C
- total_t: among the observations in reads, how many are T
- max_depth: the most coverage of any site for this context
- mutations: number of sites for this context marked as mutated
- called_meth: number of sites "called" methylated
- called_unmeth: number of sites "called" unmethylated
- mean_agg: the sum of methylation levels for all sites
- coverage: total data informing on sites for this context
- sites_covered_fraction: fraction of sites covered
- mean_depth: among all sites, the mean coverage by reads
- mean_depth_covered: among all covered sites, the mean coverage
- mean_meth: the mean of the methylation levels for covered sites
- mean_meth_weighted: the mean weighted by coverage
- fractional_meth: the fraction of "called" sites "called" methylated
(If you want more information on these, please ask.)
Many of the statistics above are included because they are needed to
calculate the "derived" statistics. For example, mean_agg is the
numerator for mean_meth, where the denominator is the number of
covered sites. Why keep those raw statistics? Because they are
essential if two different levels output files are to be combined.
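The relationship between raw and derived statistics can be sketched in Python. This is a minimal illustration, not the dnmtools implementation: the formulas are inferred from the definitions above and agree with the sample cytosines output, the function names are hypothetical, and the merge assumes the two files describe disjoint sets of sites (for example, different chromosomes). The values in `a` and `b` are made up for the example.

```python
# Raw statistics that can simply be added when two outputs are merged.
ADDITIVE = ("total_sites", "sites_covered", "total_c", "total_t",
            "mutations", "called_meth", "called_unmeth", "mean_agg")


def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_levels(raw):
    """Recompute the derived statistics for one context from raw counts."""
    coverage = raw["total_c"] + raw["total_t"]
    called = raw["called_meth"] + raw["called_unmeth"]
    return {
        "coverage": coverage,
        "sites_covered_fraction": ratio(raw["sites_covered"], raw["total_sites"]),
        "mean_depth": ratio(coverage, raw["total_sites"]),
        "mean_depth_covered": ratio(coverage, raw["sites_covered"]),
        "mean_meth": ratio(raw["mean_agg"], raw["sites_covered"]),
        "mean_meth_weighted": ratio(raw["total_c"], coverage),
        "fractional_meth": ratio(raw["called_meth"], called),
    }


def combine_raw(a, b):
    """Merge raw counts from two outputs (assumed to cover disjoint sites)."""
    merged = {key: a[key] + b[key] for key in ADDITIVE}
    merged["max_depth"] = max(a["max_depth"], b["max_depth"])
    return merged


# Raw cytosines values copied from the sample output in this section.
cytosines = {
    "total_sites": 1200559022, "sites_covered": 797100353,
    "total_c": 417377038, "total_t": 4048558428, "max_depth": 30662,
    "mutations": 3505469, "called_meth": 44229556,
    "called_unmeth": 750163257, "mean_agg": 4.40429e+07,
}
derived = derived_levels(cytosines)

# Two made-up outputs, used only to illustrate merging.
a = {"total_sites": 100, "sites_covered": 80, "total_c": 200, "total_t": 600,
     "max_depth": 30, "mutations": 1, "called_meth": 10, "called_unmeth": 60,
     "mean_agg": 20.0}
b = {"total_sites": 50, "sites_covered": 20, "total_c": 100, "total_t": 100,
     "max_depth": 45, "mutations": 0, "called_meth": 5, "called_unmeth": 5,
     "mean_agg": 10.0}
merged_levels = derived_levels(combine_raw(a, b))
```

In this made-up case the two inputs have mean_meth values of 0.25 and 0.5, but the merged mean_meth is 0.3 rather than their plain average of 0.375, which is why the raw sums have to be carried along.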
The final three values are the "levels" and are described in Schultz et al. (2012):
"Leveling" the playing field for analyses of single-base resolution DNA methylomes
Schultz, Schmitz & Ecker (TIG 2012)
Note: the fractional_meth level we calculate is inspired by, but
different from, the one in the paper. We use a binomial test to
determine significantly hyper/hypomethylated sites, and use only the
subset of significant sites to calculate the fractional_meth level.
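To make the idea concrete, here is a small Python sketch of calling sites with an exact one-sided binomial test and then computing a fractional level from the called sites only. It is an illustration under stated assumptions, not the procedure dnmtools runs: the null methylation level of 0.5, the significance threshold of 0.05, and all function names are hypothetical choices made for this example.

```python
from math import comb


def upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def call_site(n_c, n_t, null_level=0.5, significance=0.05):
    """Label a site from its C and T read counts.

    null_level and significance are assumptions made for this sketch.
    """
    n = n_c + n_t
    if n == 0:
        return "uncalled"
    # Too many C observations to be explained by the null level.
    if upper_tail(n_c, n, null_level) < significance:
        return "methylated"
    # Too many T observations: P(X <= n_c) rewritten as an upper tail.
    if upper_tail(n_t, n, 1 - null_level) < significance:
        return "unmethylated"
    return "uncalled"


def fractional_meth(sites):
    """Fraction of called sites that are called methylated."""
    calls = [call_site(c, t) for c, t in sites]
    meth = calls.count("methylated")
    unmeth = calls.count("unmethylated")
    called = meth + unmeth
    return meth / called if called else 0.0


# Made-up (C, T) read counts for four sites.
sites = [(10, 0), (0, 10), (0, 8), (3, 3)]
level = fractional_meth(sites)
```

The site with three C and three T reads is left uncalled, so it does not contribute to the fraction. The exact sum is only practical at modest read depths; very deep sites would need a numerically safer tail computation.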
This command should provide the flexibility to compare methylation data with publications that calculate averages in different ways. The sample output below shows only the results for cytosines and CpGs in the sample, but similar output is generated for symmetric CpGs and for cytosines in the CHH, CCG, and CXG contexts.
cytosines:
  total_sites: 1200559022
  sites_covered: 797100353
  total_c: 417377038
  total_t: 4048558428
  max_depth: 30662
  mutations: 3505469
  called_meth: 44229556
  called_unmeth: 750163257
  mean_agg: 4.40429e+07
  coverage: 4465935466
  sites_covered_fraction: 0.663941
  mean_depth: 3.71988
  mean_depth_covered: 5.60273
  mean_meth: 0.055254
  mean_meth_weighted: 0.093458
  fractional_meth: 0.055677
cpg:
  total_sites: 58803590
  sites_covered: 47880982
  total_c: 261807401
  total_t: 84403225
  max_depth: 30080
  mutations: 381675
  called_meth: 38861909
  called_unmeth: 7152004
  mean_agg: 3.69282e+07
  coverage: 346210626
  sites_covered_fraction: 0.814253
  mean_depth: 5.88758
  mean_depth_covered: 7.23065
  mean_meth: 0.771250
  mean_meth_weighted: 0.756208
You can run the levels command as follows:
$ dnmtools levels -o output.levels input.meth
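For downstream analysis, the resulting file can be read into a nested dictionary. The Python sketch below assumes the output looks like the sample shown in this section: a context name followed by a colon on its own line, then one "name: value" line per statistic. The function name parse_levels is hypothetical, and a full YAML parser would be the more robust choice if one is available.

```python
def parse_levels(text):
    """Parse levels output into {context: {statistic: number}}."""
    contexts = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if not value:
            # A line such as "cpg:" starts a new context.
            current = contexts.setdefault(key, {})
        elif current is not None:
            # Counts are integers; means and fractions are floats.
            try:
                current[key] = int(value)
            except ValueError:
                current[key] = float(value)
    return contexts


# A fragment of the sample output, used here in place of a real file.
sample = """\
cytosines:
  total_sites: 1200559022
  mean_agg: 4.40429e+07
  mean_meth: 0.055254
cpg:
  total_sites: 58803590
  mean_meth: 0.771250
"""
levels = parse_levels(sample)
```

To read the file written by the command above, pass the contents of output.levels to parse_levels instead of the inline sample.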
Options
-o, -output
Output file in YAML format (default: stdout).
-a, -alpha
Alpha for confidence interval (default: 0.95).
-v, -verbose
Report more information while the program is running.