selectsites - get subsets of cytosines from counts files

Synopsis

$ dnmtools selectsites [OPTIONS] <regions.bed> <input.counts>

Description

In many cases, we may be interested in analyzing only a subset of the cytosines or CpGs in a sample. Examples include calculating average methylation levels in (1) annotated regions, such as promoters or repeats, or (2) regions defined by the data itself, such as hypomethylated regions (HMRs) or partially methylated domains (PMDs).
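To make the "average methylation level" use case concrete, here is a minimal Python sketch that computes a coverage-weighted mean methylation level over a set of counts lines, such as those already restricted to a region of interest. It assumes the counts format has whitespace-separated columns with the methylation level in the fifth column and the read coverage in the sixth, and that header lines start with `#`. The function name `weighted_mean_methylation` is hypothetical and is not part of dnmtools.

```python
def weighted_mean_methylation(counts_lines):
    """Return the coverage-weighted mean methylation level, or None if no reads.

    Assumed column layout (an assumption, not taken from this page):
    chrom, position, strand, context, methylation level, coverage.
    """
    meth_reads = 0.0   # estimated number of methylated reads
    total_reads = 0    # total number of reads covering the sites
    for line in counts_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        level = float(fields[4])
        coverage = int(fields[5])
        meth_reads += level * coverage
        total_reads += coverage
    if total_reads == 0:
        return None
    return meth_reads / total_reads
```

Weighting by coverage means that sites supported by many reads contribute more than sites supported by few; an unweighted mean over sites is an equally valid choice and gives a different number.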

One way to extract these sites is to convert the counts file to BED format, intersect it with a BED file of the regions of interest (for example, using bedtools), and then convert the result back to counts format. The selectsites program simplifies these operations: it takes a counts file and a set of intervals in a BED file, and outputs the entries of the counts file that fall within the BED regions. The following command selects the entries in input.counts that are contained in any interval in regions.bed:

$ dnmtools selectsites -o output.counts regions.bed input.counts
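The selection performed by the command above can be pictured with the following Python sketch. It is an illustration of the idea only, not the dnmtools implementation. It assumes that the counts file has the chromosome in the first column and a 0-based position in the second, that BED intervals are 0-based and half-open, and that header lines start with `#`. The names `load_intervals` and `select_sites` are hypothetical.

```python
import bisect
from collections import defaultdict


def load_intervals(bed_lines):
    """Group BED intervals by chromosome, then sort and merge overlaps."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and BED header lines
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [ivs[0]]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:
                # overlapping or adjacent: extend the previous interval
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        # store parallel lists of starts and ends for binary search
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def select_sites(counts_lines, intervals):
    """Yield the counts lines whose site lies inside any interval."""
    for line in counts_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        chrom, pos = fields[0], int(fields[1])
        if chrom not in intervals:
            continue
        starts, ends = intervals[chrom]
        # index of the last interval starting at or before pos
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and pos < ends[i]:
            yield line
```

Merging the intervals first guarantees that each site is reported at most once, even when the regions in the BED file overlap, and lets each lookup use a single binary search.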

Options

 -o, -output

Name of output file (default: STDOUT)

 -p, -preload

Preload sites (use for large target intervals).

 -v, -verbose

Print more run info to STDERR while the program is running.

 -d, -disk

Process sites on disk (fast if target intervals are few).

 -S, -summary

Write summary to this file.

 -z, -zip

Write the output file in gzip-compressed format.

 -relaxed

Allow additional columns in the input file.