fastlift - Mapping methylomes between species
Synopsis
$ dnmtools fastlift -i <input.index> -f <input.from> -t <output.to>
Description
Mapping methylomes between species builds on the liftOver tool
provided by the UCSC Genome Browser. However, converting each
methcounts output file from one assembly to another with the UCSC
liftOver tool is time consuming, even though these files all share the
same locations and differ only in read counts. Therefore, we use
liftOver once to generate an index file between two assemblies, and
provide the fast-liftover tool to reuse that index. Suppose we have
downloaded the liftOver tool and the chain file
mm9ToHg19.over.chain.gz from the UCSC Genome Browser website, and we
have a methcounts file mm9.meth of CpG sites or all cytosines in mm9.
Entries in mm9.meth look like
chr1 3005765 + CpG 0.166667 6
chr1 3005846 + CpG 0.5 10
chr1 3005927 + CpG 0 9
We would like to lift it over to the human genome hg19, and generate
an index file mm9-hg19.index to facilitate later lift-over
operations from mm9 to hg19, and keep a record of unlifted mm9
cytosine positions in the file mm9-hg19.unlifted. First, convert the
methcounts file mm9.meth to the BED file mm9-cpg.bed for liftOver
using the following command.
$ awk '{print $1"\t"$2"\t"$2+1"\t"$1":"$2":"$2+1":+\t0\t+"}' mm9.meth >mm9-cpg.bed
The output file mm9-cpg.bed should look like this:
chr1 3005765 3005766 chr1:3005765:3005766:+ 0 +
chr1 3005846 3005847 chr1:3005846:3005847:+ 0 +
chr1 3005927 3005928 chr1:3005927:3005928:+ 0 +
Note that the fourth column records the original genomic location, with its fields joined by colons.
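Before running liftOver, it can be worth checking that the BED file has the expected shape. The following is a minimal sketch, assuming a three-line sample copied from the entries above; the file names sample.meth and sample-cpg.bed are hypothetical. It writes the same six columns (using OFS for the tabs) and counts records that do not have six fields or whose end is not start plus one.

```shell
# Toy methcounts input copied from the example entries above (hypothetical file name).
printf 'chr1\t3005765\t+\tCpG\t0.166667\t6\n' > sample.meth
printf 'chr1\t3005846\t+\tCpG\t0.5\t10\n' >> sample.meth
printf 'chr1\t3005927\t+\tCpG\t0\t9\n' >> sample.meth

# Build one-base BED intervals; the name column records the original location.
awk 'BEGIN{OFS="\t"} {print $1, $2, $2+1, $1":"$2":"($2+1)":+", 0, "+"}' \
  sample.meth > sample-cpg.bed

# Count records that break the expected shape:
# six tab-separated fields, and end equal to start + 1.
bad=$(awk -F'\t' 'NF != 6 || $3 != $2 + 1 {n++} END {print n+0}' sample-cpg.bed)
echo "malformed records: $bad"
```

A non-zero count would suggest the input is not in the six-column methcounts layout shown above.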
Then, run the UCSC Genome Browser tool liftOver as follows:
$ liftOver mm9-cpg.bed mm9ToHg19.over.chain.gz mm9-hg19.index mm9-hg19.unlifted
The generated index file mm9-hg19.index will be a BED format file in
hg19 coordinates, with entries like
chr8 56539820 56539821 chr1:3005765:3005766:+ 0 -
chr8 56539547 56539548 chr1:3005846:3005847:+ 0 -
chr8 56539209 56539210 chr1:3005927:3005928:+ 0 -
where the 4th column contains the genomic position of the cytosine site in mm9 coordinates.
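To see how many sites were lifted and how many were not, the two liftOver outputs can be counted. This is a rough sketch under stated assumptions: the files toy.index and toy.unlifted are invented stand-ins (in this toy, the third example site is pretended to be unlifted), and the unlifted file is assumed to follow the usual liftOver layout, in which each unmapped record is preceded by a reason line starting with '#'.

```shell
# Toy stand-ins for the liftOver outputs (hypothetical contents, for illustration only).
printf 'chr8\t56539820\t56539821\tchr1:3005765:3005766:+\t0\t-\n' > toy.index
printf 'chr8\t56539547\t56539548\tchr1:3005846:3005847:+\t0\t-\n' >> toy.index
printf '#Deleted in new\n' > toy.unlifted
printf 'chr1\t3005927\t3005928\tchr1:3005927:3005928:+\t0\t+\n' >> toy.unlifted

# Lifted sites: one per line of the index file.
lifted=$(awk 'END {print NR}' toy.index)

# Unlifted sites: skip the '#' reason lines (assumed layout of the unlifted file).
unlifted=$(grep -vc '^#' toy.unlifted)

echo "lifted=$lifted unlifted=$unlifted"
```

With real data, the same two counts would be taken from mm9-hg19.index and mm9-hg19.unlifted.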
Next, convert the file mm9-hg19.index to the tab-separated format
expected by the fast-liftover tool as follows.
$ tr ':' '\t' <mm9-hg19.index | awk '{print $4"\t"$5"\t"$1"\t"$2"\t"$9}' >mm9-hg19-fastliftover.index
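To make the resulting layout concrete, the same pipeline can be run on a single line of the liftOver output shown earlier. This is a small sketch; the file names toy-hg19.index and toy-fastliftover.index are hypothetical.

```shell
# One line of the liftOver output shown above (hypothetical file names).
printf 'chr8\t56539820\t56539821\tchr1:3005765:3005766:+\t0\t-\n' > toy-hg19.index

# Split the colon-joined name into separate columns, then keep:
# source chrom, source position, target chrom, target position, target strand.
tr ':' '\t' < toy-hg19.index |
  awk '{print $4"\t"$5"\t"$1"\t"$2"\t"$9}' > toy-fastliftover.index

cat toy-fastliftover.index
```

For this sample line the result has five tab-separated columns: chr1, 3005765, chr8, 56539820 and -. That is, the mm9 location comes first, followed by the hg19 location and strand.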
After the index file is converted, we can use the fast-liftover
program on any mm9 methcounts file to lift it to hg19:
$ dnmtools fastlift -i mm9-hg19-fastliftover.index -f mm9.meth -t hg19-lift.meth
Specify the -p option to report positions on the positive strand of
the target assembly. Before using the lifted methcounts file, make
sure it is sorted by chromosome and position:
$ LC_ALL=C sort -k1,1 -k2,2g -k3,3 hg19-lift.meth -o hg19-lift-sorted.meth
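Whether a file is already in this order can be checked with the -c option of sort, which exits with a non-zero status when the input is not sorted by the given keys. The sketch below assumes a toy lifted file, toy-lift.meth, whose contents are invented for illustration and assumed to follow the six-column methcounts layout.

```shell
# Toy lifted file, deliberately out of order (hypothetical contents).
printf 'chr8\t56539820\t-\tCpG\t0.166667\t6\n' > toy-lift.meth
printf 'chr8\t56539209\t-\tCpG\t0\t9\n' >> toy-lift.meth
printf 'chr8\t56539547\t-\tCpG\t0.5\t10\n' >> toy-lift.meth

# Sort with the same keys as the command above.
LC_ALL=C sort -k1,1 -k2,2g -k3,3 -o toy-lift-sorted.meth toy-lift.meth

# sort -c only checks the order; it exits non-zero if the file is not sorted.
if LC_ALL=C sort -c -k1,1 -k2,2g -k3,3 toy-lift-sorted.meth; then
  status=sorted
else
  status=unsorted
fi
echo "$status"
```

Running the same check on the unsorted input instead would take the else branch, which makes it a cheap guard before passing a lifted file to downstream tools.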
Options
-i, -indexfile
Index file [required]
-f, -from
Input methcounts file in the original assembly [required]
-t, -to
Output file with lifted-over sites [required]
-u, -unlifted
(optional) File for unlifted sites
-p, -plus-strand
(optional) Report sites on + strand
-v, -verbose
Print more run info to STDERR as the program runs