The raw data used to generate this track come from the ArrayExpress accessions E-MTAB-2270 and E-MTAB-1423.
Methods
The raw reads, taken from ArrayExpress accession E-MTAB-2270, were mapped to the mouse genome (mm9, including random chromosomes) with Bowtie version 0.12.5.
To identify enriched regions we used MACS version 2.0.9 with the input chromatin sample from ArrayExpress accession E-MTAB-1423.
Because MACS is very sensitive to an imbalance between the number of reads in the treatment and input sets, we reduced the larger dataset to match the number of mapped reads in the smaller one by randomly sampling reads. Instead of using the tool included in the MACS software for this task, we wrote a custom Python script (balanceBAMFiles.py) that performs the sampling for pairs of treatment and input samples and determines the appropriate number of reads automatically.
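The balancing step can be illustrated with a minimal sketch. This is not the code of balanceBAMFiles.py, which is not reproduced here; it assumes reads are already loaded into plain Python lists, and the function name balance_read_sets and its seed parameter are hypothetical.

```python
import random


def balance_read_sets(treatment_reads, input_reads, seed=None):
    """Randomly downsample the larger read list to the size of the smaller one.

    Illustrative only: real data would be read from and written to BAM files.
    """
    rng = random.Random(seed)
    # The target size is determined automatically from the smaller set.
    target = min(len(treatment_reads), len(input_reads))

    def downsample(reads):
        if len(reads) <= target:
            return list(reads)  # the smaller set is kept unchanged
        return rng.sample(reads, target)  # sampling without replacement

    return downsample(treatment_reads), downsample(input_reads)


if __name__ == "__main__":
    treatment = [f"t{i}" for i in range(100)]
    control = [f"c{i}" for i in range(40)]
    balanced_t, balanced_c = balance_read_sets(treatment, control, seed=1)
    print(len(balanced_t), len(balanced_c))
```

Passing a different seed yields a different random subset of the larger set, while the smaller set is always returned in full.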
For this process we kept at most two fully overlapping reads and discarded the rest. To correct for sampling bias we generated 10 different random samples and ran MACS on each of them, with the shift size set to 90, the q-value cutoff set to 1e-2 and all other parameters left at their defaults. We then collapsed the 10 peak-calling results for each set using another custom script (aggregatePeaksFromSubsampling.py), which reports only peaks that overlap in at least 9 of the 10 lists.
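One way to picture the aggregation step is a sweep over peak boundaries that keeps the regions covered by peaks from at least 9 of the 10 runs. The sketch below is an assumption about how such a rule could be implemented, not the logic of aggregatePeaksFromSubsampling.py; the function name consensus_peaks, the tuple layout (chrom, start, end) and the half-open coordinate convention are all hypothetical.

```python
from collections import defaultdict


def consensus_peaks(peak_lists, min_support=9):
    """Return regions covered by peaks from at least `min_support` lists.

    Assumes half-open (chrom, start, end) peaks and that peaks within a
    single list do not overlap each other, so depth equals the number of runs.
    """
    events = defaultdict(list)
    for peaks in peak_lists:
        for chrom, start, end in peaks:
            events[chrom].append((start, 1))   # a peak opens
            events[chrom].append((end, -1))    # a peak closes

    result = []
    for chrom in sorted(events):
        depth = 0
        region_start = None
        # At equal positions, closings (-1) sort before openings (+1),
        # so abutting peaks are not counted as overlapping.
        for pos, delta in sorted(events[chrom]):
            depth += delta
            if depth >= min_support and region_start is None:
                region_start = pos
            elif depth < min_support and region_start is not None:
                result.append((chrom, region_start, pos))
                region_start = None
    return result


if __name__ == "__main__":
    runs = [[("chr1", 100 + i, 200 + i)] for i in range(9)] + [[]]
    print(consensus_peaks(runs))
```

With this rule the reported region is the intersection supported by enough runs; a script could equally report the union of the supporting peaks, which is a design choice the text above does not specify.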
The resulting peak BED files were first filtered to discard peaks with a q-value lower than 1e-5 and then converted into bigBed files with the bedToBigBed tool. The q-value bedGraph files were converted into bigWig files with the bedGraphToBigWig tool. Both tools are from the UCSC Genome Browser.
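As a rough illustration of the conversion step, the sketch below builds the two command lines using the standard positional arguments of the UCSC tools (input file, chromosome sizes file, output file). The file names, the helper names conversion_commands and run_conversions, and the mm9 chromosome sizes file are assumptions. Both tools expect input sorted by chromosome and start position.

```python
import subprocess
from pathlib import Path


def conversion_commands(bed_path, bedgraph_path, chrom_sizes_path):
    """Build the bedToBigBed and bedGraphToBigWig command lines."""
    big_bed = str(Path(bed_path).with_suffix(".bb"))
    big_wig = str(Path(bedgraph_path).with_suffix(".bw"))
    return [
        ["bedToBigBed", bed_path, chrom_sizes_path, big_bed],
        ["bedGraphToBigWig", bedgraph_path, chrom_sizes_path, big_wig],
    ]


def run_conversions(bed_path, bedgraph_path, chrom_sizes_path):
    """Run both conversions; requires the UCSC tools on the PATH."""
    for command in conversion_commands(bed_path, bedgraph_path, chrom_sizes_path):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    for command in conversion_commands("peaks.bed", "qvalues.bedGraph", "mm9.chrom.sizes"):
        print(" ".join(command))
```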
Credits
Data were generated and processed for the CISSTEM project. For inquiries, please contact Juan L. Mateo at the following address: mateojuan (at) uniovi.es