Cumbie 2015, Nano-CAGE data of Arabidopsis roots.
A new CAGE technique (NanoCAGE-XL) has been developed that promises to identify high-confidence
transcription start sites. This is a proof-of-concept dataset.
- Raw data was downloaded from: PRJNA270670
- Input file format: SRA
From A. thaliana (Feb 2011 TAIR10/araTha1).
Transcription Profiling data:
Following the publication guidelines, each experiment was treated individually, as described below.
FASTQ files were extracted from the SRA files using fastq-dump
(SRA Toolkit v2.5.0). After trimming, reads were mapped to the TAIR10 genome using
Bowtie v0.12.8. The SAM
files were then converted to BAM
using samtools v0.1.14
and to BED using bamToBed v2.12.0. BED-to-SGA
conversion was carried out using bed2sga.pl
(ChIP-Seq v1.5.3).
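The last step collapses aligned reads into per-position 5'-end counts. The Python sketch below illustrates that idea only; it is not bed2sga.pl. It assumes BED6 input (0-based, half-open coordinates, strand in column 6, as bamToBed writes), takes the 1-based 5'-end of each read as the SGA position, and uses a placeholder feature name `TSS` chosen here for illustration.

```python
from collections import Counter


def bed_to_sga(bed_lines, feature="TSS"):
    """Collapse BED6 alignments into SGA-style rows: chrom, feature, pos, strand, count.

    Sketch under assumptions: positions are the 1-based 5' end of each read,
    chromosome names are passed through unchanged, and 'TSS' is a placeholder name.
    """
    counts = Counter()
    for line in bed_lines:
        # Skip blank lines and UCSC-style header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
        # 5' end in 1-based coordinates: first base on '+', last base on '-'
        pos = start + 1 if strand == "+" else end
        counts[(chrom, pos, strand)] += 1
    # Sort by chromosome, then position, then strand
    return [f"{c}\t{feature}\t{p}\t{s}\t{n}" for (c, p, s), n in sorted(counts.items())]
```

For example, two `+` reads starting at BED coordinate 99 and one `-` read ending at 250 on `chr1` give the rows `chr1 TSS 100 + 2` and `chr1 TSS 250 - 1` (tab-separated).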
SRX815832 (experiment 1): reads were 101 bp long and contained
a series of three Gs at the 5' end, due to the enzymatic reaction
during library preparation. We trimmed them and noticed that
more Gs often followed. An analysis of the Inr
motif and of the read distribution around EPDnew promoters
confirmed that the additional Gs were mostly artefacts, so we
decided to remove them as well. Reads were further trimmed at the
3' end, resulting in 50 bp sequences (similar to the length of
the other samples).
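The trimming described for experiment 1 can be sketched as follows. This is an illustrative Python version, not the script actually used; it assumes uppercase sequences and a quality string of equal length, removes the whole leading run of Gs in one step (equivalent to removing the three expected Gs and then any that follow), and keeps at most 50 bases.

```python
def trim_leading_g(seq, qual, final_len=50):
    """Remove the entire 5' run of Gs, then keep at most final_len bases.

    Assumes uppercase bases; the quality string is sliced identically so the
    FASTQ record stays consistent.
    """
    n_g = len(seq) - len(seq.lstrip("G"))  # length of the leading G run
    return seq[n_g:n_g + final_len], qual[n_g:n_g + final_len]
```

Stripping every leading G also discards a genuine G at the start site when one exists; the text above accepts this because the extra Gs were found to be mostly artefacts.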
SRX1097403 (experiment 2): read length 51 bp. The manuscript
describes the presence in this library of 3 different
barcodes and 3 linkers (total length 16 bp). We could not
easily find them (even allowing 2 mismatches) and instead decided to
trim the first 16 bp from each read. This simple procedure
delivered good results in terms of motif distribution and
read distribution around EPDnew promoters.
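A fixed 5' trim like the one used for experiment 2 is simple to apply while streaming a FASTQ file. The Python sketch below is illustrative only; it assumes a plain four-lines-per-record FASTQ (no wrapped sequence lines) and the function names are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, qual) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_5prime(records, n=16):
    """Drop the first n bases (and qualities) from every record."""
    for header, seq, qual in records:
        yield header, seq[n:], qual[n:]


# Usage: one 51 bp read loses its first 16 bases, leaving 35 bp
fastq = io.StringIO("@r1\n" + "N" * 16 + "ACGT" * 8 + "AAA\n+\n" + "I" * 51 + "\n")
trimmed = list(trim_5prime(read_fastq(fastq)))
```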
SRX1097494 (experiment 3): read length 51 bp. Reads contained 6
barcodes (9 bp in total) at the 5' end, which could be
identified (2 mismatches allowed) using an in-house Perl
script. After trimming, the mapped read locations were
shifted 1 bp upstream of the expected position (Inr
motif). For this reason, they were trimmed by an additional
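The in-house Perl script is not described further, so the following is only a Python sketch of the general technique: compare the read's 5' prefix with each barcode, accept matches with at most 2 mismatches (Hamming distance), and reject reads whose best match is tied. The barcode names and 9 bp sequences in the usage example are made up for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(seq, barcodes, max_mm=2):
    """Return (name, trimmed_seq) for the unique best 5' barcode match, else None.

    barcodes: dict mapping a (hypothetical) barcode name to its sequence.
    """
    hits = []
    for name, bc in barcodes.items():
        if len(seq) < len(bc):
            continue
        d = hamming(seq[:len(bc)], bc)
        if d <= max_mm:
            hits.append((d, name, bc))
    if not hits:
        return None  # no barcode within max_mm mismatches
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # ambiguous: two barcodes equally close
    _, name, bc = hits[0]
    return name, seq[len(bc):]


# Usage with two made-up 9 bp barcodes
BARCODES = {"BC1": "ACGTACGTA", "BC2": "TGCATGCAT"}
result = assign_barcode("ACGTACGTT" + "GATTACA", BARCODES)  # 1 mismatch vs BC1
```

Rejecting ties rather than picking one arbitrarily avoids assigning a read to the wrong sample when barcodes are close in sequence.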
Cumbie JS, Ivanchenko MG, Megraw M
NanoCAGE-XL and CapFilter: an approach to genome wide identification of high confidence transcription start sites.
BMC Genomics. 2015 Aug 13;16:597. doi: 10.1186/s12864-015-1670-6.