ENSEMBL, TSS collection downloaded from ENSEMBL.

The following attributes have been selected:

Ensembl Transcript ID
Chromosome Name
Strand
Transcript Start (bp)
Transcript End (bp)
Gene Start (bp)
Gene End (bp)
Status (transcript)
Status (gene)
Associated Gene Name

Then, transcripts have been filtered according to the following rules:

Transcript length > 0 [Transcript Start different from Transcript End]
Transcript lies on full chromosomes
Gene must have a 5' UTR [Transcript Start different from Gene Start]
Genes must be annotated [Associated Gene Name present]
Gene and transcript status are both known

This can be achieved using the following awk command:

awk -F \\t '
$2 ~ "^[0-9][0-9]?|^[XY]" && $3 == "1" && $4 != $5 && $4 != $6 && $10 != "" && $8 == "KNOWN" && $9 == "KNOWN" {print "chr" $2 "\tTSS\t" $4 "\t+\t" 1 "\t" $10}
$2 ~ "^[0-9][0-9]?|^[XY]" && $3 == "-1" && $4 != $5 && $5 != $7 && $10 != "" && $8 == "KNOWN" && $9 == "KNOWN" {print "chr" $2 "\tTSS\t" $5 "\t-\t" 1 "\t" $10}
' biomart_output.txt | sort -s -k1,1 -k3,3n -k4,4 | compact_sga.pl > ENSEMBL.sga
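
As a rough illustration of how the filter behaves, the sketch below applies the same awk conditions to four invented rows laid out in the attribute order listed above. It makes several assumptions: the status value is the literal string "KNOWN", the "chr" prefix is written for both strands, and the final compact_sga.pl step is left out because that script is not reproduced here. All transcript IDs, coordinates, and gene names (T0001, geneA, and so on) and the helper names make_sample and filter_tss are hypothetical.

```shell
# Hypothetical BioMart rows (tab-separated); every value here is invented.
# Columns: transcript ID, chromosome, strand, transcript start, transcript end,
#          gene start, gene end, transcript status, gene status, gene name
make_sample() {
  printf 'T0001\t2L\t1\t1000\t5000\t900\t5200\tKNOWN\tKNOWN\tgeneA\n'
  printf 'T0002\tX\t-1\t2000\t8000\t1500\t8300\tKNOWN\tKNOWN\tgeneB\n'
  # Transcript start equals gene start (no 5'"'"' UTR on + strand): dropped
  printf 'T0003\t3R\t1\t700\t900\t700\t900\tKNOWN\tKNOWN\tgeneC\n'
  # Transcript status is not KNOWN: dropped
  printf 'T0004\t2R\t1\t100\t400\t50\t500\tNOVEL\tKNOWN\tgeneD\n'
}

# Same conditions as the command above; assumes status "KNOWN" and a
# "chr" prefix on both strands. Reads rows on stdin.
filter_tss() {
  awk -F '\t' '
  $2 ~ "^[0-9][0-9]?|^[XY]" && $3 == "1"  && $4 != $5 && $4 != $6 && $10 != "" && $8 == "KNOWN" && $9 == "KNOWN" {print "chr" $2 "\tTSS\t" $4 "\t+\t" 1 "\t" $10}
  $2 ~ "^[0-9][0-9]?|^[XY]" && $3 == "-1" && $4 != $5 && $5 != $7 && $10 != "" && $8 == "KNOWN" && $9 == "KNOWN" {print "chr" $2 "\tTSS\t" $5 "\t-\t" 1 "\t" $10}
  '
}

# Filter and sort as in the original pipeline (without compact_sga.pl).
result=$(make_sample | filter_tss | sort -s -k1,1 -k3,3n -k4,4)
printf '%s\n' "$result"
```

Only the first two rows should survive. The plus-strand transcript reports its start coordinate (1000) as the TSS, while the minus-strand transcript reports its end coordinate (8000).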

Haider S, Ballester B, Smedley D, Zhang J, Rice P, Kasprzyk A.
BioMart Central Portal--unified access to biological data. Nucleic Acids Res. 2009 Jul;37(Web Server issue):W23-7. doi: 10.1093/nar/gkp265. Epub 2009 May 6. PMID: 19420058

	Filename	Description	Feature	GEO-ID
1	Dm_ENSEMBL86.sga	TSS from ENSEMBL86	TSS	-
