From S. cerevisiae (Apr 2011 R64/sacCer3).
The following attributes have been selected:
- Ensembl Transcript ID
- Chromosome Name
- Strand
- Transcript Start (bp)
- Transcript End (bp)
- Gene Start (bp)
- Gene End (bp)
- Status (transcript)
- Status (gene)
- Associated Gene Name
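Assuming the attributes are exported as a tab-separated file in the order listed above, the fields are numbered 1 (transcript ID) through 10 (associated gene name). A quick sanity check that every row has ten columns could look like the sketch below. The file name sample_biomart.txt and the two rows in it are hypothetical stand-ins for the real export.

```shell
# Hypothetical two-row sample standing in for the BioMart export
# (values are illustrative only).
printf 'T1\tI\t1\t335\t649\t335\t649\tKNOWN\tKNOWN\tGENE_A\n' > sample_biomart.txt
printf 'T2\tI\t-1\t1807\t2169\t1807\t2169\tKNOWN\tKNOWN\tGENE_B\n' >> sample_biomart.txt

# Count rows that do not have exactly 10 tab-separated fields.
bad_rows=$(awk -F '\t' 'NF != 10 {n++} END {print n + 0}' sample_biomart.txt)
echo "rows with unexpected column count: $bad_rows"
```

A non-zero count would suggest that an attribute is missing or that the export is not tab-separated.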
Then, transcripts have been filtered according to the following
rules:
- Transcript length > 0 [Transcript Start different from
Transcript End]
- Transcript lies on full chromosomes
This can be achieved using the following awk command:
awk -F '\t' '
# plus strand: the TSS is the transcript start
$3 == "1" && $4 != $5 {print "chr" $2 "\tTSS\t" $4 "\t+\t" 1 "\t" $10}
# minus strand: the TSS is the transcript end
$3 == "-1" && $4 != $5 {print "chr" $2 "\tTSS\t" $5 "\t-\t" 1 "\t" $10}
' biomart_output.txt | sort -s -k1,1 -k3,3n -k4,4 |
compact_sga.pl > ENSEMBL.sga
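To illustrate what the conversion produces, here is a minimal sketch run on three made-up rows. The file name sample_biomart.txt, the transcript IDs and the gene names are hypothetical, and the "chr" prefix is assumed for both strands. The compact_sga.pl step is left out because its behaviour is not described here.

```shell
# Hypothetical sample rows in the attribute order listed above.
# The third row has start == end and is therefore dropped by the filter.
printf 'T1\tI\t1\t335\t649\t335\t649\tKNOWN\tKNOWN\tGENE_A\n' > sample_biomart.txt
printf 'T2\tI\t-1\t1807\t2169\t1807\t2169\tKNOWN\tKNOWN\tGENE_B\n' >> sample_biomart.txt
printf 'T3\tII\t1\t500\t500\t500\t500\tKNOWN\tKNOWN\tGENE_C\n' >> sample_biomart.txt

# Same awk rules and sort keys as above, without compact_sga.pl.
sga_lines=$(awk -F '\t' '
$3 == "1"  && $4 != $5 {print "chr" $2 "\tTSS\t" $4 "\t+\t" 1 "\t" $10}
$3 == "-1" && $4 != $5 {print "chr" $2 "\tTSS\t" $5 "\t-\t" 1 "\t" $10}
' sample_biomart.txt | sort -s -k1,1 -k3,3n -k4,4)
printf '%s\n' "$sga_lines"
```

With this input, two tab-separated lines remain: one for GENE_A at position 335 on the + strand and one for GENE_B at position 2169 on the - strand, both on chrI.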
The SGA file can then be transformed into an FPS file using
sga2fps.pl