UCSC/RefSeq gene annotation

Description

Gene/transcript structures from RefSeq processed by UCSC.

Source

Samples

From H. vulgare (Apr 2021 MorexV3)

Genome Annotation:

#  Filename                       Description             Feature       GEO-ID
1  ucsc_transcriptStart_list.sga  Transcript start sites  TSS           -
2  ucsc_transcriptEnd_list.sga    Transcript end sites    TES           -
3  ucsc_CDSstart_list.sga         CDS start sites         CDS_start     -
4  ucsc_CDSend_list.sga           CDS end sites           CDS_end       -
5  ucsc_intronStart_list.sga      Intron start sites      intron_start  -
6  ucsc_intronEnd_list.sga        Intron end sites        intron_end    -

Notes on samples:

All SGA files have an optional 6th field containing a name of the form <GeneId..RefSeqId>, e.g. "LOC123429323..XM_045113689.1". The SGA files containing the intron starts and ends have an additional 7th field containing the serial number of the intron.
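
As an illustration of the field layout described above, here is a minimal Python sketch that parses one SGA line. It assumes the usual tab-separated SGA layout, in which the first five fields are sequence ID, feature, position, strand and count. The names SgaRecord and parse_sga_line, the sequence label "chr1H" and the coordinate in the usage example are hypothetical and not taken from the files.

```python
from typing import NamedTuple, Optional


class SgaRecord(NamedTuple):
    """One SGA line (hypothetical container, for illustration only)."""
    seq_id: str
    feature: str
    position: int
    strand: str
    count: int
    gene_id: Optional[str]        # first half of the 6th field, if present
    refseq_id: Optional[str]      # second half of the 6th field, if present
    intron_number: Optional[int]  # 7th field, intron files only


def parse_sga_line(line: str) -> SgaRecord:
    """Parse a tab-separated SGA line, assuming 5 mandatory fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-separated fields")
    seq_id, feature, position, strand, count = fields[:5]

    gene_id = refseq_id = None
    if len(fields) > 5 and fields[5]:
        # The name has the form <GeneId..RefSeqId>; split on the first "..".
        gene_id, _, refseq = fields[5].partition("..")
        refseq_id = refseq or None

    intron_number = None
    if len(fields) > 6 and fields[6]:
        intron_number = int(fields[6])

    return SgaRecord(seq_id, feature, int(position), strand, int(count),
                     gene_id, refseq_id, intron_number)


# Usage with a made-up line (sequence label and position are invented).
example = "chr1H\tintron_start\t12345\t+\t1\tLOC123429323..XM_045113689.1\t2"
record = parse_sga_line(example)
print(record.gene_id, record.refseq_id, record.intron_number)
```

Splitting on the first ".." rather than on a single "." keeps the version suffix of the RefSeq ID (".1" in the example) intact.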

Technical Notes

The SGA files were generated from BED format with a custom Perl script available from https://epd.expasy.org/ftp/mga/MorexV3/ucscJan22/scripts/.
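
For readers unfamiliar with this kind of conversion, the following Python sketch shows one plausible way to derive TSS and TES lines from a BED record. It is not the Perl script referenced above and may differ from it in detail. It assumes a BED line with at least six tab-separated columns, 0-based half-open BED coordinates, 1-based SGA positions, and a count of 1 per site. The function name bed_to_tss_tes and the example coordinates are hypothetical.

```python
from typing import Tuple


def bed_to_tss_tes(bed_line: str) -> Tuple[str, str]:
    """Return (TSS line, TES line) in SGA form for one BED record.

    Assumptions (not confirmed by the source): BED is 0-based half-open,
    SGA positions are 1-based, and each site gets a count of 1.
    """
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 BED columns")
    chrom, name, strand = fields[0], fields[3], fields[5]
    start, end = int(fields[1]), int(fields[2])

    if strand == "+":
        # First base of the transcript is chromStart + 1 in 1-based terms.
        tss, tes = start + 1, end
    elif strand == "-":
        # On the minus strand the transcript starts at the right-hand end.
        tss, tes = end, start + 1
    else:
        raise ValueError("strand must be '+' or '-'")

    def sga(feature: str, position: int) -> str:
        return "\t".join([chrom, feature, str(position), strand, "1", name])

    return sga("TSS", tss), sga("TES", tes)


# Usage with an invented minus-strand record.
tss_line, tes_line = bed_to_tss_tes("chr1H\t999\t2000\tXM_045113689.1\t0\t-")
print(tss_line)
print(tes_line)
```

The strand check matters because a transcript on the minus strand starts at the BED end coordinate, so swapping the two positions is what distinguishes a TSS from a TES there.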

References