Contents of the EPD directory:

Subdirectory XXX (indicating the EPD release number):
 - epdXXX.dat               EPD database release XXX
 - epdXXX.idx               index of epdXXX.dat based on AC and ID
 - epdXXX.usr               EPD user manual for release XXX
 - epdXXX_bulk.dat          bulk EPD database release XXX
 - epdXXX_bulk.idx          index of epdXXX_bulk.dat based on AC and ID

Subdirectory current (always the latest release):
 - epd.dat                  EPD database
 - epd.idx                  index of epd.dat based on AC and ID
 - epd.seq                  EPD promoter sequences in FASTA format
 - epd.usr                  EPD user manual
 - epd_bulk.dat             bulk section of the EPD database
 - epd_bulk.idx             index of epd_bulk.dat based on AC and ID
 - epd_bulk.seq             sequence data in FASTA format of the bulk section of EPD
 - epd_16K.seq              EPD promoter sequences in FASTA format, extended range
 - epd_16K_seq.idx          index of epd_16K.seq based on AC
 - epd_bulk_16K.seq         sequence data in FASTA format of the bulk section of EPD, extended range
 - epd_bulk_16K_seq.idx     index of epd_bulk_16K.seq based on AC

Notes:
 - The original EPD format defines promoter sequences indirectly, by pointers
   to EMBL sequence data. These pointers are correct only for EPD and EMBL
   versions of the same release.
 - The files epd.seq and epd_bulk.seq contain sequence data for EPD promoter
   entries in the range -499 to +100 relative to the transcription initiation
   site. By contrast, the files epd_16K.seq and epd_bulk_16K.seq contain
   sequences in the interval -9999 to +6000 relative to the transcription
   initiation site.

Subdirectory SRS6 (no longer maintained!):
 - epd.i                    describes the logical structure of the database
 - epd.is                   describes the syntax of the database entries
 - epd.it                   describes the database as served by the SRS WWW server

Subdirectory views:
 - epd_for_integr8.xml      current EPD release XXX in XML format, provided for the integr8 project at EBI
 - epdXXX.xml               EPD release XXX in XML format
 - epdXXX_xml.idx           index of epdXXX.xml based on AC
 - epd.dtd                  EPD DTD file for the XML format
 - epd.xsl                  associated stylesheet for the XML format
 - epdXXX.seq               EPD release XXX sequence data in FASTA format
 - epdXXX_seq.idx           index of epdXXX.seq based on AC
 - epdXXX_bulk.seq          sequence data in FASTA format of the bulk section of EPD release XXX
 - epdXXX_bulk_seq.idx      index of epdXXX_bulk.seq based on AC
 - epdXXX_16K.seq           EPD release XXX sequence data in FASTA format, extended range
 - epdXXX_16K_seq.idx       index of epdXXX_16K.seq based on AC
 - epdXXX_bulk_16K.seq      sequence data in FASTA format of the bulk section of EPD release XXX, extended range
 - epdXXX_bulk_16K_seq.idx  index of epdXXX_bulk_16K.seq based on AC
 - epd-emblXXX.dr           list of all EMBL accession numbers with their corresponding EPD ACs and IDs
 - README                   this file

Notes:
 - All sequence data files in this directory have been compiled automatically
   from the EMBL release XXX division files, using the sequence position
   pointers in EPD.
 - The files epdXXX.seq and epdXXX_bulk.seq contain sequence data for EPD
   promoter entries in the range -499 to +100 relative to the transcription
   initiation site. By contrast, the files epdXXX_16K.seq and
   epdXXX_bulk_16K.seq contain sequences in the interval -9999 to +6000
   relative to the transcription initiation site.
 - The sequence headers in epdXXX.seq and epdXXX_bulk.seq have the following
   form:

     >EP17001 (+) Pv snRNA U1; range -499 to 100.

   The sequence identifier consists of the acronym EP followed by the
   corresponding EPD entry code. The plus sign in parentheses reflects the
   "independent subset status" described in the EPD user manual.
   For statistical analysis, it is recommended to use only those sequences
   containing the string "(+)" in the header line.

4 May 2011
EPD developers
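
Example: selecting the independent subset

The header convention and the "(+)" recommendation above can be applied with a
few lines of code. The following Python sketch is not part of the EPD
distribution. The function names read_fasta, entry_id and independent_subset
are hypothetical, and the sketch assumes every header follows the example form
given for epdXXX.seq, with the EPD identifier as the first whitespace-separated
token.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without its leading '>'."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def entry_id(header: str) -> str:
    """Return the identifier, assumed to be the first token (e.g. 'EP17001')."""
    return header.split()[0]


def independent_subset(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep only records whose header line contains the string "(+)"."""
    return [(h, s) for h, s in read_fasta(lines) if "(+)" in h]
```

For instance, calling independent_subset on an open handle of epd.seq should
return only the records flagged "(+)". Applying the same filter to the bulk or
16K files assumes their headers use the same layout, which this README states
explicitly only for epdXXX.seq and epdXXX_bulk.seq.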