#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# $RCSfile: epd.it,v $
# $Revision: 1.2 $
# $Date: 2000/04/15 $
# $Author: Claude Bonnard $
#
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$Resource:[
creDate: |19-Apr-2000
description:
|The Eukaryotic Promoter Database EPD was designed and developed at
|the Weizmann Institute of Science in Rehovot (Israel) and is currently
|maintained at the Swiss Institute of Bioinformatics (SIB) and ISREC in
|Epalinges s/Lausanne (Switzerland). EPD is a
|specialized annotation database; it provides
|information about eukaryotic promoters available in the EMBL Data
|Library and is intended to assist experimental researchers, as well as
|computer analysts, in the investigation of eukaryotic transcription
|signals. The present version originated from a previous compilation
|published in an article (1) and is organized as a hierarchically
|ordered and documented "functional position set" (2) pointing to
|transcription initiation sites. All information is directly abstracted
|from scientific literature and is thus independent of the EMBL sequence
|entry descriptions. As a consequence, many of the initiation sites
|referred to in EPD do not appear in corresponding EMBL feature tables.
|

|A co-ordinated updating procedure has been set up by the two
|laboratories that will ensure future compatibility between the position
|references in EPD and the sequence data in the main data library.
|Investigators who access EMBL via publicly available programs should
|be aware that software producers occasionally modify the
|sequence data in ways that render position references inaccurate. EPD is
|generally not compatible with sequence data of another release because
|EMBL sequence entries are not designed as stable data units.

|The completeness and accuracy of EPD benefit greatly from user feedback.
|Any report of mistakes or omissions would be very much appreciated.
|Direct communication of newly published transcript mapping or gene
|expression data is also welcome. Please forward all correspondence
|to the address given below. Use electronic mail if possible.
|

|Philipp Bucher and Rouayda Cavin Perier
|Swiss Institute of Bioinformatics and
|Swiss Institute for Experimental Cancer Research
|Ch. des Boveresses 155
|CH-1066 Epalinges s/Lausanne
|Switzerland

|Electronic mail:
| Rouaida Perier
| Rouaida.Perier@isrec.unil.ch
|

|(1) Bucher, P. & Trifonov, E.N., "Compilation and analysis of
|    eukaryotic POL II promoter sequences".
|    Nucl. Acids Res. 14, 10009-10026 (1986).
|
|(2) Bucher, P. & Bryan, B., "Signal search analysis: a new method to
|    localize and characterize functionally important DNA sequences".
|    Nucl. Acids Res. 12, 287-305 (1984).
contact:
|Philipp Bucher and Rouayda Cavin Perier
|Swiss Institute of Bioinformatics and
|Swiss Institute for Experimental Cancer Research
|Ch. des Boveresses 155
|CH-1066 Epalinges s/Lausanne
|Switzerland

|Electronic mail:
| Rouaida Perier
| Rouaida.Perier@isrec.unil.ch
www:
|The User Manual at https://epd.expasy.org/epd/current/usrman.php
|The download of a subset at https://epd.expasy.org/epd/seq_download.html
|
citation:
|Rouayda Cavin Perier, Viviane Praz, Thomas Junier, Claude Bonnard and Philipp Bucher
|(2000). The Eukaryotic Promoter Database (EPD).
| Nucleic Acids Res 28(1):302-303.
ftp:
|ftp://ftp.epd.unil.ch/pub/databases/epd
fields:{
ID:
|ENTRY_NAME is a unique entry identifier (for example "HS_MYC_2") which
|obeys rigorous naming conventions. It contains 2 or 3 fields; the first
|is the species identification code, at most 4 alphanumeric characters
|representing the biological source of the promoter.
# |The entry code is a five-digit number which is the only part of a
# |promoter entry that is stable from release to release. The first two
# |digits designate the release of initial appearance.
AccNumber:
|The accession number consists of the character string "EP" followed by
|5 digits representing the EMBL release number followed by the EPD entry
|order. Most EPD entries currently have only one accession number. If
|necessary, more than one AC will be used, separated by semicolons, and
|the list is terminated by a semicolon.
Description:
|The description lines contain general descriptive information about the
|promoter. The description is given in ordinary English and is free-format.
|In some cases, more than one DE line is required; in this case, the
|text is divided only between words and only the last DE line is
|terminated by a period.
|
Organism:
|The species line specifies the source organism(s) of the promoter.
|The species names are based on NCBI's taxonomy and thus can be
|automatically hyperlinked to the NCBI's taxonomy web pages.
# |Species codes consist of the
# |initials of genus and species name. Occasionally, three characters
# |are required to generate unique codes. Standard abbreviations identify
# |viruses. The full names of the organisms can be used in a query. The
# |names are given in appendix B.1. Subspecies or strains are specified
# |in parentheses.
# |Chromosomal locations (genetic or cytogenetic loci, genomic map
# |units, etc.) appear in square brackets immediately following species
# |codes.
Link:
|The DR lines contain cross-references to other EPD entries (if there
|are alternative promoters of the same gene), or to entries from other
|databases.
|So far, we have incorporated links to EMBL, TRANSFAC (3),
|SWISS-PROT (4), FlyBase (5), MIM (6) and MGD. The precise format of
|these lines depends on the target database. Note that some
|cross-references include numbers enclosed in square brackets indicating
|the relative position of a linked sequence object, or keywords
|characterising the nature of the relationship between the entries.
|For instance, the ranges associated with cross-references to EMBL
|entries define the extensions of the EMBL sequences relative to the
|initiation site described by the EPD entry. The multiplicity of EMBL
|cross-references in some entries mirrors the redundancy of the sequence
|database.
doc:
|Documentation of promoter entries is presented on lines starting with
|"DO". They are essentially free format and so far not processed by
|specific programs. In the present release, there are two DO lines per
|entry, the first referring to the transcript mapping experiments that
|define the promoter, the second giving information about expression and
|regulation. The various experimental techniques are identified by number
|codes; the "Medline's number" and/or "example" in brackets are linked,
|respectively, to the abstract and/or to the full text article
|describing the related experiment.
method:
|The method lines describe experiments defining the transcription
|initiation site. In the new format, the experiments are individually
|linked to bibliographic references.
altern_prom:
|The AP line is optional and provides information on alternative
|promoters of the same gene.
taxo:
|The TX lines define a promoter's location within EPD's hierarchical
|classification system.
GeneProduct:
|Many gene products are listed in appendix B.3. Alternative initiation
|sites are identified by right-justified P1,P2.., or E1,E2.., depending
|on whether the corresponding 5'exons are 3'co-terminal or not. The
|strongest initiation site is marked by trailing + if known.
ExpProtocol:
|Special characters appended to the number codes designate an
|experimental gene expression system where the RNA for the corresponding
|experiments was synthesized. A query may use the following keywords:
|

|or the numbers of the transcript mapping experiments that define the
|promoter or that give information about expression and regulation. The
|various experimental techniques are identified by number codes:
|
|codes  experiments
|1      direct RNA sequencing
|2      length measurement of an RNA product
|3      length measurement of a nuclease-protected complementary
|       RNA or DNA fragment by comparison with homologous sequence ladder
|4      same as 3 but with heterologous size markers
|5      RNA sequencing by dideoxy-terminated primer extension
|6      DNA sequencing of an in vitro generated strong-stop cDNA or
|       a full-length cDNA clone
|7      length measurement of a primer-extension product by
|       comparison with homologous sequence ladder
|8      same as 7 but with heterologous size markers
|9      DNA sequencing of a full-length processed pseudogene
|10     length measurement of a reverse direction primer-extension
|       product (blocked by RNA 5'end) by comparison with homologous
|       sequence ladder
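
The code table above lends itself to a simple lookup. The following Python sketch is illustrative only and is not part of EPD or SRS: the names `EXPERIMENT_CODES` and `split_method_code` are hypothetical, and it assumes that a method code is a run of digits optionally followed by special characters. Because the meaning of those appended characters is not listed here, the sketch returns them uninterpreted.

```python
import re

# Number codes and descriptions copied from the table above.
EXPERIMENT_CODES = {
    1: "direct RNA sequencing",
    2: "length measurement of an RNA product",
    3: "length measurement of a nuclease-protected complementary RNA or DNA "
       "fragment by comparison with homologous sequence ladder",
    4: "same as 3 but with heterologous size markers",
    5: "RNA sequencing by dideoxy-terminated primer extension",
    6: "DNA sequencing of an in vitro generated strong-stop cDNA or "
       "a full-length cDNA clone",
    7: "length measurement of a primer-extension product by comparison "
       "with homologous sequence ladder",
    8: "same as 7 but with heterologous size markers",
    9: "DNA sequencing of a full-length processed pseudogene",
    10: "length measurement of a reverse direction primer-extension product "
        "(blocked by RNA 5'end) by comparison with homologous sequence ladder",
}

# Assumption: digits first, then zero or more non-digit special characters.
_CODE_RE = re.compile(r"(\d+)(\D*)")


def split_method_code(token):
    """Split a method code into (number, appended characters, description)."""
    match = _CODE_RE.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a method code: {token!r}")
    number = int(match.group(1))
    if number not in EXPERIMENT_CODES:
        raise ValueError(f"unknown experiment code: {number}")
    return number, match.group(2), EXPERIMENT_CODES[number]
```

For instance, `split_method_code("7")` returns the number 7, an empty suffix, and the description of primer-extension length measurement; a code with appended characters keeps them in the second element for the caller to interpret.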
ExpresRegul:
|The information on expression/regulation may include indication of
|developmental stages, tissues, cell types, cell cycle stages, and
|various regulatory features.
|
|;      delimits different types of specifications
|       (e.g. developmental stage and tissue).
|,      delimits alternative keywords (e.g. liver, kidney)
|+      means "induced by" or "strongly expressed in".
|-      means "repressed by" or "weakly expressed in".
|~      means "modulated by".
|[...]  delimits cell cycle stages.
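
The delimiter conventions listed above can be sketched as a small tokenizer. This Python example is a rough illustration under stated assumptions, not EPD's own parser: the function name `parse_expression` and the sample string in the usage note are hypothetical, and the sketch assumes that `+`, `-` and `~` are written as a prefix directly before the keyword they qualify, which the text above does not state. It also does not handle commas inside square brackets.

```python
# Meanings copied from the symbol list above.
MODIFIERS = {
    "+": 'induced by / strongly expressed in',
    "-": 'repressed by / weakly expressed in',
    "~": 'modulated by',
}


def parse_expression(text):
    """Parse an expression/regulation string into groups of keywords.

    ';' separates different types of specifications and ',' separates
    alternative keywords within one specification.
    """
    groups = []
    for spec in text.split(";"):
        keywords = []
        for raw in spec.split(","):
            keyword = raw.strip()
            if not keyword:
                continue
            modifier = None
            # Assumption: the modifier symbol is a one-character prefix.
            if keyword[0] in MODIFIERS:
                modifier = MODIFIERS[keyword[0]]
                keyword = keyword[1:].strip()
            # Square brackets mark cell cycle stages.
            cell_cycle = keyword.startswith("[") and keyword.endswith("]")
            if cell_cycle:
                keyword = keyword[1:-1].strip()
            keywords.append(
                {"keyword": keyword, "modifier": modifier, "cell_cycle": cell_cycle}
            )
        if keywords:
            groups.append(keywords)
    return groups
```

Called on a made-up string such as `"embryo; +liver, -kidney; [S]"`, the function yields three groups: one developmental-stage keyword, two alternative tissue keywords with their modifiers, and one cell cycle stage.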
}
date: |19 Apr 2000
signature: |Claude Bonnard
]
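
The ID and AccNumber conventions described in the fields section can be checked mechanically. The Python sketch below is a minimal illustration and not EPD or SRS code: the function names are hypothetical, the underscore separator for entry names is inferred from the example "HS_MYC_2", and the accession number "EP11046" used in the usage note is a made-up value that merely fits the stated pattern.

```python
import re

# "EP" followed by exactly 5 digits, as described for AccNumber.
_AC_RE = re.compile(r"EP\d{5}")
# Species identification code: at most 4 alphanumeric characters.
_SPECIES_RE = re.compile(r"[A-Za-z0-9]{1,4}")


def parse_accession_list(value):
    """Return the accession numbers in a semicolon-terminated AC list."""
    value = value.strip()
    if not value.endswith(";"):
        raise ValueError("accession list must be terminated by a semicolon")
    numbers = [part.strip() for part in value[:-1].split(";")]
    for number in numbers:
        if not _AC_RE.fullmatch(number):
            raise ValueError(f"malformed accession number: {number!r}")
    return numbers


def split_entry_name(name):
    """Split an entry name into its 2 or 3 fields and check the species code.

    Assumption: fields are separated by underscores, as in "HS_MYC_2".
    """
    fields = name.strip().split("_")
    if len(fields) not in (2, 3):
        raise ValueError(f"entry name must have 2 or 3 fields: {name!r}")
    if not _SPECIES_RE.fullmatch(fields[0]):
        raise ValueError(f"bad species identification code: {fields[0]!r}")
    return fields
```

With these definitions, `split_entry_name("HS_MYC_2")` gives the fields `HS`, `MYC` and `2`, and `parse_accession_list("EP11046;")` gives a one-element list. A list without the terminating semicolon is rejected.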