#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# $RCSfile: epd.it,v $
# $Revision: 1.2 $
# $Date: 2000/04/15 $
# $Author: Claude Bonnard $
#
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$Resource:[
creDate:
|19-Apr-2000
description:
|The Eukaryotic Promoter Database (EPD) was designed and developed at
|the Weizmann Institute of Science in Rehovot (Israel) and is currently
|maintained at the Swiss Institute of Bioinformatics (SIB) and ISREC in
|Epalinges s/Lausanne (Switzerland). EPD is a specialized annotation
|database; it provides information about eukaryotic promoters available
|in the EMBL Data Library and is intended to assist experimental
|researchers, as well as computer analysts, in the investigation of
|eukaryotic transcription signals. The present version originated from a
|previous compilation published in an article (1) and is organized as a
|hierarchically ordered and documented "functional position set" (2)
|pointing to transcription initiation sites. All information is directly
|abstracted from scientific literature and is thus independent of the
|EMBL sequence entry descriptions. As a consequence, many of the
|initiation sites referred to in EPD do not appear in corresponding EMBL
|feature tables.
|
|A co-ordinated updating procedure has been set up by the two
|laboratories that will ensure future compatibility between the position
|references in EPD and the sequence data in the main data library.
|Investigators who access EMBL via publicly available programs should
|be aware of the fact that software producers occasionally modify the
|sequence data in ways that render position references inaccurate. EPD is
|generally not compatible with sequence data of another release because
|EMBL sequence entries are not designed as stable data units.
|
|The completeness and accuracy of EPD greatly benefit from user feedback.
|Any report of mistakes or omissions would be very much appreciated.
|Direct communication of newly published transcript mapping or gene
|expression data is also welcome. Please forward all correspondence
|to the address given below. Use electronic mail if possible.
|
|Philipp Bucher and Rouayda Cavin Perier
|Swiss Institute of Bioinformatics and
|Swiss Institute for Experimental Cancer Research
|Ch. des Boveresses 155
|CH-1066 Epalinges s/Lausanne
|Switzerland
|Electronic mail:
| Rouaida Perier \
| Rouaida.Perier@isrec.unil.ch
|
|(1) Bucher,P. & Trifonov, E.N., "Compilation and analysis of
| eukaryotic POL II promoter sequences".
| Nucl. Acids Res. 14, 10009-10026 (1986).
|
|(2) Bucher,P. & Bryan,B., "Signal search analysis: a new method to
| localize and characterize functionally important DNA sequences".
| Nucl. Acids Res. 12, 287-305 (1984).
contact:
|Philipp Bucher and Rouayda Cavin Perier
|Swiss Institute of Bioinformatics and
|Swiss Institute for Experimental Cancer Research
|Ch. des Boveresses 155
|CH-1066 Epalinges s/Lausanne
|Switzerland
|Electronic mail:
| Rouaida Perier \
| Rouaida.Perier@isrec.unil.ch
www:
| \
|The User Manual at https://epd.expasy.org/epd/current/usrman.php
| \
|The download of a subset at https://epd.expasy.org/epd/seq_download.html
|
citation:
|Rouayda Cavin Perier, Viviane Praz, Thomas Junier, Claude Bonnard and Philipp Bucher
|(2000). The Eukaryotic Promoter Database (EPD).
| Nucleic Acids Res 28(1):302-303.
ftp:
|\
|ftp://ftp.epd.unil.ch/pub/databases/epd
fields:{
ID:
|ENTRY_NAME is a unique entry identifier (e.g. "HS_MYC_2") which obeys
|rigorous naming conventions. It contains 2 or 3 fields; the first is the
|species identification code, at most 4 alphanumeric characters,
|representing the biological source of the promoter.
# |The entry code is a five-digit number which is the only part of a
# |promoter entry that is stable from release to release. The first two
# |digits designate the release of initial appearance.
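|The naming rule above can be checked with a short script. This is only
|a minimal sketch: the underscore separator is inferred from the example
|"HS_MYC_2" and is an assumption, and the function name
|split_entry_name is hypothetical, not part of EPD or SRS.

```python
import re

# Species code: at most 4 alphanumeric characters (from the ID description).
SPECIES_CODE = re.compile(r"[A-Za-z0-9]{1,4}")


def split_entry_name(entry_name):
    """Split an entry name into its fields and check the species code.

    Assumption: fields are separated by underscores, as in "HS_MYC_2".
    """
    fields = entry_name.split("_")
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 fields, got {len(fields)}")
    if not SPECIES_CODE.fullmatch(fields[0]):
        raise ValueError(f"invalid species code: {fields[0]!r}")
    return fields


if __name__ == "__main__":
    print(split_entry_name("HS_MYC_2"))
```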
AccNumber:
|The accession number consists of the character string "EP" followed by
|5 digits representing the EMBL release number followed by the EPD entry
|order. Most EPD entries currently have only one accession number. If
|necessary, more than one AC will be used; the accession numbers are
|separated by semicolons and the list is terminated by a semicolon.
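|As an illustration, an accession list of this form could be parsed as
|sketched below. The function name parse_accessions and the sample
|accession numbers are hypothetical; only the "EP" plus 5 digits pattern
|and the semicolon rules are taken from the description above.

```python
import re

# "EP" followed by exactly 5 digits, as described for the AccNumber field.
AC_PATTERN = re.compile(r"EP\d{5}")


def parse_accessions(value):
    """Return the accession numbers found in a semicolon-terminated list."""
    value = value.strip()
    if not value.endswith(";"):
        raise ValueError("accession list must be terminated by a semicolon")
    # Drop the empty piece that follows the terminating semicolon.
    items = [part.strip() for part in value.split(";") if part.strip()]
    for item in items:
        if not AC_PATTERN.fullmatch(item):
            raise ValueError(f"malformed accession number: {item!r}")
    return items


if __name__ == "__main__":
    # Made-up accession numbers, used only to exercise the parser.
    print(parse_accessions("EP11001; EP17042;"))
```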
Description:
|The description lines contain general descriptive information about the
|promoter. The description is given in ordinary English and is
|free-format. In some cases, more than one DE line is required; in this
|case, the text is divided only between words and only the last DE line
|is terminated by a period.
|
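|Because DE text is divided only between words, continuation lines can
|be rejoined with single spaces. The sketch below assumes that each line
|starts with the two-letter code "DE" followed by whitespace; that
|layout, the function name join_description and the sample text are
|assumptions made for illustration.

```python
def join_description(lines):
    """Join the DE lines of one entry into a single description string."""
    parts = []
    for line in lines:
        if not line.startswith("DE"):
            raise ValueError(f"not a DE line: {line!r}")
        # Remove the line code and the surrounding whitespace.
        parts.append(line[2:].strip())
    text = " ".join(parts)
    # Only the last DE line is terminated by a period.
    if not text.endswith("."):
        raise ValueError("last DE line must be terminated by a period")
    return text


if __name__ == "__main__":
    # Made-up description, used only to exercise the function.
    sample = ["DE   c-myc proto-oncogene,", "DE   promoter P2."]
    print(join_description(sample))
```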
Organism:
|The species line specifies the source organism(s) of the promoter.
|The species names are based on NCBI's taxonomy and thus can be
|automatically hyperlinked to the NCBI's taxonomy web pages.
# |Species codes consist of the
# |initials of genus and species name. Occasionally, three characters
# |are required to generate unique codes. Standard abbreviations identify
# |viruses. The full names of the organisms can be used in a query. The
# |names are given in appendix B.1. Subspecies or strains are specified
# |in parentheses.
# |Chromosomal locations (genetic or cytogenetic loci, genomic map
# |units, etc.) appear in square brackets immediately following species
# |codes.
Link:
|The DR lines contain cross-references to other EPD entries (if there
|are alternative promoters of the same gene), or to entries from other
|databases. So far, we have incorporated links to EMBL, TRANSFAC (3),
|SWISS-PROT (4), FlyBase (5), MIM (6) and MGD. The precise format of
|these lines depends on the target database. Note that some
|cross-references include numbers enclosed in square brackets indicating
|the relative position of a linked sequence object, or keywords
|characterising the nature of the relationship between the entries.
|For instance, the ranges associated with cross-references to EMBL
|entries define the extensions of the EMBL sequences relative to the
|initiation site described by the EPD entry. The multiplicity of EMBL
|cross-references in some entries mirrors the redundancy of the sequence
|database.
doc:
|Documentation of promoter entries is presented on lines starting with
|"DO". They are essentially free format and so far not processed by
|specific programs. In the present release, there are two DO lines per
|entry, the first referring to the transcript mapping experiments that
|define the promoter, the second giving information about expression and
|regulation. The various experimental techniques are identified by number
|codes; the "Medline number" and/or "example" in brackets are linked,
|respectively, to the abstract and/or to the full-text article
|describing the related experiment.
method:
|The method lines describe experiments defining the transcription
|initiation site. In the new format, the experiments are individually
|linked to bibliographic references.
altern_prom:
|The AP line is optional and provides information on alternative
|promoters of the same gene.
taxo:
|The TX lines define a promoter's location within EPD's hierarchical
|classification system.
GeneProduct:
|Many gene products are listed in appendix B.3. Alternative initiation
|sites are identified by right-justified P1, P2, ... or E1, E2, ...,
|depending on whether the corresponding 5' exons are 3' co-terminal or
|not. The strongest initiation site is marked by a trailing "+" if known.
ExpProtocol:
|Special characters appended to the number codes designate an
|experimental gene expression system where the RNA for the corresponding
|experiments was synthesized. A query may use the following keywords:
|
codes | experiments
---|---
1 | direct RNA sequencing
2 | length measurement of an RNA product
3 | length measurement of a nuclease-protected complementary RNA or DNA fragment by comparison with homologous sequence ladder
4 | same as 3 but with heterologous size markers
5 | RNA sequencing by dideoxy-terminated primer extension
6 | DNA sequencing of an in vitro generated strong-stop cDNA or a full-length cDNA clone
7 | length measurement of a primer-extension product by comparison with homologous sequence ladder
8 | same as 7 but with heterologous size markers
9 | DNA sequencing of a full-length processed pseudogene
10 | length measurement of a reverse direction primer-extension product (blocked by RNA 5' end) by comparison with homologous sequence ladder