EUKARYOTIC PROMOTER DATABASE USER MANUAL
Written by: Philipp Bucher, Rouaida Cavin Périer, Viviane Praz and Christoph Schmid

EPFL School of Life Sciences - SV
and Swiss Institute for Experimental Cancer Research - ISREC
Computational Cancer Genomics Group
EPFL SV ISREC GR-BUCHER Station 15
CH-1015 Lausanne
Switzerland

Electronic mail:

This manual and the database it accompanies may be copied and redistributed freely, without advance permission, provided that this statement is reproduced with each copy.

Published research assisted by the Eukaryotic Promoter Database should cite:
EPD in its twentieth year: towards complete promoter coverage of selected model organisms
Schmid, C.D., Perier, R., Praz, V. and Bucher, P. (2006) Nucleic Acids Res, 34, D82-85.


1 INTRODUCTION

The Eukaryotic Promoter Database (EPD) was designed and developed at the Weizmann Institute of Science in Rehovot (Israel) and is currently maintained at ISREC in Epalinges s/Lausanne (Switzerland). EPD is a specialized annotation database of the EMBL Data Library. It provides information about eukaryotic promoters available in the EMBL Data Library and is intended to assist experimental researchers, as well as computer analysts, in the investigation of eukaryotic transcription signals. The present version originated from a previous compilation published in an article (1) and is organized as a hierarchically ordered and documented "functional position set" (2) pointing to transcription initiation sites. All information is either directly extracted from the scientific literature or, starting from release 73, compiled by a new in silico primer extension method (16). Promoter information in EPD is thus independent of the EMBL sequence entry descriptions. As a consequence, many of the initiation sites referred to in EPD do not appear in the corresponding EMBL feature tables.
A coordinated updating procedure has been set up by the two laboratories to ensure future compatibility between the position references in EPD and the sequence data in the main data library. Investigators who access EMBL via publicly available programs should be aware that software producers occasionally modify the sequence data in ways that render position references inaccurate. EPD is generally not compatible with sequence data of another release because EMBL sequence entries are not designed as stable data units.
The completeness and accuracy of EPD benefit greatly from user feedback. Any report of mistakes or omissions would be very much appreciated. Direct communication of newly published transcript mapping or gene expression data is also welcome. Please forward all correspondence to the address given at the top of this document. Use electronic mail if possible.

2 PROMOTER SELECTION

EPD is a rigorously selected database. In order to be included in EPD, a promoter must be:
  1. recognized by eukaryotic RNA POL II,
  2. active in a higher eukaryote,
  3. experimentally defined, or homologous and sufficiently similar to an experimentally defined promoter,
  4. biologically functional,
  5. available in the current EMBL release,
  6. distinct from other promoters in the database.
Explanations:
  1. Transcription by RNA POL II is bona fide assumed for protein coding genes but must be supported by alpha-amanitin data if the end product is an RNA.
  2. All eukaryotes except phycophyta, fungi, myxomycetes, and protozoa are considered higher eukaryotes. Note that the expression "active in" does not always refer to the source organism of the promoter (e.g. in viruses). EPD currently contains promoter sequences from 139 different species.
  3. A promoter is experimentally determined if a corresponding transcription initiation site is mapped with a precision of +/- 5 bp or better. Any technique that characterizes the 5' terminus of an in vivo or in vitro generated RNA is acceptable. Single nuclease-protection or primer-extension data must be accompanied by additional evidence unless the gene's intron-exon organization is well established. Similarity is considered "sufficient" if percent identity (as defined in Section 6) is >=60% between -79 and +20 or >=75% between -49 and +10.
  4. A promoter is biologically functional if it contributes to the source organism's survival and/or reproduction. This is bona fide assumed except for promoters of pseudogenes, minor transcription initiation sites (<20% of total gene transcripts), promoters giving rise to an unstable RNA product, and mutant promoters.
  5. The minimum sequence requirement is 45 bp between -49 and +10.
  6. Promoters are considered distinct if they originate from different gene loci or different species. Identity is assumed if two promoters from the same species exhibit >95% similarity between -79 and +20 while their genetic relationship is unknown. Multiple isolates of viruses or transposable elements are considered distinct if at least one promoter region fails to fulfill the above similarity criterion.

3 ASSIGNMENT OF TRANSCRIPTION INITIATION SITE

A eukaryotic promoter is defined as a DNA sequence around a transcription initiation site. The position reference to the initiation site is therefore the central part of a promoter entry. Its assignment is based directly on experimental data shown in an article; proposed adjustments based on consensus sequence considerations are ignored. In the case of minor discrepancies between different publications, averaged positions are given. Position references are subject to permanent re-evaluation. A transcription initiation site may be reassigned upon publication of new data. Position references are replaced if longer upstream sequences of the same promoter become available in a new EMBL sequence entry.
Several initiation sites preceding the same gene appear as alternative promoters if they are clearly separated from each other or differentially regulated. The minimum distance required between two alternative initiation sites is 20 bp. Otherwise, they are considered a single promoter region.
Four types of promoters are distinguished by one-letter codes in order to account for the variety of transcription initiation patterns in eukaryotes:
  • S: Single initiation site: >90% of all reported transcripts initiate within 10 bp (the experimental data usually do not allow distinction between a single cap-site and small mRNA 5' heterogeneity).
  • M: Multiple initiation sites: >75% of all reported transcripts initiate within 20 bp.
  • R: Initiation region: >75% of all reported transcripts initiate within 100 bp.
  • U: Undefined transcription initiation pattern, exclusively in 'preliminary' entries in epd_bulk.dat (see next section).
Note that, in addition to true alternative promoter activity, variability in the position of the transcription initiation site may also be due to experimental constraints, biological variability in the activity of RNA polymerase II, or the presence of highly similar (pseudo-)genes with distinct transcription initiation sites.
In sequence entries that contain a complete RNA or DNA genome of a retrovirus or a retrovirus-like transposable element, the position reference points to the U3/R boundary of the 3'-terminal LTR.
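The S, M and R thresholds listed above can be illustrated with a short Python sketch. It assumes that "initiate within N bp" means that the mapped 5' ends fall inside some window spanning at most N consecutive positions, and that the input is simply a list of 5'-end positions, one per reported transcript. The function names and the fallback to "U" are illustrative assumptions, not part of EPD or its software.

```python
from typing import Sequence


def max_window_fraction(starts: Sequence[int], width: int) -> float:
    """Largest fraction of 5' ends that fit into one window of `width` positions."""
    if not starts:
        raise ValueError("no transcript 5' ends given")
    ordered = sorted(starts)
    best = 0
    left = 0
    for right, pos in enumerate(ordered):
        # Move the left edge until the window spans fewer than `width` positions.
        while pos - ordered[left] >= width:
            left += 1
        best = max(best, right - left + 1)
    return best / len(ordered)


def initiation_type(starts: Sequence[int]) -> str:
    """Return the one-letter code (S, M, R or U) for a set of 5'-end positions."""
    if max_window_fraction(starts, 10) > 0.90:
        return "S"  # single initiation site
    if max_window_fraction(starts, 20) > 0.75:
        return "M"  # multiple initiation sites
    if max_window_fraction(starts, 100) > 0.75:
        return "R"  # initiation region
    return "U"      # assumption: anything else is treated as undefined


if __name__ == "__main__":
    print(initiation_type([100] * 19 + [105]))
    print(initiation_type([100] * 8 + [115] * 2))
```

The sliding window over sorted positions keeps the check linear after sorting; a different reading of "within N bp" (for example, distance from a modal position) would require a different window definition.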

4 FORMAT CONVENTIONS

EPD is distributed as two ASCII flatfiles (epd.dat, epd_bulk.dat) in essentially identical format. Differences in the format of 'preliminary' entries in 'epd_bulk.dat' are described in paragraph 4.4. EPD files contain a title line followed by a number of promoter entries. Interspersed are group headings whose function and format are described in the next section. The title line and parts of the promoter entries are rigidly formatted so that the entire database conforms to the standards of an FPS file (functional position set) of our current signal search analysis (1,2) software.

4.1. The title line

The title line of EPD is shown below:
TI   EPD83     Eukaryotic Promoter Database / Release 83              EP
The TI line contains the following fields:
    columns  data type
    1- 2     "TI"
    3- 5     (blank)
    6-15     FPS name
    16-70    title
    71-72    FPS code
Explanations:
  • FPS name and FPS code are used by our data extraction software to generate default names for output files.
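Because the title line is column-oriented, it can be read by slicing fixed character ranges. The following Python sketch assumes the column table given above (1-based, inclusive); the function name `parse_title_line` and the dictionary keys are hypothetical. The example line is assembled from its fields so that the column positions are explicit.

```python
def parse_title_line(line: str) -> dict:
    """Split an EPD title (TI) line into its fixed-column fields."""
    if not line.startswith("TI"):
        raise ValueError("not a TI line")
    padded = line.rstrip("\n").ljust(72)  # tolerate trimmed trailing blanks
    return {
        "fps_name": padded[5:15].strip(),   # columns 6-15
        "title": padded[15:70].strip(),     # columns 16-70
        "fps_code": padded[70:72].strip(),  # columns 71-72
    }


# Example title line built field by field (columns 1-2, 3-5, 6-15, 16-70, 71-72).
example = (
    "TI"
    + " " * 3
    + "EPD83".ljust(10)
    + "Eukaryotic Promoter Database / Release 83".ljust(55)
    + "EP"
)

if __name__ == "__main__":
    print(parse_title_line(example))
```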


4.2. Promoter entries

An EPD entry contains the following types of information:
  • Promoter identification and description.
  • Machine-readable pointers to the transcription initiation site in corresponding sequence entries.
  • Description of the experimental evidence defining the transcription start site.
  • Various kinds of promoter classifications useful for extraction of biologically meaningful promoter subsets.
  • Information on regulatory properties.
  • Cross-references to other databases.
  • Bibliographic references.
Promoter entries are presented in a format similar to that of EMBL and SWISS-PROT sequence entries. Each line starts with a line code identifying the type of information presented. The current line types and line codes, in the order in which they appear in an entry, are shown below:
    ID  - IDentification.
    AC  - ACcession number(s).
    DT  - DaTe.
    DE  - DEscription.
    OS  - Organism Species.
    HG  - Homology Group.
    AP  - Alternative Promoter.
    NP  - Neighbouring Promoter.
    DR  - Database cross-References.
    RN  - Reference Number.
    RX  - Reference cross-references.
    RA  - Reference Authors.
    RT  - Reference Title.
    RL  - Reference Location.
    ME  - MEthods.
    SE  - SEquence.
    FL  - Full Length.
    IF  - Initiation Frequency.
    TX  - TaXonomy.
    KW  - KeyWords.
    FP  - Functional Position.
    DO  - DOcumentation.
    RF  - literature ReFerence.
    //  - Termination line.
Spacer lines (XX) are inserted in order to make the promoter database easier to read by eye. Some line types occur many times in a single entry. Each entry must begin with an identification line (ID) and end with a terminator line (//). Text does not exceed column 72. Below is an example of a promoter entry:
      ID   HS_MYC_2     standard; single; VRT.
      XX
      AC   EP11148;
      XX
      DT   ??-APR-1987 (Rel. 11, created)
      DT   07-MAR-2005 (Rel. 82, Last annotation update).
      XX
      DE   c-myc (cellular homologue of myelocytomatosis virus 29 oncogene),
      DE   promoter 2.
      OS   Homo sapiens (human)
      XX
      HG   Homology group 53; Mammalian c-myc proto-oncogene, promoter 2
      AP   Alternative promoter #2 of 2; exon 1; site 2; major promoter.
      NP   none.
      XX
      DR   GENOME; NT_008046.15; NT_008046; [-41966656, 15188617].
      DR   EPD; EP11146; HS_MYC_1; alternative promoter; [-162; +].
      DR   CLEANEX; HS_MYC.
      DR   EMBL; AC103819.3; [-87815, 60206].
      DR   EMBL; X00364.2; [-2489, 8507].
      DR   EMBL; D10493.1; [-2487, 5569].
      DR   EMBL; K01910.1; [-2451, 49].
      DR   EMBL; M16261.1; [-1843, 1048].
      DR   EMBL; J03253.1; [-1759, 461].
      DR   EMBL; L00057.1; [-810, 2795].
      DR   EMBL; K03015.1; [-555, 458].
      DR   EMBL; X00196.1; [-532, 2792].
      DR   EMBL; M12026.1; [-511, 678].
      DR   EMBL; K01708.1; [-410, 500].
      DR   EMBL; K00559.1; [-345, 1020].
      DR   EMBL; K02280.1; [-302, 178].
      DR   EMBL; K01909.1; [-266, 1365].
      DR   EMBL; S65124.1; [-266, 1023].
      DR   EMBL; M14206.1; [-266, 446].
      DR   EMBL; M20013.1; [-240, 982].
      DR   EMBL; AF111270.1; [-142, 264].
      DR   EMBL; K02275.1; [-96, 780].
      DR   EMBL; X00675.1; [-96, 404].
      DR   EMBL; K02277.1; [-96, 157].
      DR   SWISS-PROT; P01106; MYC_HUMAN.
      DR   TRANSFAC; R01157; HS$CMYC_01; [-211, -189]; by position.
      DR   TRANSFAC; R01158; HS$CMYC_02; [-168, -145]; by position.
      DR   TRANSFAC; R01804; HS$CMYC_04; [-300, -283]; by position.
      DR   TRANSFAC; R01851; HS$CMYC_05; [-65, -57]; by position.
      DR   TRANSFAC; R01852; HS$CMYC_06; [-42, -34]; by position.
      DR   TRANSFAC; R04076; HS$CMYC_12; [-251, -228]; by position.
      DR   TRANSFAC; R04076; HS$CMYC_12; [-252, -229]; by position.
      DR   TRANSFAC; R04076; HS$CMYC_12; [-253, -230]; by position.
      DR   TRANSFAC; R04621; HS$CMYC_17; [-313, -262]; by position.
      DR   TRANSFAC; R08503; HS$CMYC_18; [-50, -41]; by position.
      DR   TRANSFAC; R16688; HS$CMYC_24; [-7, 41]; by position.
      DR   TRANSFAC; R16689; HS$CMYC_25; [-7, 41]; by position.
      DR   TRANSFAC; R17051; HS$CMYC_30; [-510, -480]; by position.
      DR   TRANSFAC; R18503; HS$CMYC_31; [-185, -170]; by position.
      DR   TRANSFAC; R18504; HS$CMYC_32; [-153, -168]; by position.
      DR   RefSeq; NM_002467.
      DR   MIM; 190080.
      XX
      RN   [1]
      RX   MEDLINE; 84026482.
      RA   Battey J., Moulding C., Taub R., Murphy W., Stewart T., Potter H.,
      RA   Lenoir G., Leder P.;
      RT   "The human c-myc oncogene: structural consequences of
      RT   translocation into the IgH locus in Burkitt lymphoma";
      RL   Cell 34:779-787(1983).
      RN   [2]
      RX   MEDLINE; 84131953.
      RA   Bernard O.D., Cory S., Gerondakis S., Webb E., Adams J.M.;
      RT   "Sequence of the murine and human cellular myc oncogenes and two
      RT   modes of myc transcription resulting from chromosome translocation
      RT   in B lymphoid tumours";
      RL   EMBO J. 2:2375-2383(1983).
      RN   [3]
      RX   MEDLINE; 87257828.
      RA   Lipp M., Schilling R., Wiest S., Laux G., Bornkamm G.W.;
      RT   "Target sequences for cis-acting regulation within the dual
      RT   promoter of the human c-myc gene.";
      RL   Mol. Cell. Biol. 7:1393-1400(1987).
      RN   [4]
      RX   MEDLINE; 88038843.
      RA   Broome H.E., Reed J.C., Godillot E.P., Hoover R.G.;
      RT   "Differential promoter utilization by the c-myc gene in mitogen-
      RT   and interleukin-2-stimulated human lymphocytes.";
      RL   Mol. Cell. Biol. 7:2988-2993(1987).
      XX
      ME   Nuclease protection [1,4].
      ME   Nuclease protection; transfected or transformed cells [3].
      ME   Length measurement of an RNA product; low-precision data [1].
      XX
      SE   agggagggatcgcgctgagtataaaagccggttttcggggctttatctaACTCGCTGTAG
      XX
      TX   6. Vertebrate promoters
      TX   6.1. Chromosomal genes
      TX   6.1.5. Hormones, growth factors, regulatory proteins
      TX   6.1.5.16. Various cellular protooncogenes
      XX
      KW   Proto-oncogene, Nuclear protein, DNA-binding, Glycoprotein,
      KW   Transcription regulation.
      XX
      FP   Hs c-myc         P2+:+S  EU:NC_000008.9       1+ 128817660; 11148.053 010*2
      XX
      DO        Experimental evidence: 4,4#,2l
      DO        Expression/Regulation: +mitogen
      RF        Cell34:779     EMBOJ2:2375    MCB7:1393      MCB7:2988
      //
A detailed description of each line type is given below.
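As an illustration of the general entry layout, the following Python sketch groups the lines of a flat file into entries keyed by their two-letter line code. It assumes only what is stated above: entries begin with an ID line, end with a // line, and XX lines are spacers. The function name `read_entries` is hypothetical, and leading whitespace is stripped so that indented examples such as the one in this manual are also accepted.

```python
from typing import Dict, Iterable, Iterator, List


def read_entries(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per promoter entry, mapping line code -> list of line contents."""
    entry: Dict[str, List[str]] = {}
    in_entry = False
    for raw in lines:
        line = raw.strip()
        code = line[:2]
        if code == "ID":
            # An ID line always opens a new entry.
            entry, in_entry = {}, True
        if not in_entry or code in ("", "XX"):
            # Skip the title line, group headings, blank lines and spacer lines.
            continue
        if code == "//":
            yield entry
            in_entry = False
            continue
        entry.setdefault(code, []).append(line[2:].strip())


if __name__ == "__main__":
    sample = [
        "ID   HS_MYC_2     standard; single; VRT.",
        "XX",
        "AC   EP11148;",
        "DE   c-myc (cellular homologue of myelocytomatosis virus 29 oncogene),",
        "DE   promoter 2.",
        "//",
    ]
    for record in read_entries(sample):
        print(record["AC"], len(record["DE"]))
```

Line types that occur several times in an entry (DE, DR, ME and so on) are kept as lists in their original order, so the individual line parsers can be applied afterwards.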

4.2.1. The ID line

The identification line is always the first line of an entry. The general form of the ID line is:
ID   ENTRY_NAME data class; initiation site type; TAXONOMIC DIVISION.
  • ENTRY_NAME is a unique entry identifier (e.g. "HS_MYC_2") that obeys rigorous naming conventions. It contains 2 or 3 fields separated by the `_' sign. The first field is the species identification code, at most 4 alphanumeric characters representing the biological source of the promoter. The second field identifies the gene using the protein code of the SWISS-PROT ID (if available); for human EPD entries, the official gene symbol approved by the HUGO nomenclature committee (if available) is used instead. The third field is optional; it is either a number, which distinguishes alternative promoters, or a letter, for promoters of duplicated genes.
  • The data class field relates to the quality of the information: "standard" means that the information is complete and correct according to the standards laid down in this document; "preliminary" means that the entry has not yet undergone all quality checks necessary for being classified as "standard".
  • The initiation site type is one of "single", "multiple" or "region", as defined in Section 3.
  • TAXONOMIC DIVISION is one of
    • PLN for plant
    • NEM for nematode
    • ART for arthropod
    • MLS for mollusc
    • ECH for echinoderm
    • VRT for vertebrates.
    Note that these codes relate to the organism in which the promoter is expressed, not to the source organism in which the promoter is replicated as defined on the OS line.
The ID line is terminated by a period.
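The ID line layout described above lends itself to a regular expression. The sketch below is a minimal Python illustration; the pattern, the function name `parse_id_line` and the result keys are assumptions made for this example, and the pattern only checks the overall shape of the line, not the permitted values of each field.

```python
import re
from typing import Dict, Optional

ID_PATTERN = re.compile(
    r"^ID\s+(?P<name>\S+)\s+(?P<data_class>[^;]+);\s*"
    r"(?P<site_type>[^;]+);\s*(?P<division>[A-Z]{3})\.$"
)


def parse_id_line(line: str) -> Dict[str, Optional[str]]:
    """Parse an ID line and split ENTRY_NAME into its 2 or 3 fields."""
    match = ID_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"malformed ID line: {line!r}")
    parts = match.group("name").split("_")
    if len(parts) not in (2, 3):
        raise ValueError("ENTRY_NAME must contain 2 or 3 fields")
    return {
        "entry_name": match.group("name"),
        "species_code": parts[0],
        "gene": parts[1],
        "suffix": parts[2] if len(parts) == 3 else None,  # optional third field
        "data_class": match.group("data_class").strip(),
        "site_type": match.group("site_type").strip(),
        "division": match.group("division"),
    }


if __name__ == "__main__":
    print(parse_id_line("ID   HS_MYC_2     standard; single; VRT."))
```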

4.2.2. The AC line

AC   EP11148;
The accession number consists of the character string "EP" followed by 5 digits representing the EMBL release number followed by the EPD entry order. Most EPD entries currently have only one accession number. If necessary, more than one accession number is given; the numbers are separated by semicolons and the list is terminated by a semicolon.

4.2.3. The DT line

The date lines show the date of entry or last modification of the entry.
DT   DD-MMM-YEAR (Rel. XX, Comment)
where `DD' is the day, `MMM' the month, `YEAR' the year, and `XX' the EPD release number. The comment portion of the line indicates the action taken on that date.
  • The first DT line indicates when the entry first appeared in the database.
  • The second DT line indicates when the promoter data was last modified. It is terminated by a period.

4.2.4. The DE line

DE   c-myc (cellular homologue of myelocytomatosis virus 29 oncogene),
DE   promoter 2.
The description lines contain general descriptive information about the promoter. The description is given in ordinary English and is free-format. It contains the SWISS-PROT gene names when known. In some cases, more than one DE line is required; in this case, the text is divided only between words. The last DE line is terminated by a period.

4.2.5. The OS line

OS   Mus musculus (house mouse)
The species line specifies the source organism(s) of the promoter. The species names are based on NCBI's taxonomy and can thus be automatically hyperlinked to NCBI's taxonomy web pages.

4.2.6. The HG line

HG   Homology group 53; Mammalian c-myc proto-oncogene, promoter 2
The homology group line is optional. It contains 2 fields: a homology group number that allows identification of all sequence-wise similar promoters in EPD, and a homology group name.

4.2.7. The AP line

AP   Alternative promoter #2 of 2; 5' exon 1; site 2; major promoter.
The AP line is optional and provides information on alternative promoters of the same gene (for more details, see Section 4.3.1.). It contains 3 or 4 fields, separated by semicolons. The fields consist of descriptive text followed by the following types of information:
  • Two numbers indicating, respectively, the promoter's relative position along the gene, and the total number of alternative promoters of the gene. Promoters are numbered in the 5' to 3' direction starting with one.
  • A number referring to the exon preceded by the promoter. Note that multiple promoters may be associated with the same (3'-coterminal) exon or with different exons. Known exons are numbered in the 5' to 3' direction starting with one. Note also that the nomenclature of 5'-exons in EPD may differ from the usage in the literature.
  • A number indicating the promoter's relative position among the subset of promoters preceding the same exon.
  • An optional keyword indicating major promoters.
The AP line is terminated by a period.

4.2.8. The NP line

NP   Neighbouring Promoter; EP23008; MM_H2B1; [-209; -].
The NP line is optional and provides information on promoters which are physically closer to each other than 1000 bp. It contains 3 fields, separated by semicolons, providing the following types of information:
  • The EPD accession number of the neighbouring promoter.
  • The EPD identifier of the neighbouring promoter.
  • The last field indicates, respectively, the position and the direction of the neighbouring promoter relative to the transcription initiation site given in the promoter entry.
    • Negative numbers indicate the upstream region of this entry and positive ones indicate the downstream region.
    • The sign indicates the transcription direction of the neighbouring promoter relative to the promoter entry:
      "+" means same direction
      "-" means opposite direction
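The NP fields described above can be extracted as in the following Python sketch. The regular expression, the function name `parse_np_line` and the handling of the "none." form (taken from the example entry in Section 4.2) are assumptions for illustration only.

```python
import re
from typing import Optional

NP_PATTERN = re.compile(
    r"^NP\s+Neighbouring Promoter;\s*(?P<accession>EP\d+);\s*(?P<name>[^;]+);"
    r"\s*\[\s*(?P<position>[+-]?\d+)\s*;\s*(?P<direction>[+-])\s*\]\.$"
)


def parse_np_line(line: str) -> Optional[dict]:
    """Parse an NP line; return None when the entry states 'none.'."""
    text = line.strip()
    if re.fullmatch(r"NP\s+none\.", text):
        return None
    match = NP_PATTERN.match(text)
    if match is None:
        raise ValueError(f"malformed NP line: {line!r}")
    return {
        "accession": match.group("accession"),
        "entry_name": match.group("name").strip(),
        # Negative values lie upstream of this entry's initiation site.
        "position": int(match.group("position")),
        # "+" means the neighbour is transcribed in the same direction.
        "same_direction": match.group("direction") == "+",
    }


if __name__ == "__main__":
    print(parse_np_line("NP   Neighbouring Promoter; EP23008; MM_H2B1; [-209; -]."))
```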

4.2.9. The DR line

The DR lines contain cross-references to other EPD entries (if there are alternative promoters of the same gene), or to entries from other databases. So far, we have incorporated links to CLEANEX, EMBL (3), GenBank (4), DDBJ (5), SWISS-PROT (6), TRANSFAC (7), Flybase (8), MIM (9) and MGD (10). The precise format of these lines depends on the target database. Note that some cross-references include numbers enclosed in square brackets indicating the relative position of a linked sequence object, or keywords characterising the nature of the relationship between the entries. For instance, the ranges associated with cross-references to EMBL entries define the extensions of the EMBL sequences relative to the initiation site described by the EPD entry. The multiplicity of EMBL cross-references in some entries mirrors the redundancy of the sequence database. The first of these references corresponds to the longest promoter region, except when the sequences have been cancelled from the EMBL database but still exist in GenBank or DDBJ.
The format of the DR line is shown by the following example lines:
     DR   GENOME; NT_037436.1; NT_037436; [-14139754, 9212459].
     DR   EPD; EP11146; HS_MYC_1; alternative promoter; [-162; +].
     DR   EMBL; J00120.1; [-2489, 8507].
     DR   SWISS-PROT; P01106; MYC_HUMAN.
     DR   SPTREMBL; Q8IQL1.
     DR   FLYBASE; FBgn0013718; nuf.
     DR   TRANSFAC; R01804; HS$CMYC_04; [-300, -283]; by position.
     DR   MIM; 190080.
     DR   RefSeq; NM_003529.
     DR   MGD; MGI:88468; Cola2.
     DR   ENSEMBL; CG32140.
     DR   TRANSCRIPTOME; DMe000571.
Explanations (for detailed information, see the Guidelines):
  • The first item on the DR line is the abbreviated name of the data collection to which reference is made. The currently defined data bank identifiers are the following:
    GENOME         NCBI Reference Sequence (RefSeq) of genomic sequence contigs
    EPD            Eukaryotic Promoter Database: alternative promoters of the same gene
    CLEANEX        Gene expression database for human EPD promoters
    EMBL           Nucleotide sequence database of the EMBL
    SWISS-PROT     Protein sequence database
    SPTREMBL       Subset of the protein sequence database TrEMBL. It contains the entries which should eventually be incorporated into SWISS-PROT. SWISS-PROT accession numbers have been assigned to all SP-TrEMBL entries
    FLYBASE        Drosophila genome database
    TRANSFAC       Transcription factor (TF) database
    MIM            Mendelian Inheritance in Man Database
    RefSeq         Reference Sequence Database
    MGD            Mouse Genome Database
    ENSEMBL        Metazoan genome annotation
    TRANSCRIPTOME  Catalog of transcripts and their mapping onto the genome (LICR Lausanne branch)
    TIGR           'gene identifiers' from the 'Rice Genome Annotation' project at TIGR
  • The second item is the primary accession number (or an equivalent unique identifier of another data bank) of the entry to which reference is made.
  • The third item (if it exists) is a secondary identifier or name for the cross-referenced database entry.
  • The fourth item for EMBL and Transfac indicates the location and extension of the sequences given in these entries relative to the transcription initiation site given in the promoter entry. Negative numbers indicate the upstream region of this site and positive ones indicate the downstream part.
  • The fifth item
    • in the EPD line, indicates the position and the direction of the alternative promoter, in the same way as the last field of the NP line does for a neighbouring promoter;
    • in the TRANSFAC line, designates the criterion used to collect the TF entry:
      - by position: The TF binding site is situated between -500 and +100, +1 being the transcription initiation site.
      - by function: The TF binding site is known to regulate the corresponding promoter.
NB: The number of TRANSFAC cross-reference lines should not exceed the actual number of binding sites found in the "TRANSFAC Site Table". The position given in this DR line is thus related to the longest EMBL entry common to both the EPD and TRANSFAC (version 6.3) databases.
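Since the DR format depends on the target database, a generic reader can only separate the database name, the remaining semicolon-delimited items and the optional bracketed location. The Python sketch below does exactly that; the function name `parse_dr_line` and the result keys are hypothetical, and the bracket contents are returned as strings because they may hold either two coordinates or a position plus a direction sign.

```python
import re
from typing import List, Optional

BRACKET = re.compile(r"\[([^\]]*)\]")


def parse_dr_line(line: str) -> dict:
    """Split a DR line into database name, other items and bracketed location."""
    text = line.strip()
    if not text.startswith("DR"):
        raise ValueError("not a DR line")
    body = text[2:].strip().rstrip(".")  # drop line code and terminating period
    location: Optional[List[str]] = None
    bracket = BRACKET.search(body)
    if bracket:
        # The bracket may use "," (ranges) or ";" (position and direction).
        location = [part.strip() for part in re.split(r"[;,]", bracket.group(1))]
        body = body[: bracket.start()] + body[bracket.end():]
    fields = [item.strip() for item in body.split(";") if item.strip()]
    if not fields:
        raise ValueError("empty DR line")
    return {"database": fields[0], "items": fields[1:], "location": location}


if __name__ == "__main__":
    print(parse_dr_line("DR   EMBL; J00120.1; [-2489, 8507]."))
    print(parse_dr_line("DR   EPD; EP11146; HS_MYC_1; alternative promoter; [-162; +]."))
```

Removing the bracketed part before splitting on semicolons avoids mis-splitting the EPD form, whose bracket itself contains a semicolon.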

4.2.10. The RN, RX, RA, RT and RL lines

These lines comprise the literature citations within EPD. The citations indicate the papers from which the data have been abstracted. The reference lines for a given citation occur in a block, and are always in the order RN, RX, RA, RT, RL. Within each such reference block the RN line occurs once, the RX line occurs zero or more times, and the RA, RT and RL lines each occur one or more times. If several references are given, there will be a reference block for each. An example of a complete reference is:
RN   [1]
RX   MEDLINE; 84026482.
RA   Battey J., Moulding C., Taub R., Murphy W., Stewart T., Potter H.,
RA   Lenoir G., Leder P.;
RT   "The human c-myc oncogene: structural consequences of
RT   translocation into the IgH locus in Burkitt lymphoma";
RL   Cell 34:779-787(1983).
The formats of the individual lines are explained below.

4.2.10.1. The RN line

The RN line gives a sequential number to each reference citation in an entry. This number is used to indicate the reference in the ME lines.

4.2.10.2 The RX line

The RX line is an optional line which is used to indicate the identifier assigned to a specific reference in PubMed (PMID, from the National Library of Medicine (NLM)).

4.2.10.3 The RA line

The RA lines list the authors of the paper (or other work) cited. The authors are listed in the order given in the paper. The names are listed surname first, followed by a blank, followed by initial(s) with periods. The authors' names are separated by commas and terminated by a semicolon. Author names are not split between lines.

4.2.10.4 The RT line

The RT lines contain the title of the reference citation.

4.2.10.5 The RL line

The RL lines contain the conventional citation information for the reference. In general, the RL lines alone are sufficient to find the paper in question. It includes the journal abbreviation, the volume number, the page range, and the year. Journal names are abbreviated according to the conventions used by the National Library of Medicine (NLM) and are based on the existing ISO and ANSI standards.

4.2.11. The ME line

The method lines describe experiments defining the transcription initiation site. The format of the ME line is as follows:
ME   Method_description [; Qualifier...] [n,...].
A complete list of method descriptions is given in Section 4.3.2. Qualifiers may indicate that an experimental gene transcription system was used, that data are of low precision (error > +/- 5 bp), or that the experiments were done with a closely related gene. The number(s) enclosed in square brackets link the method descriptions to the bibliographic references included in the promoter entry. The method lines from the example are:
ME   Nuclease protection [1,4].
ME   Nuclease protection; transfected or transformed cells [3].
ME   Length measurement of an RNA product; low-precision data [1].

4.2.12. The SE line

The sequence line shows a short sequence segment corresponding to the -49 to +10 region of the promoter. Transcribed and untranscribed nucleotides are represented by upper and lower case characters, respectively. This line type is not meant to provide sequence data but serves as a control string for sequence extraction.
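Because the case of the characters encodes the position of the initiation site, the SE string can be split into its untranscribed and transcribed parts without any coordinates. The Python sketch below illustrates this; the function name `split_se_line` is hypothetical, and the check that no lower-case character follows the first upper-case one is an added sanity test, not an EPD rule.

```python
from typing import Tuple


def split_se_line(line: str) -> Tuple[str, str]:
    """Return (untranscribed, transcribed) parts of an SE control string."""
    seq = line.strip()
    if seq.startswith("SE"):
        seq = seq[2:].strip()  # drop the line code if present
    # The first upper-case character marks the first transcribed nucleotide.
    boundary = next((i for i, ch in enumerate(seq) if ch.isupper()), len(seq))
    upstream, transcribed = seq[:boundary], seq[boundary:]
    if any(ch.islower() for ch in transcribed):
        raise ValueError("lower-case character found after the initiation site")
    return upstream, transcribed


if __name__ == "__main__":
    se = "SE   agggagggatcgcgctgagtataaaagccggttttcggggctttatctaACTCGCTGTAG"
    upstream, transcribed = split_se_line(se)
    print(len(upstream), len(transcribed))
```

A sequence extraction program could compare this control string with the nucleotides retrieved from the sequence entry to confirm that the position reference still points to the intended site.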

4.2.13. The FL line

The Full Length line designates the large-scale cDNA sequencing projects: NEDO (11), MGC (12), and BDGP (15).


4.2.13. The IF line

The Initiation Frequency lines reflect the frequency at which each nucleotide within the initiation region is found at the 5' end of bona fide full-length cDNA clone inserts.

4.2.14. The TX line

The TX (TaXonomy) lines define a promoter's location within EPD's hierarchical classification system (see Section 5). Note that starting from release 72, the classification system is no longer maintained.

4.2.15. The KW line

The KW lines define a number of keywords describing an entry.

4.2.16. The FP, DO and RF lines

These lines pertain to the old EPD format; see the next section (4.3).

4.2.17. The // line

The // (terminator) line contains no data or comments. It designates the end of an entry.

4.3. Line types retained from the old format

The last six lines of an entry present essential information in the more concise, old format. The original description of the old format follows. Each entry starts with an FP line that contains a position reference to a transcription initiation site, and ends with a terminator (//). Below is an example of a promoter entry:
FP   Hs c-myc         P2+:+S  EU:NC_000008.9       1+ 128817660; 11148.053 010*2
XX
DO        Experimental evidence: 4,4#,2l
DO        Expression/Regulation: +mitogen
RF        Cell34:779     EMBOJ2:2375    MCB7:1393      MCB7:2988
//

4.3.1. The FP line

The FP line contains the following fields and subfields:
    columns  data type
    1- 2     "FP"
    3- 5     (blank)
    6-30     description:
      6-25     promoter name
      26-26    ":"
      27-27    independent subset status (see Section 6)
      28-28    type of initiation site (see Section 3)
      29-30    (blank)
    31-63    functional position reference:
      31-51    sequence reference:
        31-32    genome db code
        33-33    ":"
        34-51    genome db entry accession number
      52-52    sequence type (0 = circular, 1 = linear)
      53-53    strand (+ or -)
      54-63    position number
    64-64    ";"
    65-70    entry code
    71-71    "."
    72-74    homology group number (see Section 6)
    75-75    (blank)
    76-80    alternative promoter identification code:
      76-78    gene number
      79-79    "*"
      80-80    initiation site number
Explanations:
  • The promoter name begins with a species code usually followed by a gene locus or gene product name. Species codes consist of the initials of genus and species name. Occasionally, three characters are required to generate unique codes. Standard abbreviations identify viruses. The full names of the organisms are given in appendix B.1. Subspecies or strains are specified in parentheses. Chromosomal locations (genetic or cytogenetic loci, genomic map units, etc.) may appear in square brackets immediately following species codes. Many gene products are referred to by abbreviations explained in appendix B.3. Alternative promoters are identified by a right-justified "P" and a digit indicating the corresponding initiation site, numbered sequentially from 5' to 3'. An optional "E" and digit refer to the corresponding 5' exons, if known. Identical numbers indicate 3'-coterminal exons. The strongest initiation site is marked by a trailing + if known (see also the List of alternative promoters).
  • genome db codes currently used are 'EM' for EMBL database, and 'EU' for genome contigs or chromosomal genome assemblies of the RefSeq database.
  • The EMBL accession number always relates to the first EMBL cross-reference. This one is usually the longest promoter region except when the entry is cancelled from the EMBL database, but still present in GenBank or DDBJ.
  • The sequence type indicates whether the sequence is circular or linear. A sequence comprising exactly one repeat unit of a tandem repeat cluster is also considered circular. Note that the annotation as circular or linear sequences in EPD is not always in agreement with the corresponding annotation in EMBL.
  • The entry code is a five-digit number which is the only part of a promoter entry that is stable from release to release.
  • Alternative promoter identification code: Genes represented by multiple promoter entries in EPD are assigned a promoter group number. The corresponding initiation sites are numbered sequentially from 5' to 3'.
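The fixed-column layout of the FP line can be read with a small table of column ranges. The Python sketch below follows the column positions listed in this section (1-based, inclusive); the field names, the function name `parse_fp_line` and the conversion of the position to an integer are illustrative choices. The example line is assembled from its fields, using the values of the c-myc entry, so that the column widths are explicit.

```python
# Column ranges (1-based, inclusive) for the leaf fields of the FP line.
FP_FIELDS = {
    "promoter_name": (6, 25),
    "subset_status": (27, 27),
    "site_type": (28, 28),
    "db_code": (31, 32),
    "accession": (34, 51),
    "sequence_type": (52, 52),
    "strand": (53, 53),
    "position": (54, 63),
    "entry_code": (65, 70),
    "homology_group": (72, 74),
    "gene_number": (76, 78),
    "site_number": (80, 80),
}


def parse_fp_line(line: str) -> dict:
    """Slice an FP line into its fixed-column fields."""
    if not line.startswith("FP"):
        raise ValueError("not an FP line")
    padded = line.rstrip("\n").ljust(80)
    record = {
        name: padded[start - 1:end].strip()
        for name, (start, end) in FP_FIELDS.items()
    }
    record["position"] = int(record["position"])
    return record


# Example FP line built field by field from the c-myc entry.
example_fp = (
    "FP" + " " * 3
    + "Hs c-myc".ljust(17) + "P2+"      # columns 6-25, "P2+" right-justified
    + ":" + "+" + "S" + "  "            # columns 26-30
    + "EU" + ":" + "NC_000008.9".ljust(18)
    + "1" + "+" + "128817660".rjust(10)
    + ";" + "11148".rjust(6) + "." + "053" + " "
    + "010" + "*" + "2"
)

if __name__ == "__main__":
    print(parse_fp_line(example_fp))
```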

4.3.2. DO lines: Documentation

Documentation of promoter entries is presented on lines starting with "DO". They are essentially free-format and so far not processed by specific programs. In the present release, there are two DO lines per entry: the first refers to the transcript mapping experiments that define the promoter, the second gives information about expression and regulation. The various experimental techniques are identified by number codes. The Medline number and/or "example" given in parentheses are linked, respectively, to the abstract and/or the full-text article describing the related experiment.

 
Code Experiment
1 Direct RNA sequencing (1634116)
2 Length measurement of an RNA product (1989694)
3 Nuclease protection : Length measurement of a nuclease-protected complementary RNA or DNA fragment (2845126) (8294473)
4 RNA sequencing by primer extension : by dideoxy-terminated primer extension (3396543)
5 Sequencing of a full-length cDNA (8294473)
6 Primer extension : Length measurement of a primer extension product  (10187799 , example) (9880555 , example)
7 DNA sequencing of a full-length processed pseudogene (3584116)
8 Reverse direction primer extension with homologous sequence ladder : Length measurement of an in vitro synthesised DNA primed upstream of the initiation site and blocked by the 5'end of the RNA hybridized to the template (2451027)
9 Rapid amplification of cDNA ends (RACE) (9116864)
10 RNA sequencing, type not specified
11 Oligo-capping : artificial capping of mRNA followed by sequencing of the 5' end of cDNA (11375929, 11337467 and examples)
12 Mammalian gene collection (MGC) full-length cDNA cloning (10521335 and example)
13 5' end confirmed by alignment of first 100 downstream nucleotides to EST database.
14 Oligo-capping: Berkeley Drosophila Genome Project (12537569)
15 Oligo-capping: Rice full-length cDNA cloning (12869764)

Special characters appended to the number codes designate an experimental gene expression system where the RNA for the corresponding experiments was synthesized.

 
* RNA POL II in vitro system
o injected amphibian oocytes
# transfected or transformed cells, injected neurons
! transgenic organisms


r experiments performed with closely related gene
h homologous sequence ladder used for length measurement of  nuclease protection or primer extension product
l low-precision data (error > +/- 5 bp)
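
The code tables above can be turned into a small lookup routine. The following Python sketch is only an illustration: it assumes that an experiment appears on a DO line as a token made of the number code immediately followed by zero or more of the special characters (for example "3*" or "6h"). That token shape is an assumption, not a documented grammar, and the function name is hypothetical.

```python
import re

# Experiment codes as listed in the table above (descriptions shortened).
EXPERIMENT_CODES = {
    1: "Direct RNA sequencing",
    2: "Length measurement of an RNA product",
    3: "Nuclease protection",
    4: "RNA sequencing by primer extension",
    5: "Sequencing of a full-length cDNA",
    6: "Primer extension",
    7: "DNA sequencing of a full-length processed pseudogene",
    8: "Reverse direction primer extension with homologous sequence ladder",
    9: "Rapid amplification of cDNA ends (RACE)",
    10: "RNA sequencing, type not specified",
    11: "Oligo-capping",
    12: "Mammalian gene collection (MGC) full-length cDNA cloning",
    13: "5' end confirmed by alignment to EST database",
    14: "Oligo-capping: Berkeley Drosophila Genome Project",
    15: "Oligo-capping: Rice full-length cDNA cloning",
}

# Special characters appended to the number codes.
SUFFIXES = {
    "*": "RNA POL II in vitro system",
    "o": "injected amphibian oocytes",
    "#": "transfected or transformed cells, injected neurons",
    "!": "transgenic organisms",
    "r": "experiments performed with closely related gene",
    "h": "homologous sequence ladder used for length measurement",
    "l": "low-precision data (error > +/- 5 bp)",
}

# ASSUMPTION: a token is a one- or two-digit code plus optional suffixes.
TOKEN = re.compile(r"^(\d{1,2})([*o#!rhl]*)$")


def decode_experiment_token(token):
    """Return (experiment description, list of suffix descriptions)."""
    match = TOKEN.match(token.strip())
    if match is None or int(match.group(1)) not in EXPERIMENT_CODES:
        raise ValueError("unrecognised experiment token: %r" % token)
    code = int(match.group(1))
    return EXPERIMENT_CODES[code], [SUFFIXES[c] for c in match.group(2)]
```

Because DO lines are essentially free format, a real parser would need to tolerate tokens that do not fit this pattern.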
Explanations and additional conventions:

  • The full-length assumption of a cDNA clone or a processed pseudogene is based on consistency with accompanying nuclease-protection or primer extension data or, alternatively, the existence of multiple 5' co-terminal clones or pseudogenes.
The information on expression/regulation may include indication of developmental stages, tissues, cell types, cell cycle stages, and various regulatory features. Conventions:
  • A semicolon delimits the two fields: expression and regulation.
  • A comma delimits alternative keywords (e.g. liver, kidney).
  • "+" means "induced by" or "strongly expressed in".
  • "-" means "repressed by" or "weakly expressed in".
  • "~" means "modulated by".
  • Cell cycle stages are given in square brackets.
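
The conventions listed above can be applied mechanically. The Python sketch below is a minimal illustration and rests on two assumptions not stated in this manual: that the "+", "-" and "~" symbols are written as a prefix of the keyword, and that commas do not occur inside square brackets. The function names and the example string are hypothetical.

```python
import re

MODIFIERS = {
    "+": "induced by / strongly expressed in",
    "-": "repressed by / weakly expressed in",
    "~": "modulated by",
}


def parse_keywords(field):
    """Split one field into keywords with optional modifier and cell cycle stages."""
    items = []
    for raw in field.split(","):          # comma delimits alternative keywords
        keyword = raw.strip()
        if not keyword:
            continue
        modifier = None
        if keyword[0] in MODIFIERS:       # ASSUMPTION: modifier is a prefix
            modifier, keyword = keyword[0], keyword[1:].strip()
        # Cell cycle stages are given in square brackets.
        stages = re.findall(r"\[([^\]]*)\]", keyword)
        keyword = re.sub(r"\s*\[[^\]]*\]", "", keyword).strip()
        items.append({"keyword": keyword, "modifier": modifier, "cell_cycle": stages})
    return items


def parse_expression_regulation(text):
    """Split the second DO line into its expression and regulation fields."""
    expression, _, regulation = text.partition(";")   # semicolon delimits the fields
    return {
        "expression": parse_keywords(expression),
        "regulation": parse_keywords(regulation),
    }


if __name__ == "__main__":
    # Hypothetical input, built only from the conventions above.
    print(parse_expression_regulation("+liver, kidney; +glucc, -cAMP"))
```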

4.3.3. RF line: Literature references

The first four references from the RN, RX, RA, RT and RL lines are repeated in a highly condensed form. Each reference occupies a slot of 15 letters and indicates the journal, volume, and starting page of the cited article (maximum 14 letters). The journal codes are explained in Appendix B.2.


The references primarily point to the articles where the experimental promoter evidence is presented. Additional potential subjects are homology to other promoters, gene expression and regulation, and nomenclature. Papers containing only sequence data are usually not referred to because they are easy to find via the corresponding EMBL sequence entry descriptions.
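
Since the RF line uses fixed-width slots, it can be split by position. The Python sketch below assumes that the line starts with the two-letter line code followed by three blanks (a 5-character prefix, as on the ID and AC lines shown in section 4.4) and that the slots follow immediately after it. Both points are assumptions, and the reference strings in the usage example are invented placeholders rather than real EPD content.

```python
def split_rf_line(line, prefix_len=5, slot_width=15, max_refs=4):
    """Split an RF line into its condensed references.

    ASSUMPTIONS: a 5-character line prefix ("RF" plus three blanks) and
    consecutive 15-character slots, each holding at most 14 letters.
    """
    body = line[prefix_len:]
    references = []
    for offset in range(0, slot_width * max_refs, slot_width):
        chunk = body[offset:offset + slot_width].strip()
        if chunk:
            references.append(chunk)
    return references


if __name__ == "__main__":
    # Hypothetical line: the internal layout of each slot is made up.
    example = "RF   " + "NAR14:10009".ljust(15) + "Cell50:1".ljust(15)
    print(split_rf_line(example))
```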

4.3.4. Miscellaneous

  • Greek letters are sometimes represented by the corresponding Latin letters followed by an apostrophe:

    a' = alpha    b' = beta    g' = gamma    d' = delta    e' = epsilon
    z' = zeta     h' = eta     th' = theta   k' = kappa    l' = lambda
    n' = nu       r' = rho
  • Sub- and superscripts are sometimes indicated by preceding "_" and "^", respectively.
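
For display purposes, the Greek letter convention can be reversed with a simple substitution. The Python sketch below is a heuristic, not an official decoder: it expands a listed letter followed by an apostrophe unless the letter is preceded by a lowercase letter, a rule chosen here (as an assumption) to leave ordinary English possessives such as "Ehrenberg's" untouched. The function name is hypothetical.

```python
import re

# Mapping taken from the list above.
GREEK = {
    "a": "alpha", "b": "beta", "g": "gamma", "d": "delta", "e": "epsilon",
    "z": "zeta", "h": "eta", "th": "theta", "k": "kappa", "l": "lambda",
    "n": "nu", "r": "rho",
}

# "th" is tried before the single letters so that th' is not read as t + h'.
# ASSUMPTION: a preceding lowercase letter means the apostrophe is not a
# Greek-letter marker (heuristic only).
GREEK_PATTERN = re.compile(r"(?<![a-z])(th|[abgdezhklnr])'")


def expand_greek(text):
    """Replace a', b', th', ... by the spelled-out Greek letter names."""
    return GREEK_PATTERN.sub(lambda m: GREEK[m.group(1)], text)


if __name__ == "__main__":
    for name in ("TGF-b'", "g'GT", "T3d'"):
        print(name, "->", expand_greek(name))
```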

4.4. Distinct format of 'preliminary' entries in epd_bulk.dat

4.4.1. The title line:

TI   epd83     Bulk Section Eukaryotic Promoter Database / Release 83 EP

4.4.2. The ID line

The identification line is always the first line of an entry. The form of the ID line in 'epd_bulk.dat' is:
ID   OS_bAAAA     preliminary; undefined; TAXONOMIC DIVISION.
  • A unique entry identifier "OS_bAAAA" is constructed from the species identification code ('OS'), with at most 4 alphanumeric characters representing the biological source of the promoter, and a 'b' (for bulk) followed by an arbitrary 4-letter code.
  • "preliminary" data class field indicates that the entry has not (yet) undergone all quality checks necessary for being classified as "standard".
  • "undefined" as initiation site type due to insufficient data to define transcription initiation patterns (Section 3).
  • The TAXONOMIC DIVISION codes are:
    • PLN for plant
    • NEM for nematode
    • ART for arthropod
    • MLS for mollusc
    • ECH for echinoderm
    • VRT for vertebrates.
    Note that these codes relate to the organism in which the promoter is expressed, not to the source organism in which the promoter is replicated as defined on the OS line.
The ID line is terminated by a period.
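
The ID line layout described above lends itself to a regular expression. The following Python sketch is an illustration under stated assumptions: fields are separated by one or more blanks, the species code is 1 to 4 alphanumeric characters, and the arbitrary code after 'b' consists of 4 alphanumeric characters. The example entry name "Hs_bABCD" is invented.

```python
import re

# Taxonomic division codes as listed above.
DIVISIONS = {
    "PLN": "plant",
    "NEM": "nematode",
    "ART": "arthropod",
    "MLS": "mollusc",
    "ECH": "echinoderm",
    "VRT": "vertebrates",
}

# ASSUMPTION: whitespace-separated fields; identifier is <species>_b<4 chars>.
BULK_ID = re.compile(
    r"^ID\s+([A-Za-z0-9]{1,4})_b([A-Za-z0-9]{4})\s+"
    r"preliminary;\s+undefined;\s+([A-Z]{3})\.$"
)


def parse_bulk_id(line):
    """Return a dict describing a bulk ID line, or None if it does not match."""
    match = BULK_ID.match(line.rstrip())
    if match is None or match.group(3) not in DIVISIONS:
        return None
    species, code, division = match.groups()
    return {
        "entry": "%s_b%s" % (species, code),
        "species_code": species,
        "division": division,
        "division_name": DIVISIONS[division],
    }


if __name__ == "__main__":
    # Hypothetical entry name used only for demonstration.
    print(parse_bulk_id("ID   Hs_bABCD     preliminary; undefined; VRT."))
```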

4.4.3. The AC line

AC   EP00001;
The accession number consists of the character string "EP" followed by 5 digits. Previously the first two digits of the AC designated the release number of initial appearance of the specific entry followed by the EPD entry order. AC numbers in 'epd_bulk.dat' are continuous numbers, excluding ACs already used for entries in the main file 'epd.dat'.
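
A short validity check for the AC line follows directly from this description ("EP" plus 5 digits, terminated by a semicolon). The Python sketch below assumes that the line code and the accession number are separated by blanks. The function name is hypothetical.

```python
import re

# "EP" followed by exactly 5 digits, then a semicolon.
AC_LINE = re.compile(r"^AC\s+(EP\d{5});$")


def parse_ac(line):
    """Return the accession number of an AC line, or None if malformed."""
    match = AC_LINE.match(line.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(parse_ac("AC   EP00001;"))
```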

5 CLASSIFICATION

Starting from release 72, the classification system is no longer maintained. New entries are presently added by default to an '?Unclassified' category. The classification system might still provide valuable information for entries added before release 72. However for any category, consider the possible existence of additional, potentially corresponding EPD entries in the default categories.


The entries of the Eukaryotic Promoter Database are embedded in a hierarchical classification system. A promoter's taxonomic location is made clear by interspersed group headings. The example shown below is taken from top of the database. A contrasting format has been chosen to emphasize the very different nature of this information.

*----------------------------------------------------------------------*
*    1. Plant promoters                                                *
*----------------------------------------------------------------------*
*    1.1. Chromosomal genes                                            *
*----------------------------------------------------------------------*
*    1.1.1. Small nuclear RNAs                                         *
*----------------------------------------------------------------------*
A group heading consists of a series of node numbers and a title. The highest classification level distinguishes between promoters active in major eukaryotic taxa (phyla). Further below, grouping considers replicon type and functional properties of gene products. On the lowest level, homology (as defined in section 6) is the criterion. A survey of the upper part of the classification pyramid is presented in appendix A.

The proposed classification system has a highly tentative character, as it is often unclear how a new promoter should be classified, especially if the gene product is a multifunctional protein. Users should therefore not be surprised or discouraged if they do not find a promoter at the initially expected place.
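
Group headings of the form shown in the example above can be separated from entry lines and split into node numbers and title. The Python sketch below is based only on that example: it assumes that a heading line starts and ends with "*" and that every node number is followed by a period. The function name is hypothetical.

```python
import re

# "*", blanks, dotted node numbers such as "1.1.1.", title, closing "*".
HEADING = re.compile(r"^\*\s+((?:\d+\.)+)\s+(.*?)\s*\*$")


def parse_group_heading(line):
    """Return (node_numbers, title) for a heading line, else None.

    Separator lines such as "*-----*" return None.
    """
    match = HEADING.match(line.rstrip())
    if match is None:
        return None
    nodes = tuple(int(n) for n in match.group(1).rstrip(".").split("."))
    return nodes, match.group(2)


if __name__ == "__main__":
    line = "*    1.1.1. Small nuclear RNAs" + " " * 41 + "*"
    print(parse_group_heading(line))
```

The length of the node tuple gives the depth of the heading in the classification hierarchy.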

6 HOMOLOGOUS PROMOTERS

Homology is defined as sequence similarity due to common phylogenetic origin. In EPD, two promoters are considered homologous if they exhibit >=50% sequence similarity between -79 and +20. Similarity is calculated from optimal alignments generated with the aid of the UWGCG subroutine ShiftAlign (13) using the following symbol comparison table:

 

        A     C     G     N     T
  A    1.0   0.0   0.0   0.5   0.0
  C          1.0   0.0   0.5   0.0
  G                1.0   0.5   0.0
  N                      0.5   0.5
  T                            1.0

Gap weight and gap length weight are specified as 3 and 0, respectively. Terminal gaps are ignored. Percent similarity is understood as alignment score divided by segment length, times 100. Groups of homologous promoters are identified by homology group numbers (see 4.2.1.). Definition of these groups is based on similarity scores as defined above and a tree generation method called UPGMA (14). In a few cases, similarities between 50% and 56% were ignored if the protein sequences of the corresponding genes were not related. Similarities were also ignored between alternative promoter sequences that are spaced by less than 50 bp. A subset of "independent" promoters is marked by "+" in column 27 of the FP line. This set contains only one member per homology group (usually, the promoter with the longest upstream sequence available) and is intended to be used for statistical analysis of functional patterns where it is important to avoid bias by multiples of closely related sequences.
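
The percent similarity defined above can be illustrated for an alignment that has already been computed. The Python sketch below does not reproduce ShiftAlign. It only scores two pre-aligned strings (gaps written as "-") with the symbol comparison table, under these assumptions: the table is symmetric, each internal gap costs gap weight plus gap length weight times its length, and the segment length defaults to the ungapped length of the first sequence. All names are hypothetical.

```python
def match_score(x, y):
    """Symbol comparison table: identity 1.0, any pair involving N 0.5, else 0.0."""
    x, y = x.upper(), y.upper()
    if x == "N" or y == "N":
        return 0.5
    return 1.0 if x == y else 0.0


def percent_similarity(aln_a, aln_b, gap_weight=3.0, gap_length_weight=0.0,
                       segment_length=None):
    """Score a given pairwise alignment and express it as percent similarity."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aln_a, aln_b))

    # Terminal gaps are ignored: trim gap columns at both ends.
    start, end = 0, len(columns)
    while start < end and "-" in columns[start]:
        start += 1
    while end > start and "-" in columns[end - 1]:
        end -= 1

    score = 0.0
    gap_row = None                      # row holding the currently open gap
    for x, y in columns[start:end]:
        if x == "-" or y == "-":
            row = 0 if x == "-" else 1
            if gap_row != row:          # a new gap starts: charge gap weight once
                score -= gap_weight
                gap_row = row
            score -= gap_length_weight  # charged per gap position
        else:
            gap_row = None
            score += match_score(x, y)

    if segment_length is None:          # ASSUMPTION: ungapped length of sequence a
        segment_length = sum(1 for c in aln_a if c != "-")
    return 100.0 * score / segment_length


if __name__ == "__main__":
    print(percent_similarity("ACGTACGT", "AC--ACGT"))
```

With the default parameters, two promoters would be considered homologous in the sense of this section when the value returned for their optimal alignment is at least 50.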

7 PROMOTER SEQUENCE RETRIEVAL

Promoter sequence listings have not been incorporated into EPD for two reasons: (i) to avoid duplication of data already existing elsewhere in the EMBL data library, and (ii) to encourage the use of FPS-dependent sequence retrieval programs, which enable users to specify suitable 5' and 3' boundaries of the requested sequence segments themselves. Efforts are under way to motivate producers of standard nucleotide sequence analysis packages to provide such tools in the future. In the meantime, users with some programming experience will find it easy to write their own routines. Our local sequence extraction programs run in a UWGCG environment (13) and have been implemented at several sites in Europe and the United States. They are documented and freely available on request.
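
As an illustration of such a user-written routine, the Python sketch below extracts a promoter segment from a sequence that has already been loaded into memory. It is not the local UWGCG-based program mentioned above. It assumes a linear sequence, a 1-based coordinate for the initiation site (position +1), and a numbering in which relative position 0 is the base immediately upstream of +1, so that the range -79 to +20 yields 100 bases. This numbering convention is an assumption that should be checked against the data before use. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_promoter(sequence, tss, strand="+", start=-79, end=20):
    """Return the segment from relative position `start` to `end`.

    sequence : full sequence of the entry (linear; circular not handled)
    tss      : 1-based coordinate of the initiation site (position +1)
    strand   : "+" or "-"
    ASSUMPTION: relative position r maps to coordinate tss + r - 1 on the
    plus strand, i.e. position 0 exists and lies just upstream of +1.
    """
    if strand == "+":
        first = tss + start - 1
        last = tss + end - 1
    elif strand == "-":
        first = tss - end + 1
        last = tss - start + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if first < 1 or last > len(sequence):
        raise ValueError("requested segment extends beyond the sequence")

    segment = sequence[first - 1:last]
    if strand == "-":
        # Reverse complement for promoters on the minus strand.
        segment = segment.translate(COMPLEMENT)[::-1]
    return segment


if __name__ == "__main__":
    print(extract_promoter("AAAAACGTTT", tss=6, strand="+", start=-2, end=3))
```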

8 REFERENCES

  1. Bucher, P. & Trifonov, E.N., Compilation and analysis of eukaryotic POL II promoter sequences, Nucl. Acids Res. 14, 10009-10026 (1986). (3808945)

  2. Bucher, P. & Bryan, B., Signal search analysis: a new method to localize and characterize functionally important DNA sequences, Nucl. Acids Res. 12, 287-305 (1984). (6546421)

  3. Stoesser, G., Tuli, M.A., Lopez, R. and Sterk, P., The EMBL nucleotide sequence database, Nucleic Acids Res., 27, 18-24 (1999). (9847133)

  4. Benson, D.A., Boguski, M.S., Lipman, D.J., Ostell, J., Ouellette, B.F.F., Rapp, B.A. and Wheeler, D.L., GenBank, Nucleic Acids Res., 27, 12-17 (1999). (9847132)

  5. Sugawara, H., Miyazaki, S., Gojobori, T. and Tateno, Y., DNA Data Bank of Japan dealing with large-scale data submission, Nucleic Acids Res., 27, 25-28 (1999). (9847134)

  6. Bairoch, A. and Apweiler, R., The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999, Nucleic Acids Res., 27, 49-54 (1999). (9847139)

  7. Heinemeyer, T., Chen, X., Karas, H., Kel, A.E., Kel, O.V., Liebich, I., Meinhardt, T., Reuter, I., Schacherer, F. and Wingender, E., Expanding the TRANSFAC database towards an expert system of regulatory molecular mechanisms, Nucleic Acids Res., 27, 318-322 (1999). (9847216)

  8. The FlyBase Consortium, The FlyBase database of the Drosophila genome projects and community literature, Nucleic Acids Res., 27, 85-88 (1999). (9847148)

  9. Pearson, P., Francomano, C., Foster, P., Bocchini, C., Li, P. and McKusick, V., The status of online Mendelian inheritance in man (OMIM) medio 1994, Nucleic Acids Res., 22, 3470-3473 (1994). (7937048)

  10. Blake, J.A., Richardson, J.E., Davisson, M.T., Eppig, J.T. and the Mouse Genome Database Group, The Mouse Genome Database (MGD): genetic and genomic information about the laboratory mouse, Nucleic Acids Res., 27, 95-98 (1999). (9847150)

  11. Suzuki Y., Yamashita R., Nakai K., Sugano S., DBTSS: database of human transcriptional start sites and full-length cDNAs. Nucleic Acids Res. 30(1):328-331(2002). (11752328)

  12. Strausberg, R.L., Feingold, E.A., Klausner, R.D., Collins, F.S., The Mammalian Gene Collection. Science, 286, 455-457 (1999). (10521335)

  13. Devereux,J., Haeberli,P., & Smithies,O. A comprehensive set of sequence analysis programs for the VAX, Nucl. Acids Res. 12, 387-395 (1984). (6546423)

  14. Sneath, P.H.A. & Sokal, R.R., Numerical taxonomy, W.H. Freeman, San Francisco, London (1973).

  15. Stapleton M., Liao GC., Brokstein P., Hong L., Carninci P., Shiraki T., Hayashizaki Y., Champe M., Pacleb J., Wan K., Yu C., Carlson J., George R., Celniker S., and Rubin GM., The Drosophila Gene Collection: Identification of Putative Full-Length cDNAs for 70% of D. melanogaster Genes. Genome Res., 12:1294-1300 (2002). (12176937)

  16. Schmid C.D., Praz V., Delorenzi M., Périer R., and Bucher P., The Eukaryotic Promoter Database EPD: the impact of in silico primer extension. Nucleic Acids Res. 32, D82-5 (2004). (14681364)


A.  APPENDIX A : SURVEY OF RELEASE


B.  APPENDIX B : CODES AND ABBREVIATIONS

B.1. SPECIES CODES


 

Code Scientific name (English name)
AAV2 Adeno-associated virus 2
Ac Aplysia californica (California sea hare)
AcNPV Autographa californica nuclear polyhedrosis virus
Ad2 Human adenovirus type 2
Ad5 Human adenovirus type 5
Ad7 Human adenovirus type 7
Ad12 Human adenovirus type 12
Ag Ateles geoffroyi (black-handed spider monkey)
ALV Avian leukosis virus
Am Antirrhinum majus (snapdragon)
Ab-MLV Abelson murine leukemia virus
Apo Antheraea polyphemus (polyphemus moth)
Ap Anas platyrhynchos (mallard, domestic duck)
As Avena sativa (oat)
At Agrobacterium tumefaciens
Ath Arabidopsis thaliana (thale cress)
Atr Aotus trivirgatus (douroucouli)
Ay Antheraea yamamai
B19 Human parvovirus B19
Be Bertholletia excelsa (Brazil nut)
BKV Papovavirus BKV
BLV Bovine leukemia virus
Bm Bombyx mori (silkworm)
Bn Brassica napus (rape)
BPV1 Bovine papillomavirus type 1
Bt Bos taurus (cattle)
CaMV Cauliflower mosaic virus
Cco Coturnix coturnix (quail)
Ce Caenorhabditis elegans
Cg Canavalia gladiata (sword bean)
Cgr Cricetulus griseus (Chinese hamster)
Ch Capra hircus (goat)
Cl Canis lupus (gray wolf)
Cm Cairina moschata (muscovy duck)
Cp Cavia porcellus (domestic guinea pig)
Cpe Cucurbita pepo (zucchini)
Ct Chironomus thummi (midge)
Cte Chironomus tentans
Dc Daucus carota (carrot)
Df Drosophila funebris (fruit fly)
Dh Drosophila hydei (fruit fly)
DHBV Duck hepatitis B virus
Dm Drosophila melanogaster (fruit fly)
Dma Drosophila mauritiana (fruit fly)
Dmo Drosophila mojavensis (fruit fly)
Dmu Drosophila mulleri (fruit fly)
Do Drosophila orena (fruit fly)
Dp Drosophila pseudoobscura (fruit fly)
Ds Drosophila simulans (fruit fly)
Dse Drosophila sechellia (fruit fly)
Dv Drosophila virilis (fruit fly)
EBV Human herpesvirus 4 (Epstein-Barr virus)
Ec Equus caballus (horse)
FBJ-MSV Murine osteosarcoma virus (Finkel-Biskis-Jinkins)
FBR-MSV Murine osteosarcoma virus (Finkel-Biskis-Reilly)
F-MCF Friend mink cell focus-forming virus (Murine)
Fs Felis silvestris (wild cat)
F-SFFV Friend spleen focus-forming virus
Ft Flaveria trinervia
GA-FeLV Gardner-Arnstein feline leukemia oncovirus B
GALV Gibbon ape leukemia virus
Gg Gallus gallus (chicken)
Ggo Gorilla gorilla (gorilla)
Gm Glycine max (soybean)
GSHV Ground squirrel hepatitis virus
H-1 Parvovirus H1 (Murine)
Ha Helianthus annuus (common sunflower)
Hb Hevea brasiliensis (para rubber tree)
HBV Human hepatitis B virus
HCMV Human cytomegalovirus
Hg Halichoerus grypus (grey seal)
HIV-1 Human immunodeficiency virus type 1
HIV-2 Human immunodeficiency virus type 2
HPV16 Human papillomavirus type 16
HPV18 Human papillomavirus type 18
Hs Homo sapiens (human)
HSV-1 Human herpesvirus 1
HSV-2 Human herpesvirus 2
HTLV-I Human T-cell leukemia virus type I
HTLV-II Human T-cell leukemia virus type II
Hv Hordeum vulgare (barley)
HVS Herpesvirus saimiri
JCV Human polyomavirus JCV
Le Lycopersicon esculentum (tomato)
Leu Lepus europaeus (European hare)
Lm Locusta migratoria (migratory locust)
Lp Lytechinus pictus (painted urchin)
Lpe Lycopersicon peruvianum (Peruvian tomato)
Lv Lytechinus variegatus (green urchin)
Ma Mesocricetus auratus (golden hamster)
Mc Macaca fascicularis (crab-eating macaque)
MCMV Murine cytomegalovirus
MLV_AKV AKV murine leukemia virus
MLVxeno Xenotropic murine leukemia virus
Mm Mus musculus (house mouse)
M-MLV Moloney murine leukemia virus
M-MSV Moloney murine sarcoma virus
MMTV Mouse mammary tumor virus
Ms Medicago sativa (alfalfa)
MSV Maize streak virus
Np Nicotiana plumbaginifolia (curled-leaved tobacco)
Ns Nicotiana sylvestris (wood tobacco)
Nt Nicotiana tabacum (common tobacco)
Nto Nicotiana tomentosiformis
Oa Ovis aries (sheep)
Oc Oryctolagus cuniculus (rabbit)
Os Oryza sativa (rice)
Ph Petunia hybrida (e.g. Petunia strain Mitchell)
Pa Papio anubis (olive baboon)
Pc Petroselinum crispum (parsley)
Pl Paracentrotus lividus (common urchin)
Pm Psammechinus miliaris (sand urchin)
Polyoma Mouse polyomavirus
Ppy Photinus pyralis (North American firefly)
Pp Pongo pygmaeus (orangutan)
Ps Pisum sativum (pea)
Pt Pan troglodytes (chimpanzee)
Pth Pinus thunbergii (Japanese black pine)
Pv Phaseolus vulgaris (kidney bean)
RAV2 Rous associated virus type 2 (Avian)
Rc Ricinus communis (castor bean)
R-MCF Rauscher mink cell focus-forming virus
Rn Rattus norvegicus (Norway rat)
RSV Rous sarcoma virus (Avian) 
Sa Sinapis alba (white mustard)
SA7P Simian adenovirus (7P)
Sd Strongylocentrotus droebachiensis
Se Nannospalax ehrenbergi (Ehrenberg's mole-rat)
Sg Oncorhynchus mykiss (rainbow trout)
SIV Simian immunodeficiency virus
SNV Spleen necrosis virus
So Spinacia oleracea
Sp Strongylocentrotus purpuratus
Spe Sarcophaga peregrina
Sr Sesbania rostrata
SRV-1 Simian AIDS retrovirus SRV-1
Ss Sus scrofa (pig)
SSV Simian sarcoma virus
St Solanum tuberosum (potato)
Sv Sorghum bicolor (sorghum)
SV40 Simian virus 40
Ta Triticum aestivum (wheat)
Visna Visna lentivirus
Xb Xenopus borealis (Kenyan clawed frog)
Xl Xenopus laevis (African clawed frog)
Xt Xenopus tropicalis (western clawed frog)
Zm Zea mays (maize)

B.2. JOURNAL CODES


 

Code Journal name
ARB Annual Review of Biochemistry
ARP Annual Review of Physiology
BBA Biochimica et Biophysica Acta
BBRC Biochemical and Biophysical Research Communications
Bch Biochemistry
Bchi Biochimie
BchJ Biochemical Journal
BCHS Biological Chemistry Hoppe-Seyler
BrJR British Journal of Rheumatology
BrainR Brain Research
Btech Biotechnology
CanR Cancer Research
Cell Cell
CGD Cell Growth Differentiation
Chrom Chromosoma
CSHS Cold Spring Harbor Symposia on Quantitative Biology
CTMI Current Topics in Microbiology and Immunology
CurG Current Genetics
DCB DNA and Cell Biology
DevB Developmental Biology
Diab Diabetes
DNA DNA
ECR Experimental Cell Research
EJBc European Journal of Biochemistry
EJCB European Journal of Cellular Biology
EMBOJ EMBO Journal
EMBOR EMBO Reports
Evo Evolution
FEBS FEBS Letters
GDev Genes and Development
Gene Gene
GChC Genes Chromosomes Cancer
GnmR Genome Research
Gnms Genomics
Gnts Genetics
HGEN Human Genetics
IJCa International Journal of Cancer
ImTo Immunology Today
JBC Journal of Biological Chemistry
JBch Journal of Biochemistry
JCB Journal of Cell Biology
JEM Journal of Experimental Medicine
JGV Journal of General Virology
JI Journal of Immunology
JMAG Journal of Molecular and Applied Genetics
JMB Journal of Molecular Biology
JME Journal of Molecular Evolution
JMEnd Journal of Molecular Endocrinology
JNeSc Journal of Neuroscience
JVir Journal of Virology
MB Molecular Biology
MBE Molecular Biology and Evolution
MBM Molecular Biology and Medicine
MBR Molecular Biology Reports
MCB Molecular and Cellular Biology
MCEnd Molecular and Cellular Endocrinology
MEnd Molecular Endocrinology
MImm Molecular Immunology
MEnz Methods in Enzymology
MGG Molecular and General Genetics
MNeub Molecular Neurobiology
MPMI Molecular Plant-Microbe Interactions
NAR Nucleic Acids Research
Nat Nature
Oncg Oncogene
OncR Oncogene Research
Pla Planta
PlJ Plant Journal
PMB Plant Molecular Biology
PSL Plant Science Letters
RPHR Recent Progress in Hormone Research
PNAS Proceedings of the National Academy of Sciences of the United States of America
Sci Science
SCMG Somatic Cell and Molecular Genetics
TiG Trends in Genetics
Vir Virology
VirR Virus Research

B.3.  ABBREVIATIONS


 

1-25OH2D3 1,25-(OH)_2 vitamin D_3
20-OHE 20-Hydroxyecdysone
4CL 4-coumarate coenzyme A ligase
a1 Gene locus 1 involved in anthocyanin biosynthesis
abd-g. Abdominal ganglion
abl Abelson murine leukemia virus oncogene
ACC 1-aminocyclopropane-1-carboxylic acid
AChR Acetylcholine receptor
ACP b'-ketoacyl-acyl carrier protein of fatty acid synthase
ACTH Adrenocorticotropic hormone
ADA Adenosine deaminase
ADH Alcohol dehydrogenase
ADPg-s GT ADPglucose-starch glucosyltransferase
adult-HA Adult hermaphrodite
AFW1 Adult fast-white (myosin heavy chain) 1
Ag Antigen
(AGM) "from african green monkey"
AGP Acid glycoprotein
AGPP ADP glucose pyrophosphorylase
AIRS Aminoimidazole ribonucleotide synthase
ALA-synt. 5-Aminolevulinate synthase
ALDH_2 Aldehyde dehydrogenase 2
AlkExo Alkaline exonuclease
Amy Amylase
antp "antennapedia" locus
aP2 Adipocyte homologue of myelin P2
apolipop. Apolipoprotein
apoVLDLII Very low density apolipoprotein II
APRT Adenine phosphoribosyltransferase
AR Adrenergic receptor
ARF ADP-ribosylation factor
arg Arginine
AS Argininosuccinate synthetase
AS-C "achaete-scute" complex locus
AspAT Aspartate aminotransferase
ass. Associated
AT Antitrypsin
ATIII Antithrombin III
ATCase Aspartate transcarbamylase
ATP Adenosinetriphosphate
awd "abnormal wing disk" locus
BB Bowman-Birk (protease inhibitor)
BCKDHA Branched-chain alpha-keto acid dehydrogenase complex
Bcl-2 B-cell leukemia/lymphoma 2 proto-oncogene
BMMC Bone marrow-derived mast cell
BPTI Bovine pancreatic trypsin inhibitor
BSF B-cell stimulating factor
bsg25D Blastoderm specific locus 25D
c- Cellular protooncogene ..
c1 Regulatory locus of anthocyanin synthesis (maize)
C4BP Complement component C4-binding protein
CA Carbonic anhydrase
CAD Carbamoyl-phosphate synthetase (glutamine-hydrolysing)/aspartate carbamoyl transferase/dihydroorotase
cab Chlorophyll a/b-binding protein
cAMP Cyclic AMP (Adenosinemonophosphate)
card-m. Cardiac muscle
cc-ind. Cell cycle-independent
CD3 T-cell differentiation antigen CD3
CD4 T-cell differentiation antigen CD4
CD8 T-cell differentiation antigen CD8
CEA Carcinoembryonic antigen
CG Chorionic gonadotropin
CNS Central nervous system
CNTF Ciliary neurotrophic factor
car. Cartilage
col. Collagen
conglyc. Conglycinin
cor. Cornea
cotyl. Cotyledon
cp Cytoplasm(ic)
CPS Carbamyl-phosphate synthetase
CRF Corticotropin-releasing factor
CRP C-reactive protein
cs Cytosol(ic)
CSF Colony stimulating factor
cyt Cytokinin gene (coding for isopentenyltransferase)
DAF Decay-accelerating factor
dbp DNA binding protein
DDC DOPA decarboxylase
DDH Dihydrodiol dehydrogenase
dep. dependent
dev. Development(ally)
DHFR Dihydrofolate reductase
diff. differentiation, differentiated
DL/R Left and right duplicated region
dnc "dunce" locus
dUTPase Deoxyuridinetriphosphatase
E 1. Early, 2. Erythroid cell-specific
E8 Ethylene inducible gene during fruit ripening 8
EAS 5-epi-aristolochene synthase (sesquiterpene cyclase)
EBNA Epstein-Barr virus nuclear antigens
ecd-ind. Ecdysone-inducible
EDF Eosinophil differentiation factor
EFW1 Embryonic fast-white (myosin heavy chain) 1
EGF Epidermal growth factor
EIa Adenovirus early Ia region (transactivating element)
Eip Ecdysone-induced protein
ELH Egg-laying hormone
em Embryo, embryonic
epithel epithelial or epithelium
EPSP 5-Enolpyruvylshikimate-3-phosphate
erbA,B (Avian) erythroblastosis virus oncogene A,B
E-resp. Estrogen-responsive
ERV3 Endogenous retrovirus 3
E.Tn Early transposon
et-hypocot. Etiolated hypocotyl
ev1 (Avian) endogenous virus 1
eve "even-skipped" locus
exch. Exchanger
f. Factor
fib. Fibers
fibrob. Fibroblasts
FMRFamide Phe-Met-Arg-Phe-NH(2) neuropeptide
FNR Ferredoxin-(NADP+)-oxidoreductase
FBP Folate Binding Protein
fos FBJ (Finkel-Biskis-Jinkins) osteosarcoma virus oncogene
FSH Follicle stimulating hormone
ftz "fushi tarazu" locus
g. Gene
G0S.. G0/G1 switch regulatory gene ..
G6PD Glucose-6-phosphate dehydrogenase
GA Gibberellic acid
GADPH Glyceraldehyde-3-phosphate dehydrogenase
GARS Glycinamide ribonucleotide synthase
Gart "Gart" locus (-> GARS, AIRS, GART)
GART Glycinamide ribonucleotide transformylase
gC Glycoprotein C
G-CSF Granulocyte colony stimulating factor
gD Glycoprotein D
GdX X-linked gene downstream of G6PD gene
gE Glycoprotein E
GFAP Glial fibrillary acidic protein
g'GT g'-Glutamyl transpeptidase
gln Glutamine
globul-12s 12s globulin (oat seed storage protein)
glucc Glucocorticoid
GLUT1 Glucose transporter type 1
GM-CSF Granulocyte/Macrophage colony stimulating factor
GnRH Gonadotropin-releasing hormone
gp Glycoprotein
GPD Glycerol-3-phosphate dehydrogenase
GPT UDP-GlcNAc:dolichol phosphate N-acetylglucosamine-1-phosphate transferase
granulo-c Granulocyte
GRF Growth hormone-releasing factor
GRP Glycine-rich (cell wall) protein
GS17 Gastrula-specific transcript 17
GSHPx Glutathione peroxidase
G-spec. Gastrula-specific
GST Glutathione S-transferase
H 1. Heavy chain, 2. Housekeeping-type promoter
Ha-ras Rat-derived Harvey murine sarcoma virus oncogene
haptoblob haptoglobin
hb "hunchback" locus
Hc High-cysteine (chorion protein)
HDC L-histidine decarboxylase
hematop. hematopoietic
HGT High-(glycine+tyrosine) keratin
hist. Histone
HMG- High mobility group chromosomal protein
HMG-CoA 3-Hydroxy-3-methylglutaryl coenzyme A
HPRT Hypoxanthine phosphoribosyltransferase
hs Heatshock
hsc Constitutive analogue of heatshock gene/protein
HSF Hepatocyte-stimulating factor
hsp Heatshock protein
Ht Testicular histone
HTF Restriction endonuclease HpaII tiny fragments
I-FABP Intestinal fatty-acid binding protein
IAA Indoleacetic acid
IAP Intracisternal A-particles
ICP Infected cell protein
IE Immediate early (gene, RNA)
IF Intermediate filament
IFI Interferon-induced gene/protein
IFN Interferon
Ig Immunoglobulin
IGF Insulin-like growth factor
IL Interleukin
inf. Infected
inh. Inhibitor
iNOS Inducible nitric oxide synthase
IRF Interferon regulatory factor
ISG Interferon-stimulated gene
k. Kinase
keratino-c Keratinocyte
Ki-ras Rat-derived Kirsten murine sarcoma virus oncogene
L 1. Light chain; 2. Late
larva-1,2,.. First, second, .. instar larva
LAT.. Lycopersicon anther-specific gene ..
LCAT Lecithin-cholesterol acyltransferase
lck T-cell- or lymphocyte-specific tyrosine kinase
LDH Lactate dehydrogenase
leghem. Leghemoglobin
LeIF Leukocyte interferon
leuko-c Leukocyte
LH Luteinizing hormone
LHC Light-harvesting complex
LHRH Luteinizing hormone-releasing factor
liv. liver
LMW Low molecular weight
LPH Lipotropic hormone
LPS Lipopolysaccharide
LTR Long Terminal Repeat
lympho-c Lymphocyte
lys Lysosomal
MBP Myelin basic protein
(MAC) Macaque
MC Methylcholanthrene
MCK Muscle-specific creatine kinase
mGK Submaxillary gland kallikrein
MHCI/MHCII Class I/II transplantation antigens of major histocompatibility complex
MIF Macrophage migration inhibitory factor
minipara Miniparamyosin
mit Mitochondrial
mono-c Monocyte
mononuc-c. Mononuclear cells
MOPC.. Mineral oil-induced plasmacytoma
mos Moloney murine sarcoma virus oncogene
MP Macrophage
MPC.. Mouse plasma cell tumor
MRP MIF-related protein (see MIF)
MSF Megakaryocyte stimulating factor
msp Major sperm protein gene
MT Metallothionein
mst Male-specific transcript
MUP Major urinary protein
myb (Avian) myeloblastosis virus oncogene
myc Myelocytomatosis virus 29 oncogene
NCA nonspecific cross-reacting (with -> CEA) antigen
nerv. sys Nervous system
neu Ethyl-nitrosourea-induced rat neuroblastoma oncogene
neuropep. Neuropeptide
NGF Nerve growth factor
ninaE "neither inactivation nor afterpotential" locus E
NMDH NADP-malate dehydrogenase
NOS Nitric oxide synthase
nos Nopaline synthetase
NR Nitrate reductase
N-ras Neuroblastoma ras-like (-> Ha-ras) oncogene
NS Nervous system
OAT Ornithine aminotransferase
ocs Octopine synthetase
ODC Ornithine decarboxylase
Ori Origin of replication
OTC Ornithine transcarbamylase
ovalb. Ovalbumin
p. Protein
P-450 Cytochrome P-450
p53 53K phosphoprotein
panc. pancreas, pancreatic
parath. Parathyroid
PB Phenobarbital
PBGD Porphobilinogen deaminase
PCNA Proliferating cell nuclear antigen
PDEase cAMP phosphodiesterase
PDGF Platelet-derived growth factor
PEPCase Phosphoenolpyruvate carboxylase
PEPCK Phosphoenolpyruvate carboxykinase
PG Prostaglandin
PGK 3-Phosphoglycerate kinase
PHA Phytohemagglutinin
PK Protein kinase
P_L Late promoter
PLP Proteolipid protein
POL Polymerase
POMC Proopiomelanocortin
pp.. Phosphoprotein ..
PR1a Pathogenesis-related protein 1a
PRBP Plasma retinol-binding protein
PRL Prolactin
prog. Progesterone
prolyl 4-hydr. Prolyl 4-hydroxylase
PrP Prion protein
PSG1,PSG2,. Pregnancy-specific glycoproteins 1,2,.
PSBP Prostatic steroid binding protein
PSP Parotid secretory protein
PTH Parathyroid hormone
pTiN Nopaline type tumor inducing plasmid
pTiO Octopine type tumor inducing plasmid
r "rudimentary" locus
R 1. Regulatory subunit, 2. Erythroid cell-specific
RAB Gene responsive to ABA
ras Homologue of -> Ha-ras, Ki-ras, etc.
rec. Receptor
red. Reductase
reg. Regulated
rep-dep. Replication-dependent
rig Rat insulinoma gene
RnBP Renin-binding protein
RNR1, RNR2 Ribonucleotide reductase large, small subunit
rp Ribosomal protein
rTn Retrotransposon
RuBPCss Ribulose-1,5-biphosphate carboxylase small subunit
RuBPCA Ribulose-1,5-biphosphate carboxylase/oxygenase activase
s. Small
saliv-g. Salivary gland
SBP Spermine-binding protein
SC Stem cells
sem-v. Seminal vesicle
ser. Serum
sgs Salivary gland secretion protein
sis Simian sarcoma virus oncogene
sk-m. Skeletal muscle
skel-m. Skeletal muscle
smooth-m. Smooth muscle
snRNA Small nuclear RNA
snRNP Small nuclear ribonucleoprotein
SOD Superoxide dismutase
som Somatic
spat-reg. Spatially regulated
Spec Strongylocentrotus purpuratus ectoderm enriched RNA
SPI Serine protease inhibitor
sry "serendipity" locus
SV40T Tumor antigen of simian virus 40 (SV40)
SVS Seminal vesicle secretory protein
synt. Synthase
T3d' T-cell antigen receptor-associated T3-complex delta chain
TAT Tyrosine aminotransferase
TCDD 2,3,7,8-Tetrachlorodibenzo-p-dioxin
TCGF T-cell growth factor
TCR T-cell receptor
TdT Terminal deoxynucleotidyltransferase
test. testis
TF Transcription factor
TGA1a TGACG-specific DNA-binding protein 1a
TGF-b' Transforming growth factor beta
TH Tyrosine hydroxylase
thyr. Thyroxine
Thy-1.2 Thy-1 (thymocyte) antigen/glycoprotein allotype 2
TIF Trans-inducing factor
TIM Triosephosphate isomerase
tis. Tissue
TM Tropomyosin
tmr "tumor morphology root" locus
TNF Tumor necrosis factor
TnI Troponin I (inhibitory subunit)
TnT Troponin T (tropomyosin-binding subunit)
TO Tryptophan oxygenase
TP1,TP2,. Transition protein 1,2,.
TPA 12-O-tetradecanoyl-phorbol-13-acetate
TPI Triosephosphate isomerase
tr.,tr- Transcript
TRF T-cell replacing factor
TRH Thyrotropin-releasing hormone
TS Thymidylate synthetase
TSH Thyroid stimulating hormone
T/t Large/small T(tumor) antigen
Ubx "ultrabithorax" locus
uPA Urine plasminogen activator
URO-D Uroporphyrinogen decarboxylase
Vg1 Vegetal hemisphere-specific mRNA 1
vir-inf. Viral infection
VL30 Retrovirus-like 30s RNA
VLDL Very low density lipoprotein
V_NP (Immunoglobulin heavy chain) variable region specific for 4-hydroxyl-3-nitrophenacetyl
VP5 Virion protein 5 (HSV-1/2: =major capsid protein)
VSP Virion stimulatory protein
vWf von Willebrand factor
Zen "zerknuellt" protein
Last update October 2019