This tool allows the selection of all or a subset of promoters
from an EPDnew database. Selection can be restricted based on
Promoter or Gene IDs, genomic context or other
characteristics. Multiple criteria can be used at the same
time, for example providing a set of Gene IDs and restricting the
selection to promoters that contain a TATA box. Here is a
description of the criteria used by each selection method.
Selection by ID: enter one ID per line, without
any separator symbols (',', ';', '|', etc.) between IDs.
In the output page, promoters are always annotated using EPDnew
IDs. To facilitate the conversion between user-provided IDs
and EPDnew IDs, the output page provides a log file with the
conversion table. Note that multiple promoters can be
associated with one ID. Users can restrict the selection to only
one promoter per gene by activating the check box, which will
select the most representative promoter (see 'Additional
Options'). IDs can be of the following types (some of them are
species-specific):
-
EPDnew ID: promoter ID used by EPDnew (MAPK1_1, TP53_1,
TBP_1, etc.). It is available for all databases.
-
ENSEMBL GENE ID: gene ID from the Ensembl
database (ENSG00000002016, ENSG00000003509,
ENSG00000003989).
-
RefSeq ID: transcript ID from the RefSeq database
(NM_032974, NM_002355, NM_001013836).
-
FlyBase ID: ID from FlyBase annotation
(FBgn0025740, FBgn0039897, FBgn0039904)
-
WormBase ID: ID from WormBase annotation
(WBGene00022279, WBGene00022037, WBGene00022368)
-
AGI ID: Arabidopsis Gene ID (AT1G01010)
-
Gramene GENE ID: Gramene Gene ID (GRMZM2G330436,
GRMZM2G440537, GRMZM2G008710)
-
sgdGene ID: Saccharomyces Genome Database Gene ID
(YAL061W, YAL024C, YAL001C)
-
PomBase ID: S. pombe Genome Database Gene ID
(SPCP20C8.02c, SPCC330.04c, SPCC1235.07)
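An ID list copied from a spreadsheet or another tool often contains separators that the input field does not accept. The following minimal Python sketch turns such a list into one ID per line. The function name `to_one_id_per_line` is hypothetical, and dropping duplicate IDs is an added assumption rather than a requirement of the tool.

```python
import re

# Separators that must not appear between IDs (',', ';', '|'), plus any whitespace.
_SEPARATORS = re.compile(r"[,;|\s]+")


def to_one_id_per_line(raw: str) -> str:
    """Split a free-form ID string on common separators and return one ID per line."""
    ids = [token for token in _SEPARATORS.split(raw) if token]
    # Assumption: duplicates are not useful, so keep only the first occurrence of each ID.
    unique_ids = list(dict.fromkeys(ids))
    return "\n".join(unique_ids)


if __name__ == "__main__":
    print(to_one_id_per_line("MAPK1_1, TP53_1;TBP_1 | TP53_1"))
```

Dots are deliberately not treated as separators, because some IDs (for example PomBase IDs such as SPCP20C8.02c) contain them.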
Selection by precomputed characteristics:
-
TATA box: a promoter is considered to have a TATA box
if the motif is found at position −28 (± 3
bp) relative to the TSS (evaluated using FindM).
-
Initiator: a promoter is considered to have an Initiator
motif if the motif is found at position 0 relative to the
TSS (evaluated using FindM).
-
CCAAT box: a promoter is considered to have a CCAAT
motif if the motif is found in the region −200 to
−50 relative to the TSS (evaluated using FindM).
-
GC box: a promoter is considered to have a GC
motif if the motif is found in the region −200 to
−50 relative to the TSS (evaluated
using FindM).
-
Average Expression: for each sample used in
generating an EPDnew collection, promoter expression is
calculated as the number of tags mapping to the region
from −250 to +250 bp relative to the TSS. Each sample
is normalized to a total tag count of 10 million.
-
Expression call: a promoter is considered expressed in sample
X if the number of tags that map at the TSS is greater than
3.
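The positional rules and expression definitions above can be summarised in a short Python sketch. It only illustrates the thresholds given in this list and is not the FindM implementation. All function names are hypothetical, motif hits and tags are assumed to be supplied as positions relative to the TSS, and whether the expression call uses raw or normalized counts is left to the caller.

```python
from typing import Iterable

# Windows relative to the TSS (inclusive), copied from the definitions above.
MOTIF_WINDOWS = {
    "TATA box": (-31, -25),    # -28 +/- 3 bp
    "Initiator": (0, 0),       # position 0
    "CCAAT box": (-200, -50),
    "GC box": (-200, -50),
}


def has_motif(motif: str, hit_positions: Iterable[int]) -> bool:
    """True if any motif hit (position relative to the TSS) falls inside the window."""
    start, end = MOTIF_WINDOWS[motif]
    return any(start <= pos <= end for pos in hit_positions)


def promoter_expression(tag_positions: Iterable[int], total_tags: int) -> float:
    """Count tags within -250..+250 of the TSS, scaled to a sample total of 10 M tags."""
    in_window = sum(1 for pos in tag_positions if -250 <= pos <= 250)
    return in_window * 10_000_000 / total_tags


def is_expressed(tss_tag_count: float, threshold: float = 3) -> bool:
    """Expression call: the tag count at the TSS must be strictly greater than 3."""
    return tss_tag_count > threshold
```

For example, `has_motif("TATA box", [-30])` is true because −30 lies within −28 ± 3, whereas a hit at −32 falls outside the window.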
Additional Options: Users can restrict the
selection to the most representative promoter of each
gene. In this case only one promoter is associated with a
gene: the one that has been validated by the
largest number of samples or, if that is inconclusive, the one located
most upstream. Note that for some organisms the samples might not
be representative of normal growth conditions and may be
restricted to specific tissues, growth conditions or
developmental stages. This can affect which promoter is selected
as the most representative: the choice may be specific to the
conditions used in the experiments rather than generally valid.
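The rule for choosing the most representative promoter can be sketched as follows. This is a minimal illustration, not the code used by the tool. The `Promoter` record and its fields are hypothetical, "inconclusive" is interpreted as a tie in the number of validating samples, and "most upstream" is interpreted relative to the gene's strand (smallest coordinate on the + strand, largest on the − strand).

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Promoter:
    epd_id: str      # hypothetical example IDs, e.g. "A_1"
    gene: str
    strand: str      # "+" or "-"
    tss: int         # genomic coordinate of the TSS
    n_samples: int   # number of samples validating this promoter


def _sort_key(p: Promoter):
    # More samples first; on a tie, the more upstream TSS first
    # (lower coordinate on "+", higher coordinate on "-").
    upstream_rank = p.tss if p.strand == "+" else -p.tss
    return (-p.n_samples, upstream_rank)


def most_representative(promoters: Iterable[Promoter]) -> Dict[str, Promoter]:
    """Return one promoter per gene according to the rule sketched above."""
    best: Dict[str, Promoter] = {}
    for p in promoters:
        current = best.get(p.gene)
        if current is None or _sort_key(p) < _sort_key(current):
            best[p.gene] = p
    return best
```

With two promoters of the same + strand gene validated by five samples each, the one with the lower TSS coordinate is kept; a third promoter with only two samples is never chosen.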
Note: Depending on the organism selected, some motifs may
not be available for filtering (e.g. for P. falciparum).