DR EPD; "AC"; "ID"; "definition"; "position; direction".
"AC" | string | EPD accession number. |
"ID" | string | EPD identification. |
"definition" | keyword | alternative promoter |
"position; direction" | seq_pos | Position and direction relative to the transcription initiation site. |
3. Conditions and rules:
4. Target identification:
5. Storage, maintenance and verification:
DR EMBL; "SV"; "ID"; "position".
"SV" | string | EMBL sequence version. |
"ID" | string | EMBL identification. |
"position" | seq_pos | Position of the EPD transcription initiation site. |
Note: seq_pos is a positional sequence object represented according
to the Milano convention.
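The DR syntax lines in this section share one shape: a database name followed by semicolon-separated fields and a closing period. A minimal sketch of splitting such a line, assuming that shape holds, is given below. The function name split_dr_line is hypothetical and not part of any EPD software; the sample line is the EMBL cross-reference quoted in the TRANSFAC example of this document.

```python
def split_dr_line(line):
    """Split a DR line into (database, fields).

    Assumption: fields are separated by ';' and the line ends with '.'.
    A field that itself contains ';' (such as the EPD "position; direction"
    field) would be split in two by this simple approach.
    """
    body = line.strip()
    if not body.startswith("DR"):
        raise ValueError("not a DR line: %r" % line)
    body = body[2:].strip()
    if body.endswith("."):
        body = body[:-1]  # drop only the terminating period
    fields = [field.strip() for field in body.split(";")]
    return fields[0], fields[1:]


database, fields = split_dr_line("DR   EMBL; J01917; HACG; g209811; [-6038, 29899].")
print(database)  # EMBL
print(fields)    # ['J01917', 'HACG', 'g209811', '[-6038, 29899]']
```

The position field is left as a string here, so the comma inside the brackets causes no trouble.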
2. Biological meaning:
3. Conditions and rules:
4. Target identification:
5. Storage, maintenance and verification:
DR SWISS-PROT; "AC"; "ID".
"AC" | string | SWISS-PROT accession number. |
"ID" | string | SWISS-PROT identification. |
DR TRANSFAC; "AC"; "ID"; "position"; "type".
"AC" | string | TRANSFAC accession number. |
"ID" | string | TRANSFAC identification. |
"position" | seq_pos | Position of site relative to the EPD initiation site. |
"type" | keyword | Type of initiation sites. |
Notes:
seq_pos is a positional sequence object represented according to
the Milano convention.
Valid keywords for "type" are: "by position", "by function".
2. Biological meaning:
3. Conditions and rules:
Rule 2: For a given pair of a TRANSFAC and an EPD entry, there can be at most one cross-reference for each SQ line in the TRANSFAC entry. A cross-reference can only be created for an SQ line, if the sequence matches a range of an EMBL sequence indicated
4. Target identification:
TRANSFAC entries containing multiple SQ lines with identical sequences may cause problems for automatic procedures that establish "by position" cross-references. A program for identifying such entries can be used as follows:
    g77 -fno-automatic /home/epd/src/find_difficult_sites.f
    ./a.out < /db/transfac/site.dat
In release 35, the following entries were found:
R00052(not in 61) R00281 R00470 R00603 R00825(not in 61) R00826 " R02711 R02728 " R03466 " R03715 R03735 "
These entries may have to be processed manually.
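The kind of check described above can be sketched in a few lines of Python. This is not a port of find_difficult_sites.f, whose exact logic is not shown here. It assumes that each TRANSFAC entry carries its accession on an AC line and its sequences on SQ lines, that entries are terminated by a "//" line, and that sequences should be compared case-insensitively. The function name and the sample accessions are hypothetical.

```python
def entries_with_repeated_sq(lines):
    """Return accessions of entries in which one SQ sequence occurs more than once.

    Assumptions: two-letter line codes, '//' as entry terminator,
    case-insensitive sequence comparison.
    """
    flagged = []
    accession, sequences = None, []
    # The extra '//' closes a final entry that lacks its own terminator.
    for raw in list(lines) + ["//"]:
        line = raw.strip()
        code, rest = line[:2], line[2:].strip()
        if code == "AC":
            accession = rest.rstrip(";")
        elif code == "SQ":
            sequences.append(rest.rstrip(".").upper())
        elif code == "//":
            if accession is not None and len(set(sequences)) < len(sequences):
                flagged.append(accession)
            accession, sequences = None, []
    return flagged


# Made-up entries for illustration only.
sample = [
    "AC  R99998", "SQ  TATAAAA.", "SQ  TATAAAA.", "//",
    "AC  R99999", "SQ  TATAAAA.", "SQ  CCCGCC.", "//",
]
print(entries_with_repeated_sq(sample))  # ['R99998']
```

To scan a real file, the open file object can be passed in directly, since the function only iterates over lines.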
Verification of positions:
Examples:
1. EPD on + strand, TRANSFAC site on + strand:
EPD:
ID   AD2_DNBI   standard; single; VRT.
AC   EP07159;
DE   Major late promoter
OS   Human adenovirus type 2
                                   e1,e2
DR   EMBL; J01917; HACG; g209811; [-6038, 29899].
                                   p1,p2
DR   TRANSFAC; R03156; AD$MLP_39; [-31,-24]; automatic.
SE   gtgttcctgaaggggggctataaaagggggtgggggcgcgttcgtcctcACTCTCTTCCG

TRANSFAC:
AC   R03156
ID   AD$MLP_39
DE   MLP (major late promoter); G000010.
SQ   TATAAAA.
                        t1,t2
DR   EMBL: J01917; HACG(6008:6014).

Conversion arithmetic:
p1 = e1 + t1 - 1
p2 = e1 + t2
t1 = p1 - e1 + 1
t2 = p2 - e1
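The conversion formulas of example 1 can be checked with a short script. The sketch below covers only this case (EPD on the + strand, TRANSFAC site on the + strand) and uses the values from the AD2_DNBI / R03156 entries above. The function names are hypothetical.

```python
def transfac_to_epd(e1, t1, t2):
    """Convert TRANSFAC site coordinates (t1, t2) on the EMBL entry
    to EPD positions (p1, p2). Valid for the +/+ strand case only."""
    p1 = e1 + t1 - 1
    p2 = e1 + t2
    return p1, p2


def epd_to_transfac(e1, p1, p2):
    """Inverse conversion: EPD positions (p1, p2) back to
    TRANSFAC coordinates (t1, t2). Valid for the +/+ strand case only."""
    t1 = p1 - e1 + 1
    t2 = p2 - e1
    return t1, t2


e1 = -6038  # first value of the EMBL position in the EPD entry
print(transfac_to_epd(e1, 6008, 6014))  # (-31, -24)
print(epd_to_transfac(e1, -31, -24))    # (6008, 6014)
```

Both directions reproduce the values in the two entries, so the four formulas are mutually consistent for this example.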
5. Storage, maintenance and verification:
DR FLYBASE; "ID"; "symbol".
"ID" | string | FlyBase identification. |
"symbol" | string | FlyBase symbol. |
DR MGD; "MGI:ACID"; "symbol".
"ACID" | string | MGD accession identification. |
"symbol" | string | MGD symbol. |
2. Biological meaning:
3. Conditions and rules:
4. Target identification:
5. Storage, maintenance and verification:
      -15       -10        -5         0         5
' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
 a C C C G C C t g c a c c c g a t t c a t g t g a g a a
  <----------->                       |                     TRANSFAC entry R99999/ET$SITE_1
                                      +--->                 EPD entry EP99999
<------------------------------------------------------->   EMBL entry ZZ999999.1/HS28BP

1       5        10        15        20        25
' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
a C C C G C C t g c a c c c g a t t c A T G T G A G A A
  <--------->                         |                   TRANSFAC entry R99999/ET$SITE_1
                                                          EPD entry EP99999
<----------------------------------------------------->   EMBL entry ZZ999999.1/HS28BP
["begin","end"]
where "begin" and "end" are integers reflecting the start and end position of the biological sequence object, respectively.
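A position written in this bracket form can be read with a small parser. The sketch below assumes that "begin" and "end" are optionally signed decimal integers and that whitespace around them is tolerated, as in the values [-31,-24] and [-6038, 29899] quoted in this document. The function name parse_seq_pos is hypothetical.

```python
import re

# Optional sign, digits, comma, optional sign, digits, all inside [ ].
_SEQ_POS = re.compile(r"^\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]$")


def parse_seq_pos(text):
    """Parse '[begin,end]' into a (begin, end) tuple of integers."""
    match = _SEQ_POS.match(text.strip())
    if match is None:
        raise ValueError("not a [begin,end] position: %r" % text)
    return int(match.group(1)), int(match.group(2))


print(parse_seq_pos("[-31,-24]"))       # (-31, -24)
print(parse_seq_pos("[-6038, 29899]"))  # (-6038, 29899)
```

The parser does not check that begin is smaller than end, since no such ordering rule is stated here.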