Palindromes, homopolymers and simple repeats
Description
This series contains features that are directly computed from
the genome sequence with ad hoc scripts, including short palindromes,
homopolymers and simple repeats. Some of these features appear to
be enriched or depleted in certain regions. For instance,
palindromes are enriched in the regulatory regions of certain species.
Simple repeats tend to be depleted in conserved non-coding regions.
Source
Samples
From H. sapiens (Feb 2009 GRCh37/hg19).
Notes on samples:
- CpG dinucleotides: any occurrence of the sequence CG
- W-hexamers: any match to the consensus sequence WWWWWW (W means A or T).
- S-hexamers: any match to the consensus sequence SSSSSS (S means C or G).
- R(+)/Y(-)-hexamers: any match to the consensus sequences RRRRRR if the
strand field is "+" or YYYYYY if the strand field is "-" (R means A or G
and Y means C or T). Note that R and Y are complementary
nucleotide classes: wherever there is an R on the + strand of the chromosome,
there is a Y on the - strand.
- M(+)/K(-)-hexamers: any match to the consensus sequences MMMMMM if the
strand field is "+" or KKKKKK if the strand field is "-" (M means A or C
and K means G or T). Note that M and K are complementary
nucleotide classes: wherever there is an M on the + strand of the chromosome,
there is a K on the - strand.
- hexa-homopolymers (aaaaaa): a run of six identical bases.
- 3x2-repeats (ababab): Dinucleotide repeats of the form ababab.
a must not be identical to b.
- 2x3-repeats (abcabc): Trinucleotide repeats of the form abcabc.
The trinucleotide abc must contain at least two different bases.
- hexa-palindromes (abcxyz): Palindromes of the form abcxyz.
a must be complementary to z, b to y and c to x.
- hepta-palindromes (abcNxyz): Palindromes of the form abcNxyz.
a must be complementary to z, b to y and c to x. N means any base.
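The structural definitions above (homopolymers, repeats and palindromes) can be sketched as simple predicates on a candidate 6-mer or 7-mer. This is an illustrative Python sketch only, not the Perl scripts used to build the series; the function names are hypothetical, and it assumes that only the unambiguous bases A, C, G and T are considered.

```python
# Illustrative predicates for the feature definitions listed above.
# Function names are hypothetical; only A, C, G, T are accepted (an assumption).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def _is_acgt(s, length):
    """True if s has the expected length and contains only A, C, G, T."""
    return len(s) == length and set(s) <= set("ACGT")


def is_hexa_homopolymer(s):
    """aaaaaa: six identical bases."""
    return _is_acgt(s, 6) and len(set(s)) == 1


def is_3x2_repeat(s):
    """ababab: a dinucleotide repeated three times, with a != b."""
    return _is_acgt(s, 6) and s[0] != s[1] and s == s[:2] * 3


def is_2x3_repeat(s):
    """abcabc: a trinucleotide repeated twice, with at least two different bases."""
    return _is_acgt(s, 6) and s == s[:3] * 2 and len(set(s[:3])) >= 2


def is_hexa_palindrome(s):
    """abcxyz: a pairs with z, b with y, c with x."""
    return _is_acgt(s, 6) and all(COMPLEMENT[s[i]] == s[5 - i] for i in range(3))


def is_hepta_palindrome(s):
    """abcNxyz: same pairing as above, with any base in the middle."""
    return _is_acgt(s, 7) and all(COMPLEMENT[s[i]] == s[6 - i] for i in range(3))
```

Note that the classes are not mutually exclusive under these definitions: ATATAT, for example, satisfies both the 3x2-repeat and the hexa-palindrome predicate.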
The genomic positions recorded in the SGA files correspond to internal
position 3 of each hexameric feature, and to position 4 (the central base N)
of hepta-palindromes.
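As a rough illustration of how a position and a strand could be derived for the stranded hexamer classes, the following Python sketch scans a sequence for overlapping R(+)/Y(-) or M(+)/K(-) hexamers. It is not the original Perl code. The function name is hypothetical, and it assumes that positions are 1-based and that "position 3" is counted from the leftmost base of the match in + strand coordinates, for both strands.

```python
import re

# IUPAC classes used by the stranded hexamer features.
IUPAC = {"R": "AG", "Y": "CT", "M": "AC", "K": "GT"}


def scan_stranded_hexamers(seq, plus_code, minus_code):
    """Return sorted (position, strand) pairs for overlapping class hexamers.

    Assumption: the reported position is 1-based and points at the third
    base of the match, counted from its leftmost base on the + strand.
    """
    seq = seq.upper()
    hits = []
    for code, strand in ((plus_code, "+"), (minus_code, "-")):
        # A zero-width lookahead finds every start, including overlapping ones.
        pattern = re.compile("(?=[%s]{6})" % IUPAC[code])
        for match in pattern.finditer(seq):
            # match.start() is the 0-based start; +1 makes it 1-based,
            # +2 more moves to internal position 3.
            hits.append((match.start() + 3, strand))
    return sorted(hits)


if __name__ == "__main__":
    for position, strand in scan_stranded_hexamers("CCAGAGAGCC", "R", "Y"):
        print(position, strand)
```

A run longer than six bases yields one hit per overlapping window under this sketch; whether the original scripts report overlapping matches in the same way is not stated here.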
Technical Notes
The SGA files were generated with the following Perl scripts:
- cg2sga.pl
- wwwwww2sga.pl
- ssssss2sga.pl
- rrrrrr2sga.pl
- mmmmmm2sga.pl
- aaaaaa2sga.pl
- ababab2sga.pl
- abcabc2sga.pl
- abcxyz2sga.pl
- abcNxyz2sga.pl
available from the MGA script archive at:
https://epd.expasy.org/ftp/mga/scripts/
Last update: 1 Oct 2018
Page made by: