Palindromes, homopolymers and simple repeats.
This series contains features that are directly computed from
the genome sequence with ad hoc scripts, including short palindromes,
homopolymers and simple repeats. Some of these features appear to
be enriched or depleted in certain regions. For instance,
palindromes are enriched in the regulatory regions of certain species.
Simple repeats tend to be depleted in conserved non-coding regions.
From H. sapiens (Feb 2009 GRCh37/hg19).
Notes on samples:
- CpG dinucleotides: any occurrence of the sequence CG.
- W-hexamers: any match to the consensus sequence WWWWWW (W means A or T).
- S-hexamers: any match to the consensus sequence SSSSSS (S means C or G).
- R(+)/Y(-)-hexamers: any match to the consensus sequences RRRRRR if the
strand field is "+" or YYYYYY if the strand field is "-" (R means A or G
and Y means C or T). Note that R and Y are complementary nucleotide
classes: wherever there is an R on the + strand of the chromosome,
there is a Y on the - strand.
- M(+)/K(-)-hexamers: any match to the consensus sequences MMMMMM if the
strand field is "+" or KKKKKK if the strand field is "-" (M means A or C
and K means G or T). Note that M and K are complementary nucleotide
classes: wherever there is an M on the + strand of the chromosome,
there is a K on the - strand.
- hexa-homopolymers (aaaaaa): a run of six identical bases in a row.
- 3x2-repeats (ababab): Dinucleotide repeats of the form ababab.
a must not be identical to b.
- 2x3-repeats (abcabc): Trinucleotide repeats of the form abcabc.
The trinucleotide abc must contain at least two different bases.
- hexa-palindromes (abcxyz): Palindromes of the form abcxyz.
a must be complementary to z, b to y and c to x.
- hepta-palindromes (abcNxyz): Palindromes of the form abcNxyz.
a must be complementary to z, b to y and c to x. N means any base.
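The feature definitions above can be sketched as simple predicates on a
6-base or 7-base window. This is a minimal Python illustration, not the
Perl scripts used to build the series; all function names are hypothetical,
and the input is assumed to be an upper-case DNA string.

```python
# Illustrative predicates only; names are hypothetical, not from the MGA scripts.
# IUPAC ambiguity classes used in the feature definitions.
IUPAC = {"W": "AT", "S": "CG", "R": "AG", "Y": "CT", "M": "AC", "K": "GT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def matches_class(window, code):
    """True if every base of the window belongs to the IUPAC class `code`."""
    return len(window) > 0 and all(base in IUPAC[code] for base in window)


def reverse_complement(seq):
    """Sequence read on the opposite strand (unknown bases become N)."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def is_homopolymer(hexamer):
    """aaaaaa: six identical bases."""
    return len(hexamer) == 6 and len(set(hexamer)) == 1


def is_3x2_repeat(hexamer):
    """ababab with a != b."""
    if len(hexamer) != 6:
        return False
    unit = hexamer[:2]
    return unit[0] != unit[1] and hexamer == unit * 3


def is_2x3_repeat(hexamer):
    """abcabc where abc contains at least two different bases."""
    if len(hexamer) != 6:
        return False
    unit = hexamer[:3]
    return len(set(unit)) >= 2 and hexamer == unit * 2


def is_palindrome(window):
    """abcxyz or abcNxyz: the three outer bases pair with their mirror partners.

    In a 7-base window the middle base is never inspected, so it may be anything.
    """
    if len(window) not in (6, 7):
        return False
    return all(COMPLEMENT.get(window[i]) == window[-1 - i] for i in range(3))
```

For example, reverse_complement("AGGAGA") gives "TCTCCT", so an R-hexamer
on the + strand reads as a Y-hexamer on the - strand, which is why the two
are stored as one stranded feature type.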
The genomic positions recorded in the SGA files correspond to the internal
position 3 of all hexameric features, and to position 4 of hepta-palindromes.
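As an illustration of the position convention, the sketch below scans a
sequence for hexa- or hepta-palindromes and reports the internal position
described above. It is a hedged example rather than the original pipeline:
the function names are hypothetical, and 1-based coordinates are an
assumption made here, not something stated in this description.

```python
# Illustrative only; assumes 1-based coordinates and an upper-case DNA string.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_palindrome(window):
    """The three outer bases pair with their mirror partners; a middle base is free."""
    return all(COMPLEMENT.get(window[i]) == window[-1 - i] for i in range(3))


def palindrome_positions(seq, length):
    """Return the recorded positions of all palindromes of the given length (6 or 7)."""
    if length not in (6, 7):
        raise ValueError("length must be 6 or 7")
    # Internal position 3 for hexamers, 4 for hepta-palindromes.
    offset = 3 if length == 6 else 4
    positions = []
    for start in range(len(seq) - length + 1):  # start is 0-based
        if is_palindrome(seq[start:start + length]):
            # 0-based start + internal offset = 1-based coordinate of that base
            positions.append(start + offset)
    return positions
```

With these assumptions, palindrome_positions("TTGAATTCTT", 6) returns [5],
the third base of GAATTC, and palindrome_positions("TTGACTGTCAA", 7)
returns [6], the central base of GACTGTC.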
The SGA files were generated with the following Perl scripts:
available from the MGA script archive at: