EPDnew human database statistics

This page gives further information on the human EPDnew database. A detailed description of how this release was generated can be found here.

Gene and transcript coverage

Total number of validated genes: 16455
Total number of validated transcripts: 29598

Core promoter element enrichment

Core promoter element analysis is used to assess the quality of the promoter collection. It exploits the fact that certain DNA motifs occur preferentially at characteristic distances from a transcription start site (TSS). For instance, the TATA-box occurs in a narrow region centered about 28 bp upstream of the TSS, whereas the CCAAT-box occurs over a much wider region with a peak frequency at position -80. Based on these observations, a high-quality promoter collection is expected to show high peaks for both sequence motifs. In addition, a narrow TATA-box peak at -28 indicates precise TSS mapping. This analysis was performed using OProf. Readers are encouraged to repeat this analysis, and to perform others, to check the quality of the promoter list.
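The idea behind a positional motif profile can be illustrated with a minimal Python sketch. This is not OProf and does not reproduce its scoring: it assumes exact matching of a consensus string (here the hypothetical choice TATAAA), equal-length promoter sequences, and a known 0-based TSS index, and it reports offsets from the first base of the motif with the TSS base at offset 0. The function names and the toy sequences are invented for illustration only.

```python
from collections import Counter


def motif_position_profile(sequences, tss_index, motif):
    """Count exact motif matches by start offset relative to the TSS.

    sequences: iterable of promoter sequences (strings).
    tss_index: 0-based index of the TSS base in each sequence (assumption:
               the same index applies to every sequence).
    motif:     consensus string matched exactly (a simplification; real
               tools typically use weight matrices).
    """
    counts = Counter()
    motif = motif.upper()
    for seq in sequences:
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            # Offset of the motif's first base; negative means upstream.
            counts[start - tss_index] += 1
            # Step by one so overlapping matches are also counted.
            start = seq.find(motif, start + 1)
    return counts


def frequency_profile(sequences, tss_index, motif):
    """Convert match counts into per-promoter frequencies at each offset."""
    sequences = list(sequences)
    counts = motif_position_profile(sequences, tss_index, motif)
    return {pos: n / len(sequences) for pos, n in sorted(counts.items())}


# Toy data (invented): 41 bp sequences with the TSS at index 30.
toy_promoters = [
    "GC" + "TATAAA" + "G" * 22 + "A" + "G" * 10,  # motif starts at offset -28
    "CC" + "TATAAA" + "C" * 22 + "A" + "C" * 10,  # motif starts at offset -28
    "G" * 41,                                     # no motif
]

toy_profile = frequency_profile(toy_promoters, tss_index=30, motif="TATAAA")
```

In a well-mapped collection, a profile of this kind would be expected to concentrate TATA-box matches in a narrow window near -28, while imprecise TSS positions would spread the same matches over neighboring offsets.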

TATA-box: this core promoter element is normally found 28 bp upstream of the transcription start site. The following plot shows that the EPDnew promoter collection has a more focused TATA-box distribution than the ENSEMBL annotation, suggesting precise TSS mapping in EPDnew.

Initiator: this element is found at the TSS and is strongly enriched in EPDnew compared to the ENSEMBL promoter collection.

CCAAT-box: this element is found further upstream of the TSS than the other core promoter elements. EPDnew shows an enrichment in this element as well.

GC-box: as in the other cases, EPDnew shows an enrichment in this element compared to the ENSEMBL collection.

Last update October 2019