The Mass Genome Annotation (MGA) Data Repository stores published next-generation sequencing data and other genome annotation data (such as gene start sites and SNPs) that scientists can access and study in conjunction with the ChIP-Seq and SSA servers. The main characteristic of the MGA database is that it stores mapped data (in the form of genomic coordinates of mapped reads) rather than sequence files. Each sample in the database has therefore been pre-processed (for example, sequence reads have been mapped to a genome) and is presented in a standardized text format named SGA (Simple Genome Annotation).
How to cite: R. Dreos, G. Ambrosini, R. Groux, R. Cavin Perier, P. Bucher; MGA repository: a curated data resource for ChIP-seq and other genome annotated data, Nucleic Acids Research, gkx995, https://doi.org/10.1093/nar/gkx995
The database can be accessed in various ways:
The native file format at the back end of the repository is SGA, and the files can be accessed via the FTP server. Users who want to use MGA data with tools that do not support the SGA format can easily convert SGA-formatted data to BED by:
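To illustrate what such a conversion involves, here is a minimal Python sketch. It is not the repository's own conversion command. It assumes that each SGA line is tab-separated with at least five fields (sequence ID, feature name, 1-based position, strand given as `+`, `-` or `0`, and count), and that the sequence IDs may need to be mapped to chromosome names. The function name `sga_to_bed` and the `chrom_names` mapping are hypothetical and introduced only for this example.

```python
def sga_to_bed(sga_lines, chrom_names=None):
    """Yield BED6 lines for an iterable of SGA lines.

    Assumed SGA layout (tab-separated): sequence ID, feature,
    1-based position, strand (+, - or 0), count. Extra fields are ignored.
    chrom_names is an optional, user-supplied mapping from sequence IDs
    to chromosome names; unmapped IDs are passed through unchanged.
    """
    chrom_names = chrom_names or {}
    for line in sga_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 fields, got: {line!r}")
        seq_id, feature, pos, strand, count = fields[:5]
        position = int(pos)
        chrom = chrom_names.get(seq_id, seq_id)
        # BED intervals are 0-based and half-open, so a single 1-based
        # position p becomes the interval [p - 1, p).
        start, end = position - 1, position
        # BED uses "." for features without a strand.
        bed_strand = strand if strand in ("+", "-") else "."
        yield "\t".join([chrom, str(start), str(end), feature, count, bed_strand])


if __name__ == "__main__":
    # Made-up records, for illustration only.
    example = [
        "NC_000001.10\tCTCF\t10500\t+\t3\n",
        "NC_000001.10\tCTCF\t10620\t0\t1\n",
    ]
    for bed_line in sga_to_bed(example, {"NC_000001.10": "chr1"}):
        print(bed_line)
```

The sketch writes the SGA count into the BED score column unchanged. Tools that enforce the 0 to 1000 score range of BED may need the counts capped or rescaled.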
The MGA repository contains the following number of samples (stratified by organism and data type):
| Data Type | Human | Mouse | Rat | Rhesus Macaque | Dog | Chicken | Zebra fish | Bee | Fruit Fly | Water Flea | Worm | Baker's Yeast | Fission Yeast | Arabidopsis | Corn | Malaria Parasite | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ChIP-seq | 8248 | 758 | 4 | 5 | 11 | 14 | 34 | - | 514 | 18 | 198 | 527 | 405 | 212 | 12 | 52 | 11012 |
| ChIP-seq-invitro | - | - | - | - | - | - | - | - | - | - | - | - | 931 | - | - | - | 931 |
| ChIP-seq-peak | 8206 | 28 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 8234 |
| Transcription Profiling | 2431 | 1352 | 13 | 15 | 12 | 33 | 12 | 16 | 371 | 11 | 19 | 22 | 16 | 13 | 8 | 13 | 4357 |
| DNase FAIRE etc. | 1434 | 42 | - | - | - | - | 4 | - | 68 | - | 6 | 58 | 8 | 9 | 3 | 12 | 1644 |
| DNA methylation | 24 | 4 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 28 |
| Genome annotation | 32 | 23 | 2 | 2 | 2 | 15 | 6 | 2 | 16 | 4 | 18 | 4 | 5 | 5 | 3 | 3 | 179 |
| Sequence-derived | 3617 | 2315 | - | - | - | 1 | 14 | 9 | 1240 | - | 9 | 9 | 9 | 1531 | 9 | - | 8764 |
| Total # of Samples | 27051 | 4535 | 19 | 22 | 25 | 63 | 70 | 27 | 2209 | 33 | 250 | 620 | 443 | 2701 | 35 | 15 | 38185 |
Data types are the following:
The list of series present in the database can be found in the MGA Data Overview page.
Sample names in MGA contain useful information about the samples' biological and technical variables. For example, the sample '* S2|PolII|80mMsalt|contol' contains several pieces of information that can be summarised in the figure below:
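Because the parts of a sample name are separated by `|`, a name can be split into its components programmatically. The following is a minimal Python sketch. It assumes that the leading `* ` in the example is a marker rather than part of the name, and it returns the parts unlabeled because the meaning of each part is given by the figure, not by the name itself. The function name `split_sample_name` is hypothetical.

```python
def split_sample_name(name):
    """Split an MGA-style sample name into its '|'-separated parts.

    Assumption: a leading '*' (plus surrounding spaces) is a marker
    and not part of the sample name, so it is stripped first.
    """
    cleaned = name.strip().lstrip("*").strip()
    return [part.strip() for part in cleaned.split("|")]


if __name__ == "__main__":
    # Sample name copied verbatim from the text above.
    print(split_sample_name("* S2|PolII|80mMsalt|contol"))
```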