Projects and runs¶
The projects and runs webpages can be accessed from the Data menu, which contains two submenus: one for all projects and runs, and the other for curated projects.
There are four types of webpages for projects and runs.
All projects and runs¶
This page provides statistics on the data collected in GMrepo, and lists all projects and runs in two tables. From these tables, users can access webpages with details of specific projects or runs.
It consists of three parts:
1. Overview¶
This part provides an overview of data collected in our database. For example, the current release contains:
- Meta-data are available for a total of 118,865 runs (samples) belonging to 890 projects. Among these, raw data have been processed for 108,177 runs (samples) belonging to 884 projects.
- Microbe abundance data are available for 68,723 runs (samples) belonging to 664 projects.
- A total of 39,452 runs (samples) belonging to 472 projects failed our QC processes.
- In addition, GMrepo includes information on 6 projects whose raw data were not processed, mostly due to the lack of clearly defined health/disease information.
2. All projects table¶
This table lists collected projects, their associated diseases, related publications, brief descriptions, and whether the raw data have been processed.
Raw data of a project will not be processed if essential meta-data are missing.
3. All runs table¶
This table contains a list of collected samples in GMrepo, including meta-data such as:
- Technical meta-data:
    - Experiment type (16S or Metagenomics),
    - Sequencing devices/instruments,
    - Number of obtained sequencing reads.
- Host-related, biologically relevant meta-data:
    - Disease or health status (referred to as phenotype in our database),
    - Age,
    - Sex,
    - BMI (body mass index),
    - Antibiotic usage.
Meta-data are available for a total of 118,865 runs/samples.
Each run has a QCStatus (quality control result), which can be:
- 1: data passed QC and processed results have been loaded,
- 0: data did NOT pass QC,
- (empty): data yet to be processed.
Please consult Data processing & QC for more details.
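As an illustration of how the three QCStatus values described above might be handled when working with an exported copy of the runs table, here is a minimal Python sketch. The function names, the label wording, and the row layout (dictionaries with `run_id` and `QCStatus` keys) are assumptions made for this example and are not part of GMrepo itself.

```python
from collections import Counter

# Meanings of QCStatus as described above; keys are normalized to strings.
QC_LABELS = {
    "1": "passed QC, results loaded",
    "0": "did not pass QC",
    "": "not yet processed",
}


def describe_qc_status(value):
    """Return a readable label for a raw QCStatus value (1, 0, or empty)."""
    key = "" if value is None else str(value).strip()
    if key not in QC_LABELS:
        raise ValueError(f"unexpected QCStatus value: {value!r}")
    return QC_LABELS[key]


def summarize_qc(rows):
    """Count runs per QC label; each row is a dict with a 'QCStatus' key."""
    return Counter(describe_qc_status(row.get("QCStatus")) for row in rows)


# Hypothetical rows, e.g. as read from an exported copy of the runs table.
example_rows = [
    {"run_id": "run_A", "QCStatus": 1},
    {"run_id": "run_B", "QCStatus": "0"},
    {"run_id": "run_C", "QCStatus": ""},
    {"run_id": "run_D", "QCStatus": None},
]
qc_summary = summarize_qc(example_rows)
```

Treating both an empty string and a missing value as "not yet processed" is a design choice of this sketch; any other value raises an error rather than being silently counted.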
Project details page¶
The project details page provides information about each collected project, including:
- Included samples,
- Associated disease(s),
- Related publication(s),
- Disease marker analysis results.
Here are some examples:
1. Project overview¶
This part provides basic information about a project, including:
- A brief project description obtained from public databases, mostly from:
    - ENA (European Nucleotide Archive),
    - NCBI SRA (Sequence Read Archive),
- Number of included runs,
- Related publication(s), if available.
2. Associated runs/samples¶
This section includes a table of runs/samples with meta-data and QCStatus, similar to the all runs table.
3. In-depth analysis¶
In-depth analysis currently includes only marker identification. See our documentation on Disease marker identification for more.
Briefly, LEfSe (Linear discriminant analysis Effect Size; PMID: 21702898) is used to identify microbial markers whose abundances differ significantly between:
- A disease and healthy controls (e.g., CRC and healthy), or
- Disease stages (e.g., adenoma and CRC).
Markers are identified per project and included in the project details page.
LEfSe results are shown in tables and visualized by barplots, e.g.:

This plot shows marker species with |LDA score| >= 2.0 between CRC and healthy controls in project PRJEB46665, where green bars indicate health-enriched species and pink bars indicate CRC-enriched species.
Note:
- For whole-genome shotgun sequencing (mNGS) projects, markers are identified at both the species and genus levels.
- For 16S amplicon data, markers are identified only at the genus level.
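The |LDA score| >= 2.0 cutoff used for the barplot can be sketched in Python as follows. This is only an illustration: the row layout (`taxon`, `lda_score`, `enriched_in`), the function names, and the example values are hypothetical, and the sign convention of the scores is an assumption, which is why the enriched group is carried as an explicit field.

```python
def select_markers(results, threshold=2.0):
    """Keep taxa with |LDA score| >= threshold, strongest effect first."""
    kept = [r for r in results if abs(r["lda_score"]) >= threshold]
    return sorted(kept, key=lambda r: abs(r["lda_score"]), reverse=True)


def group_by_enrichment(markers):
    """Group marker taxon names by the phenotype in which they are enriched."""
    groups = {}
    for m in markers:
        groups.setdefault(m["enriched_in"], []).append(m["taxon"])
    return groups


# Hypothetical LEfSe-style rows; taxon names and scores are placeholders.
# The use of negative scores for one group is an assumption of this sketch.
example_results = [
    {"taxon": "taxon_A", "lda_score": 3.4, "enriched_in": "CRC"},
    {"taxon": "taxon_B", "lda_score": -2.0, "enriched_in": "Health"},
    {"taxon": "taxon_C", "lda_score": 1.2, "enriched_in": "CRC"},
    {"taxon": "taxon_D", "lda_score": -4.1, "enriched_in": "Health"},
]
markers = select_markers(example_results)
marker_groups = group_by_enrichment(markers)
```

Here `taxon_C` is dropped because its absolute score is below the cutoff, while `taxon_B` is kept because the comparison is inclusive.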
Run details page¶
This page shows details of a specific run, including:
1. Run details and meta-data¶
- Run ID,
- Brief introduction,
- Related project ID and sample ID,
- Associated disease,
- Meta-data.
2. Taxonomic profile¶
This part shows the bacterial species and genera identified in the run, together with their relative abundances.
See ERR6617404 for an example.
The relative abundances are visualized as shown below:

Users can download the detailed profile as a text file using the link below the figures.