Data downloads

Download processed data from GMrepo

File	Description
Projects	A list of projects and related information such as project description.
Projects summary	A list of projects and information such as total numbers of associated runs, processed runs, valid runs and failed runs.
All runs	Information on runs/samples collected in this database, including their associated projects and other curated meta-data.
Processed runs	A list of processed runs, tools used for analysis, and their QC statuses.
Relative abundances	Relative abundances at species and genus levels in samples/runs that passed our QC criteria.
Relative abundance summary	Summary statistics on the taxonomic abundance data.
Taxon co-occurrence data	Taxon co-occurrence in runs/samples of each phenotype, calculated separately for species and genus.
Statistics by phenotype	Statistics by phenotype, including number of runs (with meta-data), processed/valid/failed runs, and number of associated species and genera.
MySQL dump of the whole database	MySQL dump of all tables in the database.
MeSH table	Medical Subject Headings (MeSH) data used in this study.
NCBI taxonomy table	Reformatted NCBI taxonomy: taxon ID to scientific name and rank.

Note

Please note that in the NCBI taxonomy database, two types of taxonomy IDs are used:

taxon_id: Internal unique ID used by NCBI taxonomy
ncbi_taxon_id: The actual NCBI taxonomy ID of a taxonomy entity
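
Because the two IDs are easy to confuse, a lookup keyed explicitly on ncbi_taxon_id may help when working with the downloaded taxonomy table. The following is a minimal sketch only: it assumes the table is a tab-separated file with a header row that contains a column named ncbi_taxon_id, which should be checked against the actual download. The function name lookup_ncbi_taxon and the file name used afterwards are hypothetical.

```shell
# Print the header row plus every row whose ncbi_taxon_id equals the given ID.
# Assumption: tab-separated input with a header row naming an
# "ncbi_taxon_id" column; the real file layout may differ.
lookup_ncbi_taxon() {
  local table="$1" wanted="$2"
  awk -F'\t' -v wanted="$wanted" '
    NR == 1 {
      # Locate the column by header name instead of hard-coding a position.
      for (i = 1; i <= NF; i++) if ($i == "ncbi_taxon_id") col = i
      if (!col) { print "no ncbi_taxon_id column found" > "/dev/stderr"; exit 1 }
      print
      next
    }
    $col == wanted { print }
  ' "$table"
}
```

For example, `lookup_ncbi_taxon taxonomy_table.tsv 816` would print the header and any rows for NCBI taxonomy ID 816 (Bacteroides), provided the file has the assumed layout.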

Download raw sequence data

Due to limited hardware capacity, we do not offer raw sequence data downloads directly from our database.

Instead, users should download raw sequence reads from public databases such as the SRA (Sequence Read Archive) at NCBI (National Center for Biotechnology Information).

To do so:

Copy and paste the run ID of interest into the "Search" box on the SRA website.
Visit the corresponding run page and use the download links provided.
Or use the "linkout" icon (available for each run ID in GMrepo) to go directly to the corresponding SRA run page.
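
For readers who prefer to script the search step above, the sketch below prints an SRA search URL for a run ID. It rests on two assumptions that do not come from GMrepo: that run IDs follow the usual SRR/ERR/DRR accession pattern, and that the NCBI SRA search page accepts the accession through a `term` query parameter. The function name sra_run_url is hypothetical.

```shell
# Print an SRA search URL for a run accession.
# Assumptions: run accessions look like SRR/ERR/DRR followed by digits,
# and the NCBI SRA search page accepts the accession as "?term=".
sra_run_url() {
  local run_id="$1"
  if ! printf '%s\n' "$run_id" | grep -Eq '^[SED]RR[0-9]+$'; then
    echo "not a run accession: $run_id" >&2
    return 1
  fi
  printf 'https://www.ncbi.nlm.nih.gov/sra/?term=%s\n' "$run_id"
}
```

The printed URL can be opened in a browser; the validation step is there only to catch project or sample IDs pasted by mistake.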

Alternatively, use command-line tools from the SRA Toolkit to download raw data in various formats. Common tools include:

fastq-dump: convert SRA data to FASTQ format, fetching the run first if it is not already available locally. Usage:

fastq-dump [options] <run_accession_id>

prefetch: download SRA, dbGaP and ADSP data. Usage:

prefetch [options] <run_accession_id>
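
The two tools are often combined: prefetch fetches the run, then fastq-dump converts it to FASTQ. The sketch below shows one way to chain them, assuming the SRA Toolkit is installed and on the PATH. The function name fetch_run, the default output directory, and the DRY_RUN switch are illustrative choices, and the fastq-dump options shown (--split-files, --gzip, --outdir) are only one reasonable selection.

```shell
# Fetch one run with prefetch, then convert it to FASTQ with fastq-dump.
# Set DRY_RUN=1 to print the commands instead of running them.
# Assumption: prefetch and fastq-dump (SRA Toolkit) are on the PATH.
fetch_run() {
  local run_id="$1" outdir="${2:-fastq}"
  local runner=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    runner="echo"
  fi
  # Stop early if the download fails, so fastq-dump is not run on nothing.
  $runner prefetch "$run_id" || return 1
  # --split-files writes paired reads to separate files; --gzip compresses them.
  $runner fastq-dump --split-files --gzip --outdir "$outdir" "$run_id"
}
```

Running `DRY_RUN=1 fetch_run <run_accession_id>` first shows the exact commands without downloading anything, which is useful because raw runs can be large.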

For more details, consult the SRA Toolkit documentation.