Data downloads
Download processed data from GMrepo
| File | Description |
|---|---|
| Projects | A list of projects and related information such as project description. |
| Projects summary | A list of projects and information such as total numbers of associated runs, processed runs, valid runs and failed runs. |
| All runs | Information on runs/samples collected in this database, including their associated projects and other curated meta-data. |
| Processed runs | A list of processed runs, tools used for analysis, and their QC statuses. |
| Relative abundances | Relative abundances at species and genus levels in samples/runs that passed our QC criteria. |
| Relative abundance summary | Summary statistics on the taxonomic abundance data. |
| Taxon co-occurrence data | Taxon co-occurrence in runs/samples of each phenotype, calculated separately for species and genus. |
| Statistics by phenotype | Statistics by phenotype, including number of runs (with meta-data), processed/valid/failed runs, and number of associated species and genera. |
| MySQL dump of the whole database | MySQL dump of all tables in the database. |
| MeSH table | Medical Subject Headings (MeSH) data used in this study. |
| NCBI taxonomy table | Reformatted NCBI taxonomy: taxon ID to scientific name and rank. |
Note
Please note that in the NCBI taxonomy database, two types of taxonomy IDs are used:
- taxon_id: Internal unique ID used by NCBI taxonomy
- ncbi_taxon_id: The actual NCBI taxonomy ID of a taxonomy entity
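To illustrate the distinction between the two IDs, the following minimal sketch indexes a taxonomy table by ncbi_taxon_id so that entries can be looked up by the public NCBI ID. The tab-separated layout, the scientific_name and node_rank column names, and the internal taxon_id values shown here are assumptions made for illustration only; check the header of the downloaded file before adapting it.

```python
import csv
import io

# Hypothetical excerpt of the taxonomy table. The column layout and the
# internal taxon_id values (101, 102) are made up for this illustration.
SAMPLE_TSV = (
    "taxon_id\tncbi_taxon_id\tscientific_name\tnode_rank\n"
    "101\t816\tBacteroides\tgenus\n"
    "102\t562\tEscherichia coli\tspecies\n"
)


def index_by_ncbi_id(handle):
    """Map each ncbi_taxon_id (as int) to its full row (a dict of strings)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {int(row["ncbi_taxon_id"]): row for row in reader}


# For a real file, pass open(path, newline="") instead of the StringIO object.
taxa = index_by_ncbi_id(io.StringIO(SAMPLE_TSV))
row = taxa[562]
print(row["scientific_name"], row["node_rank"], "internal ID:", row["taxon_id"])
```

Keying the lookup on ncbi_taxon_id keeps the internal taxon_id available in each row without confusing it with the ID used on the NCBI website.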
Download raw sequence data
Due to limited hardware capacity, we do not offer raw sequence data downloads directly from our database.
Instead, users should download raw sequence reads from public databases such as the SRA (Sequence Read Archive) at NCBI (National Center for Biotechnology Information).
To do so:
- Copy and paste the run ID of interest into the "Search" box on the SRA website
- Visit the corresponding run page and use the download links provided
- Or use the "linkout" icon (available for each run ID in GMrepo) to go directly to the corresponding SRA run page
Alternatively, use command-line tools from the SRA Toolkit to download raw data in various formats. Common tools include:
fastq-dump: download SRA data to a local directory. Usage:
fastq-dump [options] <run_accession_id>
prefetch: download SRA, dbGaP and ADSP data. Usage:
prefetch [options] <run_accession_id>
For more details, consult the SRA Toolkit documentation.
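As a rough sketch of how the two tools can be combined, the script below builds a prefetch command followed by a fastq-dump command for a single run and prints them; it only executes them when dry_run is set to False. The helper names build_commands and download_run, the output directory name, and the choice of the --split-files and --gzip options are assumptions made for this example, and SRR000001 is only a placeholder accession. Behaviour may differ between SRA Toolkit versions, so treat this as a starting point, not a definitive recipe.

```python
import re
import shutil
import subprocess

# SRA run accessions start with SRR, ERR or DRR followed by digits.
RUN_ID_PATTERN = re.compile(r"^[SED]RR\d+$")


def build_commands(run_id, outdir="fastq"):
    """Return the prefetch and fastq-dump command lines for one run."""
    if not RUN_ID_PATTERN.match(run_id):
        raise ValueError(f"not a run accession: {run_id!r}")
    return [
        # Step 1: fetch the run with prefetch.
        ["prefetch", run_id],
        # Step 2: write gzip-compressed FASTQ files, one file per read.
        ["fastq-dump", "--split-files", "--gzip", "--outdir", outdir, run_id],
    ]


def download_run(run_id, outdir="fastq", dry_run=True):
    """Print the commands, and execute them unless dry_run is set."""
    commands = build_commands(run_id, outdir)
    for command in commands:
        print(" ".join(command))
        if dry_run:
            continue
        if shutil.which(command[0]) is None:
            raise RuntimeError(f"{command[0]} not found; is the SRA Toolkit installed?")
        subprocess.run(command, check=True)
    return commands


# Placeholder accession; dry_run=True (the default) prints without downloading.
download_run("SRR000001")
```

Validating the accession before calling the tools catches pasted project or sample IDs early, and the dry-run default makes it easy to inspect the commands before starting a large download.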