Transferring Qiita Artifacts to Qiime2

(Stephanie Orchanian) #1

About the Transferring Qiita Artifacts to Qiime2 Tutorial

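This Transferring Qiita Artifacts to Qiime2 Community Tutorial (run using QIIME 2 2018.6) walks you through converting your data from Qiita to .qza files so you can continue your analysis on the Qiime2 platform. The import commands use the --source-format option as it was named in that release; later Qiime2 releases call it --input-format.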

This tutorial lists the Qiita commands, the file format of data created from the Qiita commands, the name of the file created on Qiita, the commands to use to import the file as a .qza file, and other .qza files you can make using Qiita outputs.

Qiita allows easy submission of your data to the European Bioinformatics Institute (EBI), so we suggest uploading your data to Qiita for processing, depositing in EBI, and public sharing, even if you want to do your own analysis outside of Qiita.

Qiita’s default amplicon sequence processing parameters have been set based on what current research and benchmarks suggest are optimal for supporting meta-analyses of the highly diverse sample types represented on our platform:

  1. Closed reference OTUs: Qiita employs Qiime v1.9.1 “pick_closed_reference_otus.py”, which uses SortMeRNA (1) to pick OTUs against:
  • 16S: the latest (Aug 2013) 97% OTU reference sequences from GreenGenes (2)

  • 18S: the latest (July 2014) 97% OTU reference sequences from SILVA 119 (4)

  • ITS: the latest (January 2017) 97% OTU reference sequences from UNITE 7 (3)

  • Closed reference picked OTUs are no longer preferred for examining community diversity metrics (5) for single-study analyses or single amplicon meta-analyses.

  • Closed reference picked OTUs are still suggested for meta-analyses of studies that examine different amplicons (e.g. 16S V1-3 vs V4-5) or for use in plugins that require a numeric OTU ID.

  2. Deblur sOTU: Qiita employs Deblur v1.0.4 on trimmed sequences (UC San Diego studies are typically processed at 90, 100, and 150 nt) with no minimum read count threshold, followed by insertion into the latest (Aug 2013) 99% OTU tree from GreenGenes (2) using SEPP (5) as part of the Qiime2 q2-fragment-insertion plugin.
  • The single read threshold was chosen (rather than the default 10 read threshold) to support the best practice of setting a custom minimum read threshold that is suitable for your single-study analysis or meta-analysis. This can be done using the Filter Table command (see below).

  • A recent paper (6) examined the suitability of deblur and other denoising algorithms for various types of studies, so we would encourage users to determine if deblurred sOTUs are suitable for their analyses.

  • Note that although the latest (Aug 2013) GreenGenes release is now nearly 5 years old, we continue to use this release rather than the more up-to-date SILVA database. We will be updating our reference database to GreenGenes 2.0, which is targeted for release in the coming year.
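
To make the idea of a custom minimum read threshold concrete, here is a minimal sketch that applies one to a toy tab-separated table with awk. It is only an illustration of the rule (drop features whose total count across all samples falls below the threshold), not what Qiita or Deblur run internally. The helper name filter_min_reads and the toy table are made up for this example. Within Qiime2 itself, a similar effect should be available through qiime feature-table filter-features with its --p-min-frequency option.

```shell
# Hypothetical helper: keep features (rows) whose total count across all
# samples is at least the threshold given as $1. Input on stdin is a
# tab-separated table with a header row and the feature ID in column 1.
filter_min_reads() {
  awk -F '\t' -v min="$1" '
    NR == 1 { print; next }              # pass the header through unchanged
    {
      total = 0
      for (i = 2; i <= NF; i++) total += $i
      if (total >= min) print            # keep the row only if it reaches the threshold
    }'
}

# Toy table (made-up counts): three features, two samples.
printf 'feature-id\tS1\tS2\nseqA\t6\t7\nseqB\t1\t0\nseqC\t4\t5\n' \
  | filter_min_reads 10
```

With a threshold of 10, only seqA (13 reads in total) survives; seqB (1 read) and seqC (9 reads) are dropped.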

Data Generated from Processing

Pick Closed Reference (for multi-amplicon analyses and backward compatibility)

  • Creates a closed reference OTU biom table: otu_table.biom
qiime tools import \
  --input-path otu_table.biom \
  --type 'FeatureTable[Frequency]' \
  --source-format BIOMV210Format \
  --output-path table.qza

Deblur Final Table (Use for ITS and UNITE Analyses)

  • Creates a deblurred OTU biom table: all.biom
qiime tools import \
  --input-path all.biom \
  --type 'FeatureTable[Frequency]' \
  --source-format BIOMV210Format \
  --output-path table.qza
  • Creates representative sequences file: all.seqs.fa (preprocessed fasta)
qiime tools import \
  --input-path all.seqs.fa \
  --type 'FeatureData[Sequence]' \
  --output-path rep-seqs.qza

Deblur Reference Hit (Use for 16S Analyses)

  • Creates a deblurred OTU biom table: reference-hit.biom
qiime tools import \
  --input-path reference-hit.biom \
  --type 'FeatureTable[Frequency]' \
  --source-format BIOMV210Format \
  --output-path table.qza
  • Creates representative sequences file: reference-hit-seqs.fa (preprocessed fasta)
qiime tools import \
  --input-path reference-hit-seqs.fa \
  --type 'FeatureData[Sequence]' \
  --output-path rep-seqs.qza
  • Creates a rooted tree: insertion_tree.relabelled.tre (plain text)
qiime tools import \
  --input-path insertion_tree.relabelled.tre \
  --type 'Phylogeny[Rooted]' \
  --output-path rooted-tree.qza
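  • Note that this tree can also be used to filter your OTU table so that it contains only the features present in the tree. The commands below assume your table and tree are in a qiita-files/ directory; adjust the paths and file names to match your own files: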
echo "feature-id" > qiita-files/sepp-tips.txt
 
python -c "from skbio.tree import TreeNode; print('\n'.join([tip.name
for tip in
TreeNode.read('qiita-files/insertion_tree.relabelled.fix.tre').tips()
if not tip.name.isdigit()]))" >> qiita-files/sepp-tips.txt
 
qiime feature-table filter-features \
  --i-table qiita-files/feature-table.qza \
  --p-no-exclude-ids \
  --m-metadata-file qiita-files/sepp-tips.txt \
  --o-filtered-table feature-table.qza
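
The Python one-liner above keeps every tip whose name is not made up entirely of digits. The likely reasoning, stated here as an assumption, is that the reference tips in the insertion tree carry numeric GreenGenes OTU IDs while the inserted sOTUs are named by their sequences. The selection rule on its own can be sketched with grep; the helper name keep_inserted_tips and the tip names are invented for this example.

```shell
# Hypothetical helper: read tip names on stdin (one per line) and write a
# metadata-style ID list, dropping names that consist only of digits.
keep_inserted_tips() {
  echo "feature-id"
  grep -v -E '^[0-9]+$'
}

# Made-up tip names: two numeric reference IDs and two sequence-named tips.
printf '4479944\nTACGGAGGATCC\n1108951\nTACGTAGGGTGC\n' | keep_inserted_tips
```

This prints the feature-id header followed by the two sequence-named tips, which is the shape of file that --m-metadata-file expects in the filter-features command.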

Data Generated from Analysis

dflt_name- Closed Reference Analysis

  • Contains a closed reference OTU biom table: {analysisID}_analysis_{target gene}_{processing info}_biom_table.biom
qiime tools import \
  --input-path {analysisID}_analysis_{target gene}_{processing info}_biom_table.biom \
  --type 'FeatureTable[Frequency]' \
  --source-format BIOMV210Format \
  --output-path closedRef_table.qza

dflt_name- Deblurred Analysis

  • Contains a deblurred OTU biom table: {analysisID}_analysis_{target gene}_{processing info}_biom_table.biom
qiime tools import \
  --input-path {analysisID}_analysis_{target gene}_{processing info}_biom_table.biom \
  --type 'FeatureTable[Frequency]' \
  --source-format BIOMV210Format \
  --output-path deblurred_table.qza

Generating a Representative Sequences File from a Biom Table

  • If conducting a meta-analysis, you can use your deblurred biom table to create a representative sequences file containing all of the sequences. This works because the feature IDs in a deblurred table are the sequences themselves:
biom summarize-table --observations -i your_biom_table.biom \
| tail -n +16 | awk -F ':' '{print ">"$1"\n"$1}' > rep_seqs.fna
 
qiime tools import \
  --input-path rep_seqs.fna \
  --type 'FeatureData[Sequence]' \
  --output-path rep-seqs.qza
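
The awk step in the pipeline above can be tried on its own to see what it produces. The sketch below feeds it two made-up lines in the "ID: count" shape that the step expects, and wraps it in a helper named detail_to_fasta (a name chosen for this example). Because each ID is written both as the FASTA header and as the sequence, the result is only meaningful for deblurred tables; a closed reference table with numeric OTU IDs would give records whose "sequence" is a number.

```shell
# Hypothetical wrapper around the awk step used above: turn "ID: count"
# lines into FASTA records whose header and sequence are both the ID.
detail_to_fasta() {
  awk -F ':' '{print ">"$1"\n"$1}'
}

# Two made-up detail lines standing in for the biom summarize-table output.
printf 'TACGGAGG: 120.0\nTACGTAGG: 45.0\n' | detail_to_fasta
```

It is worth checking the first few lines of rep_seqs.fna before importing, since the tail -n +16 offset depends on how many header lines biom summarize-table prints.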

Generating a Taxonomy Classification File from a Database and a Representative Sequences File

  • The files created in Qiita and transferred to Qiime2 can be used to create a taxonomy classification file to be used in your Qiime2 analysis

  • Note that the gg-13-8-99-515-806-nb-classifier.qza file can be changed to use other databases.

qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

Rarefy Features

  • Creates a rarefied OTU biom table: rarefied.biom
qiime tools import \
  --input-path rarefied.biom \
  --type 'FeatureTable[Frequency]' \
  --source-format BIOMV210Format \
  --output-path table.qza

Filter Samples by Metadata

  • Creates a filtered OTU biom table: filtered.biom
qiime tools import \
  --input-path filtered.biom \
  --type 'FeatureTable[Frequency]' \
  --source-format BIOMV210Format \
  --output-path table.qza

Calculate Alpha Diversity

  • Creates an alpha vector: alpha-diversity.tsv (plain text)
qiime tools import \
  --input-path alpha-diversity.tsv \
  --type 'SampleData[AlphaDiversity]' \
  --output-path alpha_vector.qza

Calculate Beta Diversity

  • Creates a beta diversity distance matrix: distance-matrix.tsv (plain text)
qiime tools import \
  --input-path distance-matrix.tsv \
  --type 'DistanceMatrix' \
  --output-path beta_diversity_matrix.qza

Principal Coordinate Analysis (PCoA)

  • Creates an ordination file: ordination.txt (plain text)
qiime tools import \
  --input-path ordination.txt \
  --type 'PCoAResults' \
  --output-path pcoa_results.qza

Note that the remaining analysis commands create .qzv files that are viewable in Qiita, so no conversion is necessary. It is not currently possible to download the .qzv visualizations, though we hope to add this feature in the future.
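
All of the imports in this tutorial follow the same pattern, with only the file and the --type value changing. As a quick reference, the sketch below maps the Qiita file names used above to the semantic types they were imported with. The function name qiita_type_for is made up for this example. It does not call qiime; it only prints the mapping. It matches on the bare file name, so strip any directory first.

```shell
# Hypothetical lookup: print the semantic type this tutorial uses for a
# given Qiita output file name, or return non-zero if it is not covered.
qiita_type_for() {
  case "$1" in
    *.biom)               echo "FeatureTable[Frequency]" ;;
    *.fa|*.fna)           echo "FeatureData[Sequence]" ;;
    *.tre)                echo "Phylogeny[Rooted]" ;;
    alpha-diversity.tsv)  echo "SampleData[AlphaDiversity]" ;;
    distance-matrix.tsv)  echo "DistanceMatrix" ;;
    ordination.txt)       echo "PCoAResults" ;;
    *)                    return 1 ;;
  esac
}

# Print the mapping for the files named in this tutorial.
for f in reference-hit.biom reference-hit-seqs.fa insertion_tree.relabelled.tre \
         alpha-diversity.tsv distance-matrix.tsv ordination.txt; do
  printf '%s -> %s\n' "$f" "$(qiita_type_for "$f")"
done
```

Remember that the .biom imports above also pass --source-format BIOMV210Format, which this lookup does not cover.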

References

  1. Kopylova, E., Noe, L., Touzet, H. (2012). “SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data”. Bioinformatics. 28 (24) 3211-7.

  2. DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., Andersen, G.L. (2006). “Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB”. Applied and Environmental Microbiology (72): 5069–5072.

  3. Abarenkov, K., Nilsson, R. H., Larsson, K., Alexander, I. J., Eberhardt, U., Erland, S., Høiland, K., Kjøller, R., Larsson, E., Pennanen, R., Sen, R., Taylor, A. F. S., Tedersoo, L., Ursing, B. M., Vrålstad, T., Liimatainen, K., Peintner, U., Kõljalg, U. (2010). “The UNITE database for molecular identification of fungi - recent updates and future perspectives”. New Phytologist. 186(2): 281-285.

  4. Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J., Glöckner, F. O. (2013). “The SILVA ribosomal RNA gene database project: improved data processing and web-based tools”. Nucl. Acids Res. 41 (D1): D590-D596.

  5. Janssen, S., McDonald, D., Gonzalez, A., Navas-Molina, J.A., Jiang, L., Xu, Z.Z., Winker, K., Kado, D.M., Orwoll, E., Manary, M., Mirarab, S., Knight, R. (2018). “Phylogenetic Placement of Exact Amplicon Sequences Improves Associations with Clinical Information”. mSystems.

  6. Amir, A., McDonald, D., Navas-Molina, J.A., Kopylova, E., Morton, J., Xu, Z.Z., Kightley, E.P., Thompson, L.R., Hyde, E.R., Gonzalez, A., Knight, R. (2017) “Deblur rapidly resolves single-nucleotide community sequence patterns.” mSystems. 2 (2) e00191-16.
