Code Review: Custom QIIME 2

We previously used the following code with the Greengenes 13.8 database. Now, we would like to use the latest version of the database. Which parts of the code need to be modified to accommodate this update?

#Activation
conda activate qiime2-amplicon-2024.10

  1. Importing data (“Fastq manifest” formats)

1.1 import

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest \
  --output-path 1_1_demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

#Single end read
#--type 'SampleData[SequencesWithQuality]'
#--input-format SingleEndFastqManifestPhred33V2
#--input-format SingleEndFastqManifestPhred64V2

#Paired end read
#--type 'SampleData[PairedEndSequencesWithQuality]'
#--input-format PairedEndFastqManifestPhred33V2
#--input-format PairedEndFastqManifestPhred64V2
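
The `manifest` file passed to `--input-path` above is a tab-separated table of sample IDs and absolute file paths. As a rough sketch of how one could be generated for paired-end data, the helper below (`make_manifest` is a hypothetical name, and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme is an assumption that may not match your files) prints a manifest with the header expected by the `PairedEndFastqManifestPhred33V2` format:

```shell
# make_manifest DIR: print a paired-end V2 manifest for DIR to stdout.
# Assumption: reads are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
make_manifest() {
  # Resolve DIR to an absolute path, since the manifest needs absolute paths.
  dir=$(cd "$1" && pwd) || return 1
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$dir"/*_R1.fastq.gz; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$fwd" ] || continue
    sample=$(basename "$fwd" _R1.fastq.gz)
    rev="$dir/${sample}_R2.fastq.gz"
    if [ ! -e "$rev" ]; then
      echo "missing reverse read for $sample" >&2
      return 1
    fi
    printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
  done
}
```

For example, `make_manifest /path/to/fastq > manifest` would write the file used in step 1.1.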

1.2 Joining paired end reads

qiime vsearch merge-pairs \
  --i-demultiplexed-seqs 1_1_demux.qza \
  --o-merged-sequences 1_2_demux-joined.qza \
  --o-unmerged-sequences 1_2_unmerged.qza

##summary
qiime demux summarize \
  --i-data 1_2_demux-joined.qza \
  --o-visualization 1_2_demux-joined.qzv

##View
qiime tools view 1_2_demux-joined.qzv # check the quality scores to choose denoising parameter values

1.3 Quality filter

qiime quality-filter q-score \
  --i-demux 1_2_demux-joined.qza \
  --p-min-quality 20 \
  --o-filtered-sequences 1_3_demux-joined-filtered.qza \
  --o-filter-stats 1_3_demux-joined-filter-stats.qza

##summary
qiime demux summarize \
  --i-data 1_3_demux-joined-filtered.qza \
  --o-visualization 1_3_demux-joined-filtered.qzv

##View
qiime tools view 1_3_demux-joined-filtered.qzv

##filter statistics

qiime metadata tabulate \
  --m-input-file 1_3_demux-joined-filter-stats.qza \
  --o-visualization 1_3_demux-joined-filter-stats.qzv

##View
qiime tools view 1_3_demux-joined-filter-stats.qzv

1.4 Dereplicate-sequences

qiime vsearch dereplicate-sequences \
  --i-sequences 1_3_demux-joined-filtered.qza \
  --o-dereplicated-table 1_4_table.qza \
  --o-dereplicated-sequences 1_4_rep-seqs.qza

1.5 De novo clustering

qiime vsearch cluster-features-de-novo \
  --i-table 1_4_table.qza \
  --i-sequences 1_4_rep-seqs.qza \
  --p-perc-identity 0.99 \
  --p-threads 36 \
  --o-clustered-table 1_5_table-dn-99.qza \
  --o-clustered-sequences 1_5_rep-seqs-dn-99.qza

1.6 De novo chimera checking

qiime vsearch uchime-denovo \
  --i-table 1_5_table-dn-99.qza \
  --i-sequences 1_5_rep-seqs-dn-99.qza \
  --output-dir 1_6_uchime-dn-out

1.7 Exclude chimeras and “borderline chimeras”

a.
qiime feature-table filter-features \
  --i-table 1_5_table-dn-99.qza \
  --m-metadata-file 1_6_uchime-dn-out/nonchimeras.qza \
  --o-filtered-table 1_7a_table-dn-99.qza

b.
qiime feature-table filter-seqs \
  --i-data 1_5_rep-seqs-dn-99.qza \
  --m-metadata-file 1_6_uchime-dn-out/nonchimeras.qza \
  --o-filtered-data 1_7b_rep-seqs-dn-99.qza

c.
qiime feature-table summarize \
  --i-table 1_7a_table-dn-99.qza \
  --o-visualization 1_7a_table-dn-99.qzv

d.
qiime tools view 1_7a_table-dn-99.qzv

2.1 Generate a tree for phylogenetic diversity analyses

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences 1_7b_rep-seqs-dn-99.qza \
  --p-n-threads 34 \
  --o-alignment 2.1_aligned-rep-seqs.qza \
  --o-masked-alignment 2.1_masked-aligned-rep-seqs.qza \
  --o-tree 2.1_unrooted-tree.qza \
  --o-rooted-tree 2.1_rooted-tree.qza

  3. Taxonomic assignment

3.1 Obtaining reference data sets

[Link: QIIME]

wget ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz

[Note: extract the archive and copy the 99_otus.fasta and 99_otu_taxonomy.txt files into the working directory]
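
The extract-and-copy step in the note could look roughly like the sketch below. The helper name `copy_gg_refs` is hypothetical, and the paths inside the archive (`gg_13_8_otus/rep_set/` and `gg_13_8_otus/taxonomy/`) are assumptions about its layout, so adjust them if your download differs:

```shell
# copy_gg_refs ARCHIVE [DEST]: extract ARCHIVE into DEST (default: current
# directory) and copy the two 99% OTU reference files to the top of DEST.
# Assumption: the archive unpacks to gg_13_8_otus/rep_set and gg_13_8_otus/taxonomy.
copy_gg_refs() {
  archive=$1
  dest=${2:-.}
  tar -xzf "$archive" -C "$dest" || return 1
  cp "$dest/gg_13_8_otus/rep_set/99_otus.fasta" \
     "$dest/gg_13_8_otus/taxonomy/99_otu_taxonomy.txt" \
     "$dest"/
}
```

Calling `copy_gg_refs gg_13_8_otus.tar.gz` after the `wget` step should leave both files next to the other pipeline outputs, where the import commands in 3.2 expect them.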

#######################################################

3.2 Importing reference data sets to qiime
(a)

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path 99_otus.fasta \
  --output-path 3_2a_ref-99_otus.qza

(b)

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path 99_otu_taxonomy.txt \
  --output-path 3_2b_ref-99_taxonomy.qza

3.3 Extract reference reads
(This step is time-consuming.)
Forward primer = CCTACGGGNGGCWGCAG
Reverse Primer = GACTACHVGGGTATCTAATCC

qiime feature-classifier extract-reads \
  --i-sequences 3_2a_ref-99_otus.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-min-length 300 \
  --p-max-length 500 \
  --o-reads 3_3_ref-seqs.qza

3.4 Train the classifier

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads 3_3_ref-seqs.qza \
  --i-reference-taxonomy 3_2b_ref-99_taxonomy.qza \
  --o-classifier 3_4_classifier.qza

################################################################

3.5 Taxonomic analysis

(a) classification

qiime feature-classifier classify-sklearn \
  --i-classifier 3_4_classifier.qza \
  --i-reads 1_7b_rep-seqs-dn-99.qza \
  --o-classification 3_5a_taxonomy.qza

(b) summary

qiime metadata tabulate \
  --m-input-file 3_5a_taxonomy.qza \
  --o-visualization 3_5b_taxonomy.qzv

(c) view

qiime tools view 3_5b_taxonomy.qzv

3.6 Bar plot

qiime taxa barplot \
  --i-table 1_7a_table-dn-99.qza \
  --i-taxonomy 3_5a_taxonomy.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization 3_6_taxa-bar-plots.qzv

3.7 Final visualization

qiime tools view 3_6_taxa-bar-plots.qzv

Hi @Sujan ,
Thanks for your question. I recommend checking out the tutorials on https://docs.qiime2.org and in the "Community Contributions/Tutorials" section of this forum, where various other database options and usage examples are shown. You can look to find the option that is right for you.

Good luck!


I couldn’t find tutorials for the Greengenes 2022 database. Do we only need to modify the code in sections 3.2 and 3.3, or are changes required in the earlier sections as well?

Upstream should be the same, as the taxonomy database is (mostly) independent of the denoising process.

Unless you want to change the upstream pipeline because


Hi @Sujan,

Have you worked through the Greengenes2 tutorial? Additional information can be found here too, with FTP links to some useful files.
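
To make the scope of the change concrete, here is a minimal sketch of what sections 3.1 to 3.4 might reduce to with Greengenes2, assuming the reference sequences and taxonomy are downloaded as ready-made `.qza` artifacts so that the import step (3.2) is no longer needed. The two file names are assumptions and should be checked against the release file listing; `run` and `train_gg2_classifier` are hypothetical helper names. The primers and length limits are the ones from section 3.3, and commands are only printed unless `DRY_RUN=0` is set:

```shell
# Assumed Greengenes2 file names; verify them against the release listing.
GG2_SEQS="2022.10.backbone.full-length.fna.qza"  # reference sequences (.qza)
GG2_TAX="2022.10.backbone.tax.qza"               # reference taxonomy (.qza)

# run CMD...: print the command when DRY_RUN=1 (the default), else execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Replacement for steps 3.1 to 3.4; 3.2 (qiime tools import) is skipped
# because the inputs are assumed to be .qza artifacts already.
train_gg2_classifier() {
  # 3.3 Extract reference reads with the same primers and length limits
  run qiime feature-classifier extract-reads \
    --i-sequences "$GG2_SEQS" \
    --p-f-primer CCTACGGGNGGCWGCAG \
    --p-r-primer GACTACHVGGGTATCTAATCC \
    --p-min-length 300 \
    --p-max-length 500 \
    --o-reads 3_3_ref-seqs.qza || return 1

  # 3.4 Train the classifier on the extracted reads
  run qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads 3_3_ref-seqs.qza \
    --i-reference-taxonomy "$GG2_TAX" \
    --o-classifier 3_4_classifier.qza
}
```

Calling `train_gg2_classifier` prints the two commands for review; with `DRY_RUN=0` it runs them. Because the output is still named `3_4_classifier.qza`, steps 3.5 onward would stay as they are.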
