Low diversity... but maybe too low?

Hi all! I am new to bioinformatics. I am trying to create a QIIME 2 pipeline for our runs and comparing it to a QIIME 1 pipeline (created by another student in the lab).

My QIIME 2 pipeline detects just a few species in my samples, while QIIME 1 detects a lot of different species.

I have read this thread and I think it is related: Samples with low feature counts

But I would like to give you an example:

Sample1 in qiime1 at level 2: 73% Firmicutes; 21% Actinobacteria
Sample1 in qiime2 at level 2: 38% Firmicutes; 62% Bacteria (could not be resolved further)

Sample1 in qiime1 at level 7: Streptococcus 49.3%; Granulicatella 13%; Atopobium 5%; Actinomyces 8.5%; Rothia 7.5% (and others at low %)
Sample1 in qiime2 at level 7: 62% bacteria; Streptococcus 24.5%; Bacillus 13.%

So, for example, I am not getting any Actinobacteria in QIIME 2.

I used SILVA (SILVA_128_QIIME_release/rep_set/rep_set_16S_only/99/99_otus_16S.fasta and consensus_taxonomy_7_levels.txt) for training:

qiime tools import --type 'FeatureData[Sequence]' --input-path 99_otus_16S.fasta --output-path 99_otus_16S.qza

qiime tools import --type 'FeatureData[Taxonomy]' --source-format HeaderlessTSVTaxonomyFormat --input-path consensus_taxonomy_7_levels.txt --output-path ref-taxonomy.qza

qiime feature-classifier extract-reads --i-sequences 99_otus_16S.qza --p-f-primer CCTACGGGRSGCAGCAG --p-r-primer GGACTACHVGGGTWTCTAAT --p-trunc-len 300 --o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier classifier.qza

This may be a very newbie question, but I would really appreciate some guidance here.

Maybe I misunderstood how to use my V3 primers in extract-reads?

Thank you!

Hi! I am just starting to work in microbiology and am very new to this.

I am comparing a QIIME 1 pipeline with my new QIIME 2 pipeline.

When I analyze sample1 with QIIME 1, I can classify several different organisms, but in QIIME 2, 50% of my sequences are classified at the domain level only.

I suspect I am training the classifier incorrectly.

qiime tools import --type 'FeatureData[Sequence]' --input-path 99_otus_16S.fasta --output-path 99_otus_16S_silva.qza

qiime tools import --type 'FeatureData[Taxonomy]' --source-format HeaderlessTSVTaxonomyFormat --input-path consensus_taxonomy_7_levels.txt --output-path ref-taxonomy_silva.qza

# F: 5'-CCTACGGGRSGCAGCAG-3'
# R: 5'-GGACTACHVGGGTWTCTAAT-3'
# rev-comp R: ATTAGAWACCCBDGTAGTCC
qiime feature-classifier extract-reads --i-sequences 99_otus_16S_silva.qza --p-f-primer CCTACGGGRSGCAGCAG --p-r-primer ATTAGAWACCCBDGTAGTCC --p-trunc-len 300 --o-reads ref-seqs_silva.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs_silva.qza --i-reference-taxonomy ref-taxonomy_silva.qza --o-classifier classifier_silva.qza
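
The reverse complement listed in the comments above can be recomputed with standard shell tools as a sanity check. This is only a minimal sketch: `revcomp` is a hypothetical helper name (not part of QIIME 2), and it assumes `rev` and `tr` are installed. It handles the IUPAC degenerate codes used in these primers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a QIIME 2 command): reverse-complement a primer,
# including IUPAC degenerate bases. S, W and N are their own complements,
# so they are left untouched by the second tr call.
revcomp() {
  printf '%s\n' "$1" \
    | tr '[:lower:]' '[:upper:]' \
    | rev \
    | tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

# Reverse primer from the post above
revcomp GGACTACHVGGGTWTCTAAT
```

For the reverse primer above this prints the same sequence as the rev-comp comment, so the two commands in this thread differ only in which orientation is passed to --p-r-primer.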

I would really appreciate some guidance solving this issue.

Thanks!

Hi @borgesrodrigo! I merged your other topic with this one since they’re similar questions. There are several differences in the processing steps taken by QIIME 1 vs QIIME 2 to get from raw sequences to taxonomic classifications, so there are a few variables you’ll need to control for when comparing results:

  • Quality filtering: QIIME 1 uses PHRED-score based filtering and trimming, while QIIME 2 uses DADA2 to denoise the data.
  • Clustering/feature selection: QIIME 1 performs OTU picking with uclust by default (closed-reference, open-reference, or de novo), while QIIME 2 uses DADA2 or Deblur to find sequence variants.
  • Taxonomy assignment: QIIME 1 uses a uclust consensus assigner by default to perform taxonomic classification, while QIIME 2 has a scikit-learn naive Bayes classifier, along with consensus vsearch and blast classifiers.

If you want to just compare differences between the taxonomy classifiers, you could generate your representative sequences using a QIIME 1 OTU picking script, perform taxonomy assignment on them with assign_taxonomy.py, and then import the representative sequences into QIIME 2 and perform taxonomy assignment on them. You could then compare the results between the two classifiers.

You’ll need to make sure you’re performing taxonomy assignment using the same representative sequences and reference database files. Since you’re using the naive Bayes classifier in QIIME 2, I recommend performing taxonomy assignment with the RDP classifier in QIIME 1, since those methods are similar. QIIME 1 by default doesn’t use the RDP classifier so you’ll need to configure that in your analyses.
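
Once both classifiers have been run on the same representative sequences, the two assignment tables can be compared feature by feature. The sketch below is one possible way to do that with awk, under several assumptions: `compare_taxonomy` is a hypothetical helper name, both inputs are tab-separated with the feature ID in column 1 and a semicolon-delimited taxonomy string in column 2, and header lines start with `#` or `Feature ID`. Adjust these to match your actual files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count features present in both tables and how many
# of them receive the same taxonomy down to a given level.
# usage: compare_taxonomy QIIME1_TABLE.tsv QIIME2_TABLE.tsv LEVEL
compare_taxonomy() {
  awk -F '\t' -v level="$3" '
    # Keep only the first "level" ranks of a taxonomy string.
    function prefix(tax,   parts, n, i, out) {
      gsub(/;[ ]+/, ";", tax)            # tolerate "; " as well as ";"
      n = split(tax, parts, ";")
      if (n > level) n = level
      out = parts[1]
      for (i = 2; i <= n; i++) out = out ";" parts[i]
      return out
    }
    /^#/ || $1 == "Feature ID" { next }  # assumed header formats
    NR == FNR { first[$1] = prefix($2); next }   # first file: remember assignments
    ($1 in first) { shared++; if (first[$1] == prefix($2)) same++ }
    END { printf "shared=%d agree=%d\n", shared, same }
  ' "$1" "$2"
}
```

Running it at a shallow level (for example 2) and again at a deep level (for example 7) gives a rough idea of the depth at which the two classifiers start to disagree.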

You could export the sequence data from the output of qiime feature-classifier extract-reads (i.e. ref-seqs_silva.qza) and see how many sequences were successfully extracted. You can check that their length makes sense and compare how many extracted reads you have to the number of full-length reference sequences.
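
As a rough illustration of that check, the helper below summarizes a FASTA file with awk. It is a sketch under assumptions: `fasta_stats` is a hypothetical name, and the reads are assumed to have been exported to FASTA first with `qiime tools export` (its options differ between QIIME 2 releases, so see `qiime tools export --help` for your version).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the number of sequences in a FASTA file and
# their minimum, mean and maximum length.
# usage: fasta_stats FILE.fasta
fasta_stats() {
  awk '
    # Fold the length of the record just finished into the running totals.
    function record() {
      n++; total += len
      if (min == "" || len < min) min = len
      if (len > max) max = len
    }
    /^>/ { if (len > 0) record(); len = 0; next }   # header line starts a new record
    { gsub(/[ \t\r]/, ""); len += length($0) }      # sequences may span several lines
    END {
      if (len > 0) record()
      if (n == 0) { print "sequences=0"; exit }
      printf "sequences=%d min=%d mean=%.1f max=%d\n", n, min, total / n, max
    }
  ' "$1"
}
```

Running it on both the exported extracted reads and the original 99_otus_16S.fasta lets you compare the two counts directly. A very small number of extracted reads, or lengths that do not match the expected amplicon size, would point to a problem in the extract-reads step.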

You might also try performing taxonomy assignment using the full-length reference sequences (i.e. skipping qiime feature-classifier extract-reads). That will be the fairest comparison to QIIME 1’s invocation of the RDP classifier.

Let us know what you find, and good luck!

Thank you @jairideout! I thought that something went wrong with my initial post, so I wrote the second one. Sorry for the trouble.

Your answers are everything I need (I believe) for now. Thank you so much for your time. You have also helped me narrow down what I should read (I have a lot of papers to read about everything related to microbiology).

I will take some time to do all of this, but I will come back here and report my progress.

Cheers!
