Poor taxonomic assignment

michele_quail · January 30, 2019, 4:42pm

Good evening everyone,

I have recently started working with QIIME 2 and so far I have found it pretty easy to use, but I have run into a particular problem. Starting from FASTQ files, I imported them, ran DADA2, and then assigned taxonomy using the Greengenes database. To check how many taxa were recognized, I made a barplot .qzv and opened it in QIIME 2 View, where I noticed two problems: 1) some samples did not have even one taxon; 2) other samples have very few taxa recognized. On the same dataset I ran a QIIME 1 closed-reference analysis and got completely different results, with many species recognized. So I think I made a mistake somewhere in my QIIME 2 analysis, but I don't understand where. Can someone help me understand where I went wrong? I am working with MiSeq V3-V4 data and this is my code:

source activate qiime2-2018.11


## Prepare the raw data
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path Run6/Raw \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path Run6/demux-paired-end.qza
  
qiime demux summarize \
--i-data Run6/demux-paired-end.qza \
--o-visualization Run6/demux.qzv

qiime dada2 denoise-paired \
--i-demultiplexed-seqs Run6/demux-paired-end.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 220 \
--p-trunc-len-r 200 \
--o-representative-sequences Run6/rep-seqs.qza \
--o-table Run6/table.qza  \
--output-dir Dada2Out2    # creates a new directory


#Taxonomy with Greengenes
wget -O "gg-13-8-99-515-806-nb-classifier.qza" "https://data.qiime2.org/2018.2/common/gg-13-8-99-515-806-nb-classifier.qza"

qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads Run5/rep-seqs.qza \
  --o-classification Run5/taxonomy.qza

qiime metadata tabulate \
  --m-input-file  Run5/taxonomy.qza \
  --o-visualization  Run5/taxonomy.qzv

#barplot taxa
qiime taxa barplot \
  --i-table Run5/table.qza \
  --i-taxonomy Run5/taxonomy.qza \
  --m-metadata-file Run5/MappingfileRun5.txt \
  --o-visualization Run5/taxa-bar-plots.qzv

Nicholas_Bokulich · January 30, 2019, 5:40pm

Hi @michele_quail,
Poor taxonomic classification is almost always a simple user error: the wrong classifier is used for the query sequences. In your case I think I know the problem:

By any chance did you try to classify with one of the pre-trained V4 classifiers? This does not cover V3-V4 and so will perform poorly. You will need to use either the full-length 16S rRNA gene pre-trained classifiers or train your own classifier.

If you are sure you used an appropriate classifier (e.g., full-length 16S), then you probably have a lot of non-target DNA (e.g., host DNA) in your sequences. You can use NCBI BLAST of an unclassified sequence to confirm that's the issue.
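
As a rough sketch of how to get at individual sequences for BLAST: `qiime feature-table tabulate-seqs` builds a visualization in which each representative sequence links to an NCBI BLAST search. The `Run5/rep-seqs.qza` path is taken from the commands earlier in the thread; the output name is only an illustration, and the guard is there so the snippet does nothing harmful when QIIME 2 is not activated.

```shell
# rep-seqs.qza comes from the commands earlier in the thread;
# the .qzv name is a hypothetical choice derived from it.
REP_SEQS="Run5/rep-seqs.qza"
SEQS_QZV="${REP_SEQS%.qza}.qzv"

if command -v qiime >/dev/null 2>&1 && [ -f "$REP_SEQS" ]; then
  # Each sequence in the resulting table links to an NCBI BLAST search
  qiime feature-table tabulate-seqs \
    --i-data "$REP_SEQS" \
    --o-visualization "$SEQS_QZV"
else
  echo "Skipping: activate QIIME 2 and check that $REP_SEQS exists"
fi
```

Open the resulting .qzv in QIIME 2 View, look up the feature ID of an unclassified sequence in taxonomy.qzv, and click that sequence to BLAST it.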

Good luck!

michele_quail · January 30, 2019, 5:50pm

Hello @Nicholas_Bokulich,

Thanks a lot for your help!
As you can see, I used "gg-13-8-99-515-806-nb-classifier.qza" because I found it in an online tutorial.
Where can I get a better classifier, like the one you suggested?
Could you please give me more details on how to get it?
Thanks a lot

Nicholas_Bokulich · January 30, 2019, 5:56pm

I see that now — that's the problem!

We host several pre-trained classifiers here. You can download the full-length classifier there.
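
A minimal sketch of swapping in the full-length classifier, assuming the download URL follows the same pattern as the one in the original post (the `2018.11` path and the `gg-13-8-99-nb-classifier.qza` file name are assumptions; check the data resources page for the link that matches your release). Pre-trained classifiers are tied to the scikit-learn version of the QIIME 2 release they were built for, so it is safest to pick the one for the release you have installed. The output file name is hypothetical.

```shell
# Assumed values: release and file name follow the URL pattern used
# earlier in the thread; verify them against the data resources page.
RELEASE="2018.11"
CLASSIFIER="gg-13-8-99-nb-classifier.qza"
URL="https://data.qiime2.org/${RELEASE}/common/${CLASSIFIER}"

if command -v qiime >/dev/null 2>&1; then
  wget -O "$CLASSIFIER" "$URL"
  # Re-run classification with the full-length classifier
  qiime feature-classifier classify-sklearn \
    --i-classifier "$CLASSIFIER" \
    --i-reads Run5/rep-seqs.qza \
    --o-classification Run5/taxonomy-full-length.qza
else
  echo "QIIME 2 not active; would download $URL"
fi
```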

michele_quail · February 1, 2019, 2:56pm

Hi @Nicholas_Bokulich,
I tried the Greengenes 13_8 99% OTUs full-length sequences classifier, but to be honest I get more or less the same result. Is it the right database?
Thank you again for your help

Nicholas_Bokulich · February 1, 2019, 3:08pm

Hi @michele_quail,
The full-length sequences should work for V3-V4... perhaps you should share your barplot.qzv here so I can see what you mean by "more or less the same".

You could also try training your own V3-V4 classifier — it is time consuming but tailored to your specific amplicon so should give the best results.
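
Training a region-specific classifier could look roughly like the sketch below. Several things here are assumptions: the primers are a commonly used V3-V4 pair (341F/805R) and should be replaced with the ones actually used for your library, the Greengenes 13_8 reference files (`99_otus.fasta`, `99_otu_taxonomy.txt`) are assumed to be downloaded locally, and all output names are hypothetical.

```shell
# Assumed primers (341F / 805R); substitute your own V3-V4 primers.
F_PRIMER="CCTACGGGNGGCWGCAG"
R_PRIMER="GACTACHVGGGTATCTAATCC"
# Assumed local copies of the Greengenes 13_8 99% reference files.
REF_FASTA="99_otus.fasta"
REF_TAX="99_otu_taxonomy.txt"

if command -v qiime >/dev/null 2>&1 && [ -f "$REF_FASTA" ] && [ -f "$REF_TAX" ]; then
  # Import reference sequences and their taxonomy as QIIME 2 artifacts
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$REF_FASTA" \
    --output-path ref-seqs.qza
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$REF_TAX" \
    --output-path ref-taxonomy.qza

  # Trim the references to the region amplified by the primers
  qiime feature-classifier extract-reads \
    --i-sequences ref-seqs.qza \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --o-reads ref-seqs-v3v4.qza

  # Train the naive Bayes classifier on the extracted region
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads ref-seqs-v3v4.qza \
    --i-reference-taxonomy ref-taxonomy.qza \
    --o-classifier gg-13-8-99-v3v4-nb-classifier.qza
else
  echo "Skipping: needs QIIME 2 plus $REF_FASTA and $REF_TAX"
fi
```

The resulting classifier can then be passed to `classify-sklearn` via `--i-classifier` in place of the pre-trained one.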

michele_quail · February 1, 2019, 3:30pm

Of course I can share it.

Nicholas_Bokulich · February 1, 2019, 3:45pm

Hi @michele_quail,
Those white spaces are not due to poor classification at all. Those samples evidently do not contain any sequences, and hence have no taxa or relative abundances. Those samples probably had low sequence counts, and all sequences were filtered out at some earlier step. You can use qiime feature-table summarize on your feature table to confirm — but at the end of the day this does not look like a problem with classification.
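
A sketch of that check, using the table and mapping file paths from the original post (the output names are hypothetical). The second command assumes DADA2 wrote its denoising stats to `Dada2Out2/denoising_stats.qza`, which is where `--output-dir Dada2Out2` would normally put them; it is skipped if that file is not there.

```shell
# Paths taken from the commands earlier in the thread.
TABLE="Run5/table.qza"
METADATA="Run5/MappingfileRun5.txt"
# Assumed location of the DADA2 stats artifact.
STATS="Dada2Out2/denoising_stats.qza"

if command -v qiime >/dev/null 2>&1 && [ -f "$TABLE" ]; then
  # Per-sample sequence counts: samples with zero or very few reads
  # are the ones that show up as blank bars
  qiime feature-table summarize \
    --i-table "$TABLE" \
    --m-sample-metadata-file "$METADATA" \
    --o-visualization Run5/table.qzv

  if [ -f "$STATS" ]; then
    # Shows how many reads each sample lost at each DADA2 step
    qiime metadata tabulate \
      --m-input-file "$STATS" \
      --o-visualization Dada2Out2/denoising_stats.qzv
  fi
else
  echo "Skipping: activate QIIME 2 and check that $TABLE exists"
fi
```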

michele_quail · February 1, 2019, 4:09pm

OK thanks a lot @Nicholas_Bokulich for your precious support
