Hi everyone,
I usually use QIIME 2 (version 2018.2) for bacterial taxonomic profiling of my samples based on 16S rRNA sequences. I use the SILVA database and have never had problems.
Now I have the 23S rRNA sequences of my samples and I am trying to do microalgae taxonomic profiling. I tried to import the microgreen database, but everything comes out unassigned. To understand what's wrong, I ran DADA2 again with more relaxed truncation and max-errors settings, but still got everything unassigned except for some cyanobacteria in one of the samples. Am I using an appropriate database? I also tried to import SILVA 138 LSU, but it didn't work. I got the error: Plugin error from feature-classifier: not enough values to unpack (expected 2, got 0).
I thank everyone for the help.
Regards,
Sara
Hi @SaraRibeirinho_Soare,
Can you please provide the full commands you used, as well as links to the database you're trying to import?
As for the SILVA LSU... I'd suggest installing the latest version of QIIME 2 (2024.10). There, you can use RESCRIPt (now part of the QIIME 2 install), and follow the SILVA RESCRIPt tutorial to import the latest SILVA (138.2) LSU database. You can choose which database is downloaded by setting the target, like this:
qiime rescript get-silva-data \
--p-target 'LSURef_NR99' \
--output-dir SILVA_LSU_138.2
-Cheers!
Thank you so much for your help.
I did:
qiime tools import --type FeatureData[Sequence] --input-path /home/e009-hpz6g4/Desktop/ugreen2/microgreen_id.fasta --output-path /home/e009-hpz6g4/Desktop/ugreen2/23s_refseq.qza
qiime tools import --type FeatureData[Taxonomy] --input-path /home/e009-hpz6g4/Desktop/ugreen2/microgreen_algaebase.tax --source-format HeaderlessTSVTaxonomyFormat --output-path /home/e009-hpz6g4/Desktop/ugreen2/23S_taxonomy.qza
qiime feature-classifier extract-reads --i-sequences /home/e009-hpz6g4/Desktop/ugreen2/23s_refseq.qza --p-f-primer GGACAGAAAGACCCTATGAA --p-r-primer TCAGCCTGTTATCCCTAGAG --o-reads /home/e009-hpz6g4/Desktop/ugreen2/ref-seqs23s.qza
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads /home/e009-hpz6g4/Desktop/ugreen2/ref-seqs23s.qza --i-reference-taxonomy /home/e009-hpz6g4/Desktop/ugreen2/23S_taxonomy.qza --o-classifier /home/e009-hpz6g4/Desktop/ugreen2/classifier.qza
qiime feature-classifier classify-sklearn --i-classifier /home/e009-hpz6g4/Desktop/ugreen2/classifier.qza --i-reads /home/e009-hpz6g4/Desktop/EVA/23S/cutadapt_adpters/seqs-dada2.qza --o-classification /home/e009-hpz6g4/Desktop/EVA/23S/subsamples_taxonomy.qza
qiime metadata tabulate --m-input-file /home/e009-hpz6g4/Desktop/EVA/23S/subsamples_taxonomy.qza --o-visualization /home/e009-hpz6g4/Desktop/EVA/23S/subsamples_taxonomy.qzv
qiime taxa barplot --i-table /home/e009-hpz6g4/Desktop/EVA/23S/cutadapt_adpters/table-dada2.qza --i-taxonomy /home/e009-hpz6g4/Desktop/EVA/23S/subsamples_taxonomy.qza --m-metadata-file /home/e009-hpz6g4/Desktop/EVA/metadataEVA.tsv --output-dir /home/e009-hpz6g4/Desktop/EVA/23S/subsamples_taxon_outputs
the database is this: http://microgreen-23sdatabase.ea.inra.fr/
The database was successfully imported. I am now trying to run DADA2 again without chimera removal, because ChatGPT says DADA2's chimera detection is optimized for bacterial 16S. But I don't think chimera removal is the reason everything is unassigned, since I still have nearly 20,000 reads per sample.
Thank you so much and best regards
Hi @SaraRibeirinho_Soare,
I was able to replicate what you've done to import the microgreens data and build a 23S classifier. So, I think you did everything correctly.
On DADA2's chimera detection being optimized for bacterial 16S: not necessarily. DADA2, and other tools, perform de novo chimera detection, so you can use any marker gene you'd like. To better optimize chimera detection and removal, check out this post.
The only thing I can think of is that your reads might be in a mixed or reverse orientation. That is, a classifier trained with qiime feature-classifier fit-classifier-naive-bayes requires your reads to be in the same orientation as the reference sequences it was trained on. You could run:
qiime rescript orient-seqs \
--i-sequences your-denoised-sequence-data.qza \
--o-oriented-seqs oriented-seqs.qza \
--o-unmatched-seqs unoriented-seqs.qza
Then run the classifier on the oriented-seqs.qza output. Or... you can do the inverse: reverse-complement the microgreens amplicon region reference sequences instead, then train the classifier, then classify your reads.
qiime rescript orient-seqs \
--i-sequences microgreen_id_seqs_p23SrV_f1r1.qza \
--o-oriented-seqs mg-oriented-seqs.qza \
--o-unmatched-seqs mg-unoriented-seqs.qza
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads mg-oriented-seqs.qza \
--i-reference-taxonomy microgreen_algaebase.qza \
--o-classifier microgreen_id_seqs_p23SrV_f1r1_oriented_classifier.qza
qiime feature-classifier classify-sklearn \
--i-classifier microgreen_id_seqs_p23SrV_f1r1_oriented_classifier.qza \
...
Another quick test could be to make use of qiime feature-classifier classify-consensus-vsearch ... as this does not care about read orientation. Although you might obtain a good taxonomy, the reads will likely still be in mixed orientation and therefore incorrect for phylogeny-based diversity analyses.
Thank you so much for your help. Unfortunately, there is no qiime rescript command in QIIME 2 2018.2. Do you know of any alternative for this QIIME version?
Hi @SaraRibeirinho_Soare ,
The 2018.2 release is now 7 years old, so it is missing many more recent updates and plugins. I suggest that you update to the latest version, which gives you access to RESCRIPt and other newer plugins. The solution suggested by @SoilRotifer will indeed not work unless you can manage to install RESCRIPt in such an old version of QIIME 2 (it will most likely not be compatible with such an outdated version).
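If upgrading is not possible right away, the orientation idea above can still be tested roughly without RESCRIPt, by reverse-complementing the exported representative sequences with standard shell tools and classifying those instead. This is only a minimal sketch under stated assumptions: revcomp_fasta is a hypothetical helper name (not a QIIME 2 command), it flips every sequence (it does not detect orientation), and it complements only A/C/G/T, leaving N and other ambiguity codes unchanged.

```shell
# Hypothetical helper (not part of QIIME 2): reverse-complement every
# record of a FASTA stream read from stdin and write it to stdout.
# Wrapped sequence lines are joined into one line per record.
revcomp_fasta() {
  awk '
    BEGIN {
      # Only plain bases are complemented; anything else is kept as-is.
      comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
      comp["a"] = "t"; comp["c"] = "g"; comp["g"] = "c"; comp["t"] = "a"
    }
    function revcomp(s,    i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : c)
      }
      return out
    }
    function flush_record() {
      if (have_record) { print header; print revcomp(seq) }
    }
    /^>/ { flush_record(); header = $0; seq = ""; have_record = 1; next }
    { seq = seq $0 }
    END { flush_record() }
  '
}
```

To use it, export the DADA2 representative sequences with qiime tools export, run something like revcomp_fasta < dna-sequences.fasta > revcomp-seqs.fasta, import the result again as FeatureData[Sequence], and classify it with the existing classifier. The exact export and import options differ between QIIME 2 releases, so check qiime tools export --help in your version. If the reverse-complemented reads get assigned, orientation was the problem.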