QIIME 2 version: qiime2-2022.11
Installation: conda environment
Hi everyone,
Starting from sequences from an Illumina MiSeq 2x300 paired-end run, I analyzed eDNA samples in QIIME 2 using a 16S primer set for invertebrates. I also ran the same analysis on the same dataset with a different tool, OBITools3.
I noticed that:
- QIIME 2 gives more "abundant" results than OBITools3, i.e. I obtain a larger number of identified OTUs;
- in terms of number of sequences, OBITools3 retains only one-fifth as many as QIIME 2.
However, a problem emerged: one key OTU in my dataset (Chamelea gallina, which is present in the OBITools3 results) never appears in the QIIME 2 output. This is surprising, because I am certain that sequences of this species are in my dataset, and they are fairly abundant.
To verify this, after denoising with DADA2 I extracted the unassigned representative sequences:
qiime taxa filter-seqs \
  --i-sequences rep-seqs.qza \
  --i-taxonomy Vsearch-taxonomy.qza \
  --p-include "Unassigned" \
  --o-filtered-sequences unassigned_seqs.qza
I then ran blastn (2.13.0+) with a Chamelea sequence from the OBITools3 pipeline as the query and the unassigned sequences as the reference database. It returned several hits at 100% identity with no gaps. I also verified that this OTU's sequence is present in the reference database used for taxonomic classification.
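As a complementary check to BLAST, one could test whether the OBITools3 Chamelea sequence occurs verbatim, on either strand, among the unassigned representative sequences (or whether a representative sequence is wholly contained in it). The sketch below is a hypothetical helper, not part of QIIME 2 or BLAST; it assumes the unassigned sequences have first been exported to FASTA, and the demo sequences are made up.

```python
"""Exact-match check of a query sequence against FASTA records.

Hypothetical helper, not part of QIIME 2: it assumes the unassigned
representative sequences were exported to FASTA beforehand.
"""
from typing import Dict, Iterable, List, Optional

# IUPAC nucleotide codes and their complements.
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVN", "TGCAYRMKVHDBN")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: uppercase sequence}."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks).upper()
            current = line[1:].split()[0]  # keep the ID, drop any description
            chunks = []
        else:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks).upper()
    return records


def reverse_complement(seq: str) -> str:
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def exact_hits(query: str, records: Dict[str, str]) -> List[str]:
    """IDs whose sequence contains the query, or is contained in it, on either strand."""
    q = query.upper()
    rc = reverse_complement(q)
    hits = []
    for rid, seq in records.items():
        if q in seq or rc in seq or seq in q or seq in rc:
            hits.append(rid)
    return hits


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    demo = read_fasta([">asv1", "ACGTACGGTT", ">asv2", "TTTTTT"])
    print(exact_hits("TACGG", demo))  # prints ['asv1']
```

In practice the input would be the exported file, e.g. after `qiime tools export --input-path unassigned_seqs.qza --output-path unassigned_seqs`, which writes `unassigned_seqs/dna-sequences.fasta`. Unlike BLAST, this only reports end-to-end containment, so a hit means one sequence lies entirely within the other.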
Starting from the demultiplexed sequences, the commands I ran were:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs seqs.qza \
  --p-trim-left-f 33 \
  --p-trim-left-r 33 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150 \
  --p-max-ee-f 2.0 \
  --p-max-ee-r 2.0 \
  --p-chimera-method 'consensus' \
  --p-n-reads-learn 1000000 \
  --p-n-threads 7 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs-150f-150r-miseq.qza \
  --o-denoising-stats denoising-stats-miseq.qza \
  --verbose
I set --p-trunc-len-f and --p-trunc-len-r to 150 so that this dataset can be compared with another Illumina 2x150 paired-end dataset. In any case, the sequences of interest fall within a ~300 bp range. I ran this step on a subset of the data with a range of truncation values, and 150 did not appear to change the results qualitatively.
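Whether 150-nt truncation is harmless depends on the amplicon length, because the trimmed forward and reverse reads must still overlap for DADA2 to merge them. The sketch below does that arithmetic under stated assumptions: each read starts at the first base of the amplicon, the truncation position is counted on the raw read (so a read keeps trunc_len - trim_left bases), and the required overlap is q2-dada2's default `--p-min-overlap` of 12 nt. The amplicon lengths in the demo are hypothetical, since the post does not give the exact length.

```python
"""Rough read-overlap arithmetic for DADA2 paired-end merging.

Assumptions (not from the post): reads start at the first amplicon base,
trunc_len is counted on the raw read, and 12 nt is the minimum overlap
(q2-dada2's default --p-min-overlap).
"""


def retained_length(trunc_len: int, trim_left: int) -> int:
    """Bases left in a read after 5' trimming and 3' truncation."""
    return trunc_len - trim_left


def expected_overlap(amplicon_len: int, trunc_f: int, trim_f: int,
                     trunc_r: int, trim_r: int) -> int:
    """Overlap between the trimmed forward and reverse reads (negative means a gap)."""
    # The merged read spans the amplicon minus the bases trimmed from each end.
    merged_len = amplicon_len - trim_f - trim_r
    return retained_length(trunc_f, trim_f) + retained_length(trunc_r, trim_r) - merged_len


def can_merge(amplicon_len: int, trunc_f: int = 150, trim_f: int = 33,
              trunc_r: int = 150, trim_r: int = 33, min_overlap: int = 12) -> bool:
    """True if the expected overlap reaches the minimum needed for merging."""
    return expected_overlap(amplicon_len, trunc_f, trim_f, trunc_r, trim_r) >= min_overlap


if __name__ == "__main__":
    # Hypothetical amplicon lengths, for illustration only.
    for length in (250, 288, 290, 320):
        print(length, expected_overlap(length, 150, 33, 150, 33), can_merge(length))
```

With these settings the two trim-left values cancel out, so the expected overlap is simply 300 minus the amplicon length, and under these assumptions amplicons longer than 288 bp would not reach 12 nt of overlap.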
qiime feature-classifier classify-consensus-vsearch \
  --i-query rep-seqs-150f-150r-miseq.qza \
  --i-reference-reads ncbi-16s-derep-extracted-14-03-2023.qza \
  --i-reference-taxonomy ncbi-16s-taxa-derep-14-03-2023.qza \
  --p-perc-identity 0.97 \
  --p-threads 7 \
  --o-classification taxonomy-vsearch-150f-150r-miseq.qza \
  --o-search-results vsearch_tophits-150f-150r-miseq.qza \
  --verbose
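To make explicit what "Unassigned" means for this command: as far as I understand q2-feature-classifier, classify-consensus-vsearch keeps up to `--p-maxaccepts` hits (default 10) that pass both `--p-perc-identity` and `--p-query-cov` (default 0.8), then walks down the taxonomic ranks, keeping a rank only while the most common label among the hits reaches `--p-min-consensus` (default 0.51). A query with no passing hit gets the label "Unassigned". The simplified sketch below mimics that logic; it is not the plugin's code, and the hit tuples are hypothetical.

```python
"""Simplified sketch of consensus taxonomy assignment from search hits.

Mimics, but is not, q2-feature-classifier's classify-consensus-vsearch.
Hits are hypothetical (identity, query coverage, taxonomy) tuples.
"""
from collections import Counter
from typing import List, Sequence, Tuple

Hit = Tuple[float, float, str]  # (identity, query coverage, "rank1;rank2;...")


def consensus_taxonomy(hits: Sequence[Hit], perc_identity: float = 0.97,
                       query_cov: float = 0.8, maxaccepts: int = 10,
                       min_consensus: float = 0.51,
                       unassigned: str = "Unassigned") -> str:
    # Keep hits passing both thresholds; vsearch stops after maxaccepts accepted hits.
    accepted: List[str] = [tax for ident, cov, tax in hits
                           if ident >= perc_identity and cov >= query_cov][:maxaccepts]
    if not accepted:
        return unassigned  # no hit passed the identity and coverage thresholds
    ranks = [tax.split(";") for tax in accepted]
    consensus: List[str] = []
    for level in zip(*ranks):  # walk from the highest rank downwards
        label, count = Counter(level).most_common(1)[0]
        if count / len(ranks) < min_consensus:
            break  # no majority at this rank, so stop here
        consensus.append(label)
    return ";".join(consensus) if consensus else unassigned
```

In this model a rep-seq ends up "Unassigned" only when none of its hits reaches both the identity and the coverage threshold; a lack of agreement between hits instead truncates the taxonomy at a higher rank.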
I cannot figure out why classify-consensus-vsearch fails to assign this OTU.
I also wondered whether one of the most abundant OTUs in the dataset could somehow be confused with Chamelea, leading to a misidentification, but sequences of the two species share only 60-70% identity. Is there something obvious that I'm missing?
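When comparing identity figures such as the 60-70% above with the 0.97 threshold, note that vsearch's default identity definition (`--iddef 2`, per the VSEARCH manual) is the number of matching columns divided by the alignment length minus terminal gaps, which can differ from the percent identity BLAST reports for a local alignment. The sketch below is a hypothetical helper, not part of VSEARCH or QIIME 2: it applies that definition to two sequences that are already aligned (with "-" as the gap character), does not compute the alignment itself, and approximates terminal gaps as any gap-containing columns at either end.

```python
"""Percent identity of a pre-computed pairwise alignment, in the spirit of
vsearch --iddef 2: matching columns / (alignment length - terminal gaps).

Hypothetical helper; terminal gaps are approximated as gap-containing
columns at either end of the alignment.
"""


def iddef2_identity(aligned_a: str, aligned_b: str) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    # Skip leading columns where either sequence has a gap (terminal gaps).
    start = 0
    while start < len(columns) and "-" in columns[start]:
        start += 1
    # Skip trailing columns where either sequence has a gap.
    end = len(columns)
    while end > start and "-" in columns[end - 1]:
        end -= 1
    core = columns[start:end]
    if not core:
        return 0.0
    matches = sum(1 for a, b in core if a == b and a != "-")
    return matches / len(core)
```

For example, `iddef2_identity("--ACGTACGT", "GGACGAACGT")` ignores the two leading overhang columns and returns 7/8.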
Sorry for the long message, but I wanted to include every detail that might be useful.
Thank you for any help.