This is more a troubleshooting question: I'm trying to work out where my pipeline goes wrong, and I hope the answers will help me in future projects where this will inevitably happen again.
I'm working with raw data generated in 2021 by a commercial metabarcoding sequencing and analysis company. Sequencing was done on an Illumina MiSeq with 2 × 250 bp paired-end (PE) reads. The company ran its own analyses on this data, and I am trying to replicate their results.
They used QIIME v1.9.1 to quality-filter the reads, remove primers, and merge the read pairs. They then used UPARSE to cluster OTUs at 97% identity and remove chimeras.
I used QIIME 2 (v2022.2) to import the PE data, then DADA2 to denoise, demultiplex, trim adapters/primers, remove chimeras, and identify features. Following the Moving Pictures tutorial, I ran qiime feature-table summarize on the DADA2 table.qza to inspect the features and decide on a rarefaction depth.
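A common reason DADA2 returns almost no features from paired-end data is that, after truncation, the forward and reverse reads no longer overlap enough to merge, so nearly every pair is dropped at the merging step. With 2 × 250 bp reads, the largest possible overlap for an amplicon of length L is 500 − L, so long amplicons leave little margin once low-quality tails are truncated. The minimal Python sketch below does that arithmetic. The amplicon lengths are hypothetical (the marker is not named here), the helper names are made up for illustration, and the 12-base threshold is DADA2's default minimum overlap for merging.

```python
def merge_overlap(amplicon_len: int, trunc_len_f: int, trunc_len_r: int) -> int:
    """Bases shared by the truncated forward and reverse reads.

    amplicon_len is the full length of the sequenced region, including
    primers if they are still on the reads. Trimming from the 5' end
    (trim-left) removes bases at the outer ends of the amplicon, so in
    practice it does not change the overlap.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(amplicon_len: int, trunc_len_f: int, trunc_len_r: int,
              min_overlap: int = 12) -> bool:
    """True if the read pairs overlap by at least min_overlap bases."""
    return merge_overlap(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


if __name__ == "__main__":
    # Hypothetical amplicon lengths; substitute the real length of your marker.
    for name, length in [("~290 bp amplicon", 290), ("~460 bp amplicon", 460)]:
        for f, r in [(240, 200), (250, 250)]:
            print(f"{name}, trunc {f}/{r}: overlap "
                  f"{merge_overlap(length, f, r)} bp, "
                  f"merges: {can_merge(length, f, r)}")
```

For a ~460 bp amplicon, truncating at 240/200 leaves a negative overlap, which would make almost every sample come out empty, whereas the same settings are harmless for a ~290 bp amplicon.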
The sequencing company detected OTUs in most samples, 1698 unique OTUs in total. My DADA2 table (pe-table-dada.qzv) contains only 26 features, and most samples have no features at all.
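DADA2 in QIIME 2 also writes a denoising-stats artifact (--o-denoising-stats) that records how many reads each sample keeps after filtering, denoising, merging, and chimera removal, which shows directly where reads disappear. After exporting it with qiime tools export (which should yield a stats.tsv file), a short Python sketch like the one below can report the step with the largest proportional loss per sample. The column names are an assumption based on recent q2-dada2 paired-end output, and the script and function names are hypothetical.

```python
from __future__ import annotations

import csv
import io
import sys

# Read-count columns in q2-dada2's paired-end denoising stats, in pipeline
# order. These names are an assumption; check the header of your stats.tsv.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def read_stats(text: str) -> list[dict[str, str]]:
    """Parse stats.tsv text, skipping the '#q2:types' directive row."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [r for r in rows if not r["sample-id"].startswith("#")]


def worst_step(row: dict[str, str]) -> tuple[str, float]:
    """Return the step that discards the largest fraction of its incoming reads."""
    worst, worst_loss = "none", 0.0
    for prev, step in zip(STEPS, STEPS[1:]):
        before, after = float(row[prev]), float(row[step])
        loss = 1 - after / before if before else 0.0
        if loss > worst_loss:
            worst, worst_loss = step, loss
    return worst, worst_loss


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python worst_step.py stats.tsv
    with open(sys.argv[1], newline="") as fh:
        for row in read_stats(fh.read()):
            step, loss = worst_step(row)
            print(f"{row['sample-id']}: largest loss at '{step}' ({loss:.0%})")
```

If most samples lose nearly everything at "filtered", the quality/truncation settings are the likely culprit; a collapse at "merged" points to insufficient read overlap; a collapse at "non-chimeric" often means primers were left on the reads.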
I used the default DADA2 filtering parameters, and the methods above are as detailed as anything I have from the sequencing company. Did I over-filter my data, or, as you mention in the Moving Pictures tutorial, might this reflect a difference between QIIME 1 and QIIME 2, meaning my dataset is more trustworthy?
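On the over-filtering question, note that DADA2 discards reads rather than shortening them: each read is truncated at the first base with quality at or below trunc-q, and if it is then shorter than the chosen trunc-len it is dropped entirely; surviving reads with more expected errors than max-ee are dropped as well. So even with default trunc-q (2) and max-ee (2.0), a trunc-len set beyond the point where quality collapses can remove most reads. The Python sketch below is a simplified emulation of that order of operations, not DADA2's actual code; the function names are hypothetical, and the real filter applies further checks (for example, discarding reads containing N by default).

```python
from __future__ import annotations


def expected_errors(quals: list[int]) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over its bases."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(quals: list[int], trunc_len: int,
                  trunc_q: int = 2, max_ee: float = 2.0) -> bool:
    """Simplified emulation of DADA2-style per-read filtering.

    1. Truncate at the first base with quality <= trunc_q.
    2. If trunc_len > 0, drop reads now shorter than trunc_len;
       otherwise cut them to exactly trunc_len (0 disables this step).
    3. Drop reads whose expected errors exceed max_ee.
    """
    for i, q in enumerate(quals):
        if q <= trunc_q:
            quals = quals[:i]
            break
    if trunc_len:
        if len(quals) < trunc_len:
            return False
        quals = quals[:trunc_len]
    return expected_errors(quals) <= max_ee


if __name__ == "__main__":
    # Illustrative quality profiles, not real data.
    uniform_q20 = [20] * 250
    print(passes_filter(uniform_q20, 240))  # 240 bases at 1% error each
    print(passes_filter(uniform_q20, 150))  # shorter truncation keeps the read
```

A read of uniform Q20 fails at trunc-len 240 (about 2.4 expected errors) but passes at 150, which is why choosing truncation lengths from the demux quality plots matters as much as the "default" settings.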
It seems that running qiime vsearch cluster-features-de-novo on my table wouldn't produce more OTUs than the features DADA2 identified. But could the difference simply be that I'm comparing features (ASVs) in my dataset to their OTUs?
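The intuition here holds: de novo clustering can only merge features, never split them. The toy greedy centroid clustering below makes this explicit. It is a simplification of what vsearch does, using ungapped identity on equal-length sequences instead of vsearch's alignments, and the function names are hypothetical. Each input sequence ends up in exactly one cluster and every centroid is itself an input sequence, so clustering 26 features at 97% can yield at most 26 OTUs.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (ungapped; assumes equal, non-zero length)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Greedy centroid clustering.

    Sequences are processed in the given order (vsearch sorts by abundance
    first). Each one joins the first existing centroid within `threshold`
    identity, or else becomes a new centroid.
    """
    clusters: dict[str, list[str]] = {}
    for s in seqs:
        for centroid in clusters:
            if identity(s, centroid) >= threshold:
                clusters[centroid].append(s)
                break
        else:
            clusters[s] = [s]
    return clusters
```

The gap between 26 features and 1698 OTUs therefore has to arise before clustering, in how many reads survive filtering, merging, and chimera removal, rather than from the ASV-versus-OTU distinction itself.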
What else might produce such stark differences in results?
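Another factor that can produce stark differences is whether primers are still on the delivered reads. The DADA2 documentation notes that primers with ambiguous bases can interfere with chimera removal, so untrimmed degenerate primers may cause many true sequences to be flagged as chimeric; conversely, trimming primers that were already removed discards real sequence. The Python sketch below counts how many reads begin with a primer, expanding IUPAC degenerate codes. The primer shown (515F, GTGYCAGCMGCCGCGGTAA) is only an example, since the marker is not stated here, and the helper names are hypothetical. Running it on both R1 and R2 with both primers also shows whether reads are in mixed orientation.

```python
from __future__ import annotations

import gzip
import re
import sys
from collections.abc import Iterator

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str) -> re.Pattern[str]:
    """Compile a primer with degenerate bases into a regex."""
    return re.compile("".join(IUPAC[b] for b in primer.upper()))


def fraction_with_primer(reads: list[str], primer: str, max_offset: int = 0) -> float:
    """Fraction of reads containing the primer starting within max_offset bases of the 5' end."""
    pattern = primer_regex(primer)
    window = len(primer) + max_offset
    hits = sum(1 for r in reads if pattern.search(r[:window]))
    return hits / len(reads) if reads else 0.0


def fastq_sequences(path: str, limit: int = 10_000) -> Iterator[str]:
    """Yield up to `limit` sequence lines from a (optionally gzipped) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i // 4 >= limit:
                break
            if i % 4 == 1:
                yield line.strip()


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical script name): python primer_check.py R1.fastq.gz GTGYCAGCMGCCGCGGTAA
    reads = list(fastq_sequences(sys.argv[1]))
    print(f"{fraction_with_primer(reads, sys.argv[2], max_offset=5):.1%} of reads start with the primer")
```

A high fraction means the primers still need removing before denoising; a fraction near zero suggests the company already trimmed them.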
Thanks for your help.