@Jen_S and I were able to successfully run vsearch for taxonomic classification in our pipeline after running the DADA2 denoise-pyro option, since we are using Ion Torrent data. When we examined the results using the taxa barplot, one of the regions looked decent (not great) and one barely classified anything. Since we are using mock samples, we are able to compare the expected composition against our results. See below:
V2 region (green is unassigned)
V4 region (much lower percentage of unassigned, but staph genus-level taxa are hugely overrepresented)
This was concerning, so we worked backwards to see where the problem was. We had no issues importing the files as a QIIME artifact, and cutadapt was then used to remove adapters and separate by V region.
We used DADA2_pyro, and when we looked at the denoising stats, we realized this is where our problem lay. For the V2 region (many unclassified taxa), less than 1% of input passed the filter. See example below (this is for only 2 samples from 1 run, but other runs for V2 looked similar):
Our better performing region still only had about a 20-30% pass rate (see below):
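For anyone who wants to check the same numbers outside the visualization, here is a minimal sketch of how the pass rate can be computed from an exported denoising stats table. It assumes a tab-separated file with `sample-id`, `input`, and `filtered` columns and a `#q2:types` directive row. The sample names and counts in the example are hypothetical placeholders chosen to mirror the rough magnitudes described above, not our real data.

```python
import csv
import io


def pass_rates(tsv_text):
    """Return {sample_id: percent of input reads that passed the filter}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rates = {}
    for row in reader:
        # Skip the metadata directive row (assumed to start with "#").
        if row["sample-id"].startswith("#"):
            continue
        n_input = int(row["input"])
        n_filtered = int(row["filtered"])
        # Guard against samples with zero input reads.
        rates[row["sample-id"]] = 100.0 * n_filtered / n_input if n_input else 0.0
    return rates


# Hypothetical counts for illustration only, not our real data.
example = (
    "sample-id\tinput\tfiltered\n"
    "#q2:types\tnumeric\tnumeric\n"
    "mock_V2\t10000\t80\n"
    "mock_V4\t10000\t2500\n"
)

rates = pass_rates(example)
for sample, pct in rates.items():
    print(f"{sample}: {pct:.1f}% of input passed filter")
```

With a real export, the file contents would be read from disk and passed to `pass_rates` in place of the `example` string.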
Comparing this to the results from the Parkinson’s tutorial, it is clear why we have such a discrepancy in our taxonomic classification.
We also compared the results from qiime dada2 denoise-pyro and qiime dada2 denoise-single, just out of curiosity, and the denoising stats were the same.
Below is our dada2 script:
qiime dada2 denoise-pyro
Do you have any suggestions for changes we can make to increase the number of reads that pass the filter?
For our V2 region, since the percentage of input passing the filter is so low (we would have to improve it by >95% to match the Parkinson’s denoising stats), can we trust the data from this region?