We have recently been learning to use QIIME 2 to replace our QIIME 1 method. My supervisor ran his QIIME 1 OTU clustering method on the same data set that I am processing with QIIME 2, so that we can compare the two methods. For the most part, the methods produce comparable results. Our samples are low-biomass environmental samples taken from wells, with many singletons and doubletons.
My reverse reads showed very poor quality for this data set, so I opted to use only forward reads.
After importing demultiplexed sequences, I used cutadapt to trim primers, resulting in the following reads per sample:
I passed the forward reads through DADA2 using the following command:
qiime dada2 denoise-single \
--i-demultiplexed-seqs reads_qza/reads_trimmed.qza \
--p-trunc-len 285 \
--o-representative-sequences dada2_single_output/dada2_rep_seqs.qza \
--o-table dada2_single_output/dada2_table.qza \
--o-denoising-stats dada2_single_output/dada2_stats.qza
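One check that may be relevant to this command: as far as I understand the DADA2 plugin, any read shorter than the --p-trunc-len value is discarded outright, so reads that dropped below 285 nt after primer trimming would never reach denoising. Below is a minimal sketch for counting how many reads in a FASTQ file meet a given length. The helper name and the example path are hypothetical, and it assumes standard 4-line FASTQ records taken from an export of the trimmed reads.

```shell
# Hypothetical helper (not part of QIIME 2): count the reads in a FASTQ
# stream on stdin that are at least MIN_LEN bases long.
# Assumes 4-line FASTQ records, i.e. sequence lines are not wrapped.
count_reads_at_least() {
    min_len="$1"
    awk -v min="$min_len" '
        NR % 4 == 2 {                     # 2nd line of each record is the sequence
            total++
            if (length($0) >= min) kept++
        }
        END { printf "%d of %d reads are >= %d nt\n", kept, total, min }
    '
}

# Example usage on one exported, primer-trimmed sample (placeholder path):
# gzip -dc exported_reads/sample1_R1.fastq.gz | count_reads_at_least 285
```

If most reads fall below the chosen length, a shorter truncation length might retain more of them, although that would be a guess until the read lengths and the denoising stats have been checked.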
I expect these samples to have high diversity, but these are the resulting features:
I also tried a method more similar to our QIIME 1 method: I dereplicated my sequences and clustered them using VSEARCH, which results in many more features.
Could this be due to the quality of the reads? It's not clear to me why DADA2 is discarding so many reads. I apologize if I've used any incorrect language in this post. I am still new to bioinformatics.