Clarification on Sequence counts

jwdebelius · January 7, 2019, 12:51pm

There's a lot here, so hopefully I can address at least some of it.

steffi:

After quality filtering, we lost more than 50% of sequence counts. This may be due to the trimming parameter (not enough reads to merge).
Hence we tried with forward reads alone.
qiime dada2 denoise-single \
 --i-demultiplexed-seqs single-end-demux.qza \
 --p-trim-left 0 --p-trunc-len 280 \
 --o-representative-sequences rep-seqs.qza \
 --o-table table.qza \
 --o-denoising-stats stats-dada2.qza \
 --p-n-threads 4
With this, we could retain nearly 40% of the reads. I have attached table.qzv for your reference. I have also performed rarefaction curve

Okay, so, first, how much do your sequence counts differ between the paired-end and single-end runs? Because it sounds like you may have a quality problem overall, rather than simply an issue with the read joining. I would recommend revisiting your filtering parameters, as well as the read length and overlap for each region.
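If it helps with that comparison, here is a rough sketch for getting the per-sample fraction of reads retained from a DADA2 stats table. It assumes you have already exported the stats artifact to a TSV (for example with `qiime tools export`) and that the header contains `input` and `non-chimeric` columns. The function name `retained_pct` is just something I made up for illustration.

```shell
# Print, for each sample, the percentage of input reads that survive to the
# final (non-chimeric) step of a DADA2 stats table exported as TSV.
# Assumption: the header row has columns named "input" and "non-chimeric".
# Columns are found by name, so paired-end and single-end tables both work.
retained_pct() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "input") in_col = i
        if ($i == "non-chimeric") out_col = i
      }
      next
    }
    /^#/ { next }   # skip the "#q2:types" directive row
    $in_col > 0 { printf "%s\t%.1f\n", $1, 100 * $out_col / $in_col }
  ' "$1"
}
```

Running it on the paired-end and the single-end stats files and putting the two outputs side by side should show whether the loss happens at merging or already at filtering.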

Okay, next, you're dealing with multiple hypervariable regions. Do your counts take this into account? So, are there 30,000 sequences when you combine the V1-3, V4-6, and V7-9 from a single sample, or are they separated by region?

The choice of hypervariable region has a big effect on many aspects of the data, often larger than the biological effects you're likely to see when, for example, dealing with a single body site in humans.

My recommendation would be to separate your table by hypervariable region, and then perform parallel analyses targeting what you're interested in. However, if you want to combine the data, you need to use closed-reference picking rather than a de novo technique, since sequences from different regions don't overlap and can only be compared through a shared reference.
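As a rough sketch of both options, something like the following could work. Everything about the inputs is an assumption on my part: the file names (`table.qza`, `rep-seqs.qza`, `sample-metadata.tsv`, `reference-seqs.qza`), a metadata column called `region`, the 97% identity threshold, and that each region was demultiplexed under its own sample ID. The two function names are only for illustration.

```shell
# Option 1: keep only the samples from one hypervariable region.
# Assumption: sample-metadata.tsv has a "region" column with values
# such as V1-3, V4-6, or V7-9 (hypothetical names).
filter_region() {
  qiime feature-table filter-samples \
    --i-table table.qza \
    --m-metadata-file sample-metadata.tsv \
    --p-where "[region]='$1'" \
    --o-filtered-table "table-$1.qza"
}

# Option 2: cluster the denoised features against a reference database so
# that features from different regions map onto the same reference IDs.
# Assumption: reference-seqs.qza is a reference you have imported yourself;
# 0.97 is only an example threshold.
closed_ref() {
  qiime vsearch cluster-features-closed-reference \
    --i-table table.qza \
    --i-sequences rep-seqs.qza \
    --i-reference-sequences reference-seqs.qza \
    --p-perc-identity 0.97 \
    --o-clustered-table table-cr-97.qza \
    --o-clustered-sequences rep-seqs-cr-97.qza \
    --o-unmatched-sequences unmatched-cr-97.qza
}
```

For example, `filter_region V4-6` would write `table-V4-6.qza`, which you could then carry through the per-region analysis on its own.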

Without this information, it's hard to answer the additional questions.

Best,
Justine