Sequence quality


I have some questions about sequence quality. In my case, demux.qzv from paired-end read data is shown below.

It seems to me that QIIME2 plots quality from a random sample of 10000 sequences. I noticed that a previous discussion on QIIME2 quality scores suggested a cut-off value of 20 for the score. For my data, the amplicon length is about 250 nts. I noticed from the Interactive Quality Plot that the number of sequences used drops dramatically, from 10000 to only around 1000, when the length is greater than 240.

Question 1: For the downstream analysis by denoising with DADA2, is it reasonable to set the truncation length at 240 nts and use the following command (no trimming of the left side)? Should I also trim the left side, e.g. --p-trim-left-f 10, according to the plot?

qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 240 --p-trunc-len-r 240 --o-table table.qza --o-representative-sequences rep-seqs.qza --o-denoising-stats denoising-stats.qza

The demultiplexed sequence counts summary is shown below.

Question 2: For the "qiime diversity core-metrics-phylogenetic" command (qiime diversity core-metrics-phylogenetic --i-phylogeny rooted-tree.qza --i-table table.qza --p-sampling-depth ? --m-metadata-file MetadataT.txt --output-dir core-metrics-results), what number is appropriate for the sampling depth parameter "--p-sampling-depth"? Based on the second plot attached, should I use the minimum sequence count, i.e. 141860?

Question 3: For the "qiime diversity alpha-rarefaction" command (qiime diversity alpha-rarefaction --i-table table.qza --i-phylogeny rooted-tree.qza --p-max-depth ? --m-metadata-file MetadataT.txt --o-visualization alpha-rarefaction.qzv), what number is appropriate for the max depth parameter "--p-max-depth"? Based on the second plot attached, should I use a number between the minimum sequence count, i.e. 141860, and the median, i.e. 178898?

Thanks a lot in advance!

Hello James,

Lots of good questions here. Let’s dive in!

Sure, that should work. I like the idea of using --p-trim-left-f 10. You could try it both ways and see how they compare.
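As a rough sanity check on the truncation settings, here is a minimal Python sketch that estimates how many bases the forward and reverse reads would still share after trimming and truncation. It is only an illustration under simplifying assumptions: the function `expected_overlap` is hypothetical (not part of QIIME 2 or DADA2), it treats the ~250 nt amplicon as running exactly from the start of the forward read to the start of the reverse read, and it ignores length variation between taxa. DADA2 needs the paired reads to overlap in order to merge them (its default minimum is 12 nt).

```python
def expected_overlap(amplicon_len, trunc_f, trunc_r, trim_left_f=0, trim_left_r=0):
    """Estimate forward/reverse read overlap in nt (simplified model)."""
    # Forward read covers amplicon positions [trim_left_f, trunc_f).
    fwd_start = trim_left_f
    fwd_end = min(trunc_f, amplicon_len)
    # Reverse read starts at the far end of the amplicon and reads inward.
    rev_start = max(amplicon_len - trunc_r, 0)
    rev_end = amplicon_len - trim_left_r
    # Overlap is the intersection of the two covered intervals.
    return max(0, min(fwd_end, rev_end) - max(fwd_start, rev_start))


# Settings from the question: ~250 nt amplicon, both reads truncated at 240.
print(expected_overlap(250, 240, 240))
# Same settings with 10 nt trimmed from the start of the forward read.
print(expected_overlap(250, 240, 240, trim_left_f=10))
```

Under these assumptions both settings leave roughly 230 nt of overlap, so trimming 10 nt from the left should not put merging at risk.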

Yes, use that minimum value of 141860 as the sampling depth, so that no samples are dropped. That’s very good sequencing depth, by the way.
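To see why the minimum is a safe choice, recall that rarefying to a given depth drops every sample with fewer reads than that depth. The short Python sketch below illustrates this. The sample IDs and the largest count are made up for illustration; only 141860 (minimum) and 178898 (median) come from the summary in the question, and `retained_samples` is a hypothetical helper, not a QIIME 2 function.

```python
def retained_samples(counts, depth):
    """Return the IDs of samples that have at least `depth` reads."""
    return sorted(sample for sample, n in counts.items() if n >= depth)


# Hypothetical per-sample read counts; S1 holds the reported minimum.
counts = {"S1": 141860, "S2": 178898, "S3": 203411}

# At the minimum count every sample is kept.
print(retained_samples(counts, 141860))
# At the median count the smallest sample would be dropped.
print(retained_samples(counts, 178898))
```

With real data, the per-sample counts can be read from the feature table summary rather than typed in by hand.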

You could use a min of 141860 and a max of 178898, but I would suggest a different strategy: a min of 41860 and a max of 141860. This means that no samples will be dropped from this analysis, and you can see whether those smallest samples are reaching an alpha diversity asymptote or are still rising at 141k reads per sample.
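To get a feel for which depths such a range would cover, here is a small Python sketch that lists evenly spaced rarefaction depths between the suggested min and max. It assumes 10 evenly spaced steps, which is my understanding of the default for `--p-steps` in `qiime diversity alpha-rarefaction`; the exact depths QIIME 2 picks may differ slightly, and `depth_steps` is a hypothetical helper written only for this illustration.

```python
def depth_steps(min_depth, max_depth, steps=10):
    """Return `steps` evenly spaced integer depths from min_depth to max_depth."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if min_depth >= max_depth:
        raise ValueError("min_depth must be smaller than max_depth")
    span = max_depth - min_depth
    # Integer arithmetic keeps both endpoints exact.
    return [min_depth + (span * i) // (steps - 1) for i in range(steps)]


# Range suggested above: min of 41860, max of 141860.
for depth in depth_steps(41860, 141860):
    print(depth)
```

Because the largest depth equals the smallest sample's read count, every sample contributes a point at every step of the curve.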

I hope this helps! Let me know if you have more questions,


Thank you very much Colin!

