Interpretation of results after taxonomy analysis from decontaminated table

joaomiranda · November 15, 2021, 9:37pm

Thank you so much for your comments. Below, I'll summarize what I did according to your guidelines:

I've received demultiplexed sequences to initiate my analysis. To make sure that all non-biological sequences were removed, I've runned cutadapt. Analysing the resulting quality plots, I've noticed some differences in Qscore that changed my settings for trim and trunc in dada2.

Here are my code for cutadapt

qiime cutadapt trim-single \
 --i-demultiplexed-sequences single-end-demux.qza \
 --p-front CCTACGGGNGGCWGCAG \
 --o-trimmed-sequences trimmed-seq-single.qza \
 --p-error-rate 0 \
 --verbose

the visualization file: trimmed-seq.qzv (318.5 KB)

Based on the quality plot, I've decided to run dada2 in a single-end approach (as I was doing previously), but with new parameters for trim and trunc. Analyzing table-single2.qzv in comparison with the table of the approach I showed earlier, I noticed a drop in the amount of features and reads, even with parameters that I considered well adjusted according to the Qscore of the quality plot.

Here are my code for dada2 denoise-single

    qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demux.qza \
  --p-trim-left 40 \
  --p-trunc-len 240 \
  --o-representative-sequences rep-seqs-single2.qza \
  --o-table table-single2.qza \
  --o-denoising-stats stats-single2.qza

the visualization file: table-single2.qzv (781.7 KB) rep-seqs-single2.qzv (719.7 KB) stats-single2.qzv (1.2 MB)

I've trained a classifier using the settings around my trim and trunc parameters, as you said, and use this discussion to help me with the --p-min-length and --p-max-length.

Here are my code for extract reads

qiime feature-classifier extract-reads \
 --i-sequences ./silva-138-SSURef-341f-805r-Seqs.qza \
 --p-f-primer CCTACGGGNGGCWGCAG \
 --p-r-primer GACTACHVGGGTATCTAATCC \
 --p-trunc-len 200 \
 --p-min-length 0 \
 --p-max-length 0 \
 --o-reads ./silva-138-SSURef-341f-805r.qza

Finally, I've repeated my decontamination steps using microDecon in R, and from the decontaminated table I generated a taxa barplot. Some observations:
I realized that there is no longer a taxon designated as a chloroplast, as there was in the barplot taxa I showed earlier. The decontaminated table returns ~600.000 reads and ~3400 ASVs. Are these numbers are good to proceed with downstream analysis? Unfortunately, I still have taxons that don't pass a taxonomic level like domain or class.

My table decontaminated, the taxa barplot from the decon table and from the "contaminated table": table-single-decon2.qzv (686.9 KB)
taxa-bar-plots_decon2.qzv (363.2 KB)
taxa-bar-plots_new.qzv (364.8 KB)

I didn't run a PCoA plot yet, because I want to be more carefully with this data before proceed with diversity analysis (according with moving pictures tutorial, PCoA is generated when running beta diversity commands).

I think my main problem might be that the similarity between the taxa barplot from the original and the "decontaminated" table might be because the ASVs that were removed by microDecon are being assigned to taxons that are also being assigned by the ASVs that remained in the table.
But I think maybe this is an issue for another topic.