How to interpret the post-trim read joining and q-score filtering outputs

Hi! I am looking for guidance on whether my data prep is appropriate for running both DADA2 and Deblur. I have paired-end 16S sequences (FASTQ files, with read 1 and read 2 as separate files for each sample), so they were already demultiplexed when I got them back from Illumina. I was able to successfully import the FASTQ files, run cutadapt (the script didn't report any errors, though I'm still figuring out whether I trimmed the correct primer sequences because I modeled my script on an old postdoc's; I have another post about this if it's relevant!), merge the reads with qiime vsearch join-pairs, and quality-filter them with the code below.

qiime quality-filter q-score \
  --i-demux step.01c.joined.qza \
  --p-min-quality 4 \
  --p-quality-window 3 \
  --verbose \
  --o-filtered-sequences step.01d.joined.filtered.qza \
  --o-filter-stats step.01d.joined.filtered.stats.qza
echo "backend: Agg" > ~/.config/matplotlib/matplotlibrc
qiime demux summarize \
  --i-data step.01d.joined.filtered.qza \
  --o-visualization step.01d.joined.filtered.qzv

Below are the 3 outputs I got, one after each step.
My output looked like this after trimming with cutadapt:

After merging the reads (join-pairs with vsearch):

After quality-filter on the joined reads:

This is my first time performing analysis on any 16s seq data, I previously have only done the wetlab library work - what should I be taking away from these plots about the data? Which characteristics of these plots do I need to take into account in my next script for denoising? (The investigator would like me to run two separate denoising jobs, one with Deblur and one with DADA2, because they want to see how the ASV output compares to the OTU output before moving forward.)

Many thanks for any assistance!

Hello @kida_miska,

Welcome to the forums! :qiime2:

This is my first time performing analysis on any 16s seq data, I previously have only done the wetlab library work

You are off to a great start and are asking all the right questions. Thank you for including both the commands and output graphs in your post.

The process of using DADA2 is described in the Atacama Soils Tutorial, so check that out if you have not already. Its DADA2 section covers how to interpret those graphs and choose the trim and trunc settings for DADA2.

The Moving Pictures Tutorial has a section on deblur.
:warning: Running join-pairs with vsearch is recommended for deblur, but not for DADA2, because DADA2 works best with unjoined reads.

That sounds like a cool comparison :sunglasses:

Did you include any positive controls with a known composition to see if DADA2 and deblur are closest to the expected results? :bar_chart:

Hi, this is my first time using QIIME 2 and I am also trying to analyze paired-end 16S sequences. After joining the reads with vsearch and quality filtering, I obtained graphs similar to those shared by kida_miska. Can anyone tell me what the straight line with no box plot in the middle of the graph means, and how I should interpret it?


Hi @rcarcamoc,
The straight line with no box plot in the middle of the graph is usually where your two sequences merged. Since those overlapping base pairs are supported by two reads their quality is usually pretty high and doesn't vary.
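If it helps to see the arithmetic, here is a rough sketch of why the overlap tends to look flat. It rests on two assumptions that are not from this thread, so please check them against your own vsearch version: (1) when both reads agree at a position, the merged error probability follows the posterior formula from Edgar & Flyvbjerg (2015), and (2) the merged quality is capped at 41 on output. The function names are just illustrative.

```python
import math

def phred_to_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)

def merged_quality_agree(q1, q2, q_cap=41):
    """Merged quality for an overlap position where both reads call the same base.

    ASSUMPTION: uses the agreeing-base posterior from Edgar & Flyvbjerg (2015)
    and caps the result at q_cap (41 is assumed here as the output maximum).
    """
    p1, p2 = phred_to_prob(q1), phred_to_prob(q2)
    p = (p1 * p2 / 3) / (1 - p1 - p2 + 4 * p1 * p2 / 3)
    return min(q_cap, round(prob_to_phred(p)))

# Quite different input qualities can all end up at the same capped value.
for q1, q2 in [(20, 20), (25, 30), (30, 35), (38, 38)]:
    print(q1, q2, merged_quality_agree(q1, q2))
```

Under those assumptions, two agreeing bases of only moderate quality already reach the cap, so nearly every read reports the same score at overlap positions and the box plot collapses to a line.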
:turtle:
