Quality control: demux summarize, dada2 and q2-quality-control

Hello everyone!

I'm working with 16S data (V3-V4 region) to identify and analyze the bacterial composition of chicken fecal samples. We want to understand how the bacterial community changes between treatments.

So far I have done quality control with dada2, choosing its parameters based on the quality scores shown in the plots generated by demux summarize (and I thought that was all there was to quality control).

But now that I see there is q2-quality-control, I am confused:

  1. What are the differences in quality control using dada2 versus using q2-quality-control?

I know the inputs and parameters are different, but I don't understand the differences between their final goals, so:

  2. Which one would be necessary for my research? Or should I use both?

  3. Do the quality plots from demux summarize tell me anything about whether I need to use q2-quality-control?

Sorry about the confusion. :face_with_peeking_eye: :face_with_spiral_eyes: Any help would be much appreciated :sunflower:

PS: All of this confusion came up because of the following situation:

My colleague is analyzing the same data using mothur, and he found that the sequences from one of the samples show problems at the alignment step: the number of aligned bases is very small (a maximum of 45 nt, whereas the rest of the samples have about 450 nt), and the sample is lost at the filtering step. I thought this would show up as bad quality in the quality plots, but it doesn't: the quality scores for this sample are very good.

Why does this happen at the alignment step? How could we handle it?

Hiii!

Let's see if I can help you here!

Let's start with dada2 vs q2-quality-control.

I would say they work at totally different levels, and dada2 in particular does far more than just quality control! I think the easiest way to see this is to look at the "Moving Pictures" tutorial, where the quality control step is tightly tied to the construction of the feature abundance table. The dada2 plugin is designed to denoise, i.e., to error-correct the sequences in order to reconstruct the likely original amplicon sequences.

Sometimes you need to exclude some sequences before the denoising step, because you discover there are contaminating sequences mixed in with your good ones. That is where the quality plugins may come in handy, because they give you a few options to clean up the sequences: you could filter your sequences by quality score (if you are unhappy with some low-quality sequences and want to exclude them first; in the Moving Pictures tutorial, for example, this is suggested with q2-quality-filter before working with deblur), or by comparing your sequences with other species/genomic data using q2-quality-control (e.g., if you want to exclude amplicons derived from the host genome, chicken in your case).
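To make the quality-score filtering idea a bit more concrete, here is a minimal, hypothetical Python sketch of the general technique: cut each read just before the first run of low-quality bases, and drop the read if too little of it is left. This is not the q2-quality-filter implementation; the function names, the default thresholds and the assumption that quality strings use Phred+33 encoding are all mine, for illustration only.

```python
from typing import List, Optional, Tuple


def phred33_to_scores(quality: str) -> List[int]:
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def truncate_at_low_quality(
    seq: str,
    quality: str,
    min_quality: int = 4,            # hypothetical threshold
    window: int = 3,                 # consecutive low-quality bases that trigger a cut
    min_length_fraction: float = 0.75,
) -> Optional[Tuple[str, str]]:
    """Hypothetical sketch: truncate the read at the start of the first run of
    `window` consecutive bases scoring below `min_quality`. Return None (drop
    the read) if the kept part is shorter than `min_length_fraction` of it."""
    scores = phred33_to_scores(quality)
    run = 0
    cut = len(seq)  # default: keep the whole read
    for i, q in enumerate(scores):
        if q < min_quality:
            run += 1
            if run == window:
                cut = i - window + 1  # position where the low-quality run starts
                break
        else:
            run = 0
    if cut < min_length_fraction * len(seq):
        return None  # too short after truncation
    return seq[:cut], quality[:cut]
```

For example, a 12-base read whose last three quality characters are `#` (Phred 2) is cut to its first 9 bases, while a read with a low-quality run starting at base 4 is dropped because less than 75% of it would survive.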

Hope this makes sense! So, going back to questions 2 and 3: you definitely need to work with a denoiser (dada2 or deblur) to get the feature abundance table. You may also need to exclude some sequences before this step, in which case you would use the q2-quality-control plugins.

The demux summarize plots may give you a hint if some of the sequences in your data show low quality and you want to get rid of them before the denoising step.

If any unwanted sequences are still in the final abundance table, there are ways to filter them out even after the denoising step.
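The demux summarize plots show how quality scores are distributed at each base position. As a rough, hypothetical illustration of reading them, the sketch below computes the median Phred score per position from a handful of quality strings and reports the first position where the median drops below a threshold, which is the kind of information people use to choose where to truncate reads. It assumes Phred+33 quality strings that you have already read from your FASTQ files; the function names and the threshold of 25 are placeholders, not QIIME 2 code or recommended values.

```python
from statistics import median
from typing import List, Optional


def per_position_median(qualities: List[str]) -> List[float]:
    """Median Phred score (assumed Phred+33) at each position across reads.
    Reads shorter than a given position simply do not contribute to it."""
    max_len = max(len(q) for q in qualities)
    medians = []
    for pos in range(max_len):
        scores = [ord(q[pos]) - 33 for q in qualities if pos < len(q)]
        medians.append(median(scores))
    return medians


def first_low_position(medians: List[float], threshold: float = 25) -> Optional[int]:
    """Hypothetical helper: first 0-based position whose median falls below
    `threshold`, or None if the median never drops that low."""
    for pos, m in enumerate(medians):
        if m < threshold:
            return pos
    return None
```

Note that, as in the case described in the PS, good quality scores only tell you the bases were read reliably; they say nothing about whether the reads actually come from your target amplicon.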
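As one hypothetical example of filtering after denoising, the sketch below drops features whose representative sequence is outside an expected length range, which can catch non-target amplicons. It assumes the feature table is a plain dict of feature ID to per-sample counts and the representative sequences are a dict of feature ID to sequence; these structures, the function name and the length bounds are illustrative placeholders, not the QIIME 2 API.

```python
from typing import Dict

FeatureTable = Dict[str, Dict[str, int]]  # feature ID -> {sample ID: count}


def filter_features_by_length(
    table: FeatureTable,
    rep_seqs: Dict[str, str],  # feature ID -> representative sequence
    min_len: int,
    max_len: int,
) -> FeatureTable:
    """Keep only features whose representative sequence length lies within
    [min_len, max_len]. Features without a representative sequence are dropped."""
    return {
        fid: counts
        for fid, counts in table.items()
        if fid in rep_seqs and min_len <= len(rep_seqs[fid]) <= max_len
    }
```

Choosing the bounds is the real work here: they should come from the expected length of your merged V3-V4 amplicons, not from the placeholder numbers you might try first.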

On the mothur question, I'm not really familiar with mothur, so I'm not sure. However, short sequences may derive from the host genome or from contamination in general, so you could pick a few of them and check what they are.
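A minimal sketch of "pick a few and check what they are", assuming the reads of the problematic sample are already loaded into a dict of read ID to sequence (for example, parsed from its FASTQ file). It draws a reproducible random subset and writes it as FASTA text, which you could then search against a reference database (for example with NCBI BLAST) to see whether the reads look like chicken DNA or some other contaminant. The function names and the fixed seed are hypothetical choices for illustration.

```python
import random
from typing import Dict, List


def pick_reads_to_inspect(reads: Dict[str, str], n: int = 5, seed: int = 42) -> List[str]:
    """Randomly pick up to n distinct read IDs. Sorting the IDs first and using
    a seeded generator makes the pick reproducible across runs."""
    ids = sorted(reads)
    rng = random.Random(seed)
    return rng.sample(ids, min(n, len(ids)))


def to_fasta(reads: Dict[str, str], ids: List[str]) -> str:
    """Format the selected reads as FASTA text (one header and one sequence line each)."""
    return "".join(f">{rid}\n{reads[rid]}\n" for rid in ids)
```

Sampling at random rather than taking the first reads in the file avoids looking only at whatever happened to be sequenced first.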

Hope it helps
Luca