Different Sequencing Data -> Different Quality

Hey all!

I'm working with three datasets that came from three separate sequencing runs (i.e. each dataset was sequenced separately, although under the same conditions).

As expected, each dataset gives me sequences of different quality. In the past I've analyzed each dataset separately, running DADA2 with the parameters best adapted to each dataset. However, I've seen that analyzing each dataset separately can be very limiting for the interpretation of my results.

Generally, after running DADA2 separately, I would get a feature retention of 75 to 95% (for ITS and 16S amplicons, respectively) at the sampling depth that retained the most features while keeping 100% of the samples. After importing the datasets together, the DADA2 step led me to choose a truncation that gave a feature retention of only 35 to 65% (ITS and 16S, respectively).

I know this happens because the sequences differ in quality and length between datasets. But I was wondering whether I could run DADA2 separately for each dataset and only afterwards merge the resulting feature-table and feature-sequence artifacts, so I could still do all the taxonomy and diversity analyses together.

Some enlightenment on this topic would be very much appreciated hehe, thank you :grin:

Hello and welcome to the forum!

Actually, running DADA2 separately for each run is very important, and it is not recommended to run DADA2 on samples from different runs together. The reason is the error-model training step: DADA2 learns error rates from the data, and each run has its own error profile, so mixing runs may result in biased datasets.

However, in order to get comparable ASVs and be able to properly combine the datasets, you need to process each dataset with absolutely identical parameters before merging (for both cutadapt and DADA2).

So, you need to decide on the best parameters based on all three datasets, run each dataset separately with those identical parameters, and then merge the feature tables and rep-seqs for further analyses.
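
To make that workflow concrete, here is a rough sketch in Python that only builds and prints the commands (it does not run QIIME 2). The run names, file names and truncation lengths are placeholders I made up for illustration; pick the real values after looking at the quality plots of all three runs. The `denoise-paired`, `merge` and `merge-seqs` actions and their flags are the standard QIIME 2 ones as far as I know, but check them against `--help` for your version.

```python
# Sketch: build identical QIIME 2 commands for each sequencing run.
# Run names, file names and truncation lengths are hypothetical placeholders.

RUNS = ["run1", "run2", "run3"]  # hypothetical run names

# One shared parameter set, applied to ALL runs (placeholder values).
PARAMS = {
    "--p-trunc-len-f": 240,
    "--p-trunc-len-r": 200,
}


def dada2_command(run):
    """Return the denoise-paired command for one run as a list of arguments."""
    cmd = [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", f"{run}-demux.qza",
    ]
    for flag, value in PARAMS.items():
        cmd += [flag, str(value)]
    cmd += [
        "--o-table", f"{run}-table.qza",
        "--o-representative-sequences", f"{run}-rep-seqs.qza",
        "--o-denoising-stats", f"{run}-stats.qza",
    ]
    return cmd


def merge_commands(runs):
    """Return the two commands that merge per-run tables and rep-seqs."""
    tables = ["qiime", "feature-table", "merge"]
    seqs = ["qiime", "feature-table", "merge-seqs"]
    for run in runs:
        tables += ["--i-tables", f"{run}-table.qza"]
        seqs += ["--i-data", f"{run}-rep-seqs.qza"]
    tables += ["--o-merged-table", "merged-table.qza"]
    seqs += ["--o-merged-data", "merged-rep-seqs.qza"]
    return [tables, seqs]


if __name__ == "__main__":
    for run in RUNS:
        print(" ".join(dada2_command(run)))
    for cmd in merge_commands(RUNS):
        print(" ".join(cmd))
```

Keeping the parameters in a single dictionary means a run cannot accidentally end up with its own truncation settings. The same idea would apply to the cutadapt step.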



What do you mean by this? Do you mean I should use the same truncation lengths (forward and reverse) for all datasets, or that I should use the same tool (DADA2)?

In any case, thank you so much Timur, this was really helpful :smiley:

I mean the same commands and the same parameters. If the parameters differ, then even identical sequences will produce slightly different ASVs. This will introduce unpredictable biases in alpha and beta diversity metrics.

So, the only difference should be in the names of the input and output files...
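
To illustrate why mismatched parameters matter, here is a toy Python sketch. It is not DADA2: the "denoising" is just truncation plus counting, and the read is a made-up sequence. It assumes feature IDs are MD5 hashes of the sequence, which is what I understand q2-dada2 does by default. The point is only that the same biological sequence truncated at different lengths becomes two different features after merging.

```python
import hashlib
from collections import Counter


def feature_id(seq):
    """Feature ID as the MD5 hash of the sequence (assumed q2-dada2 default)."""
    return hashlib.md5(seq.encode()).hexdigest()


def denoise_stub(reads, trunc_len):
    """Toy stand-in for denoising: truncate each read and count it."""
    table = Counter()
    for read in reads:
        if len(read) >= trunc_len:  # reads shorter than trunc_len are dropped
            table[feature_id(read[:trunc_len])] += 1
    return table


def merge(*tables):
    """Sum counts of features that share the same ID across tables."""
    merged = Counter()
    for table in tables:
        merged.update(table)
    return merged


read = "ACGTTGCAAGCTTAGGCTAACGTA"  # made-up 24 nt read present in both runs
run_a = [read] * 5
run_b = [read] * 3

# Same truncation in both runs: the feature IDs match and the counts add up.
same = merge(denoise_stub(run_a, 20), denoise_stub(run_b, 20))
# Different truncation: the same organism shows up as two separate features.
diff = merge(denoise_stub(run_a, 20), denoise_stub(run_b, 18))

print(len(same), len(diff))  # prints: 1 2
```

With identical truncation the merged table has one feature with 8 reads; with different truncation it has two features (5 and 3 reads), which would inflate richness and separate the runs in beta diversity.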