Merging sequences from different runs


I have been using DADA2 to process my 16S MiSeq data. DADA2 produces rep-seqs.qza and table.qza, which are the final sequence variants and the count table. The final taxonomy.qza is based on rep-seqs.qza, which means that the number of sequence variants in rep-seqs.qza determines the number of features (the so-called OTUs/SVs). No collapsing is done during taxonomy assignment, right?

Here is the problem: if one sequence variant is 1 or 2 nt shorter than another, does that mean they will be assigned the same taxonomy but remain different OTUs/SVs (sequence variants)?

However, suppose I have two DADA2 runs. I merge table1.qza with table2.qza and rep-seqs1.qza with rep-seqs2.qza to get a new table.qza and rep-seqs.qza, and then I assign taxonomy. If one sequence variant in rep-seqs1.qza (SV1) is exactly the same as one sequence variant in rep-seqs2.qza (SV2), will they be merged into a single sequence variant (SV3) that carries the counts from both in the new rep-seqs.qza and table.qza, or will they be counted separately in the new rep-seqs.qza?

On the other hand, suppose one sequence variant (SV1) in rep-seqs1.qza is 10 bases shorter than a sequence variant (SV2) in rep-seqs2.qza (only shorter; the shared part is exactly the same), and both come from the same strain or taxon. If I merge these two rep-seqs.qza files, will the two SVs be merged into one SV, or will they be counted separately?

Just confused, looking forward to the answers.

Hi @Brandon,

Sorry for the very delayed response. These are great questions!

Correct, it is just a mapping of ASVs to Taxonomy strings. Nothing more than that happens.

Yes, they will be different ASVs (with different feature IDs) while the taxonomy could be the same (or not, depending on the sequences).

Assuming they are exactly the same sequence, you'll have the counts from both tables. The rep-seqs (FeatureData[Sequence]) artifacts don't actually count anything, they just map feature IDs to DNA sequences.

So in your example, assuming the rep-seqs looked the same before merging, your merged rep-seqs will contain every unique sequence exactly once. This is because the feature IDs we use are hashes of the sequence, so if we see the same sequence, we get the same hash. When these get merged, QIIME 2 sees a "duplicate" ID and just writes it down once.
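To make the hashing idea concrete, here is a minimal sketch in plain Python. It is not QIIME 2's actual code: it assumes the feature ID is an MD5 hash of the sequence, and `feature_id`, `merge_rep_seqs`, and `merge_tables` are hypothetical helpers that use plain dictionaries in place of the real artifacts. The sequences, sample names, and counts are made up.

```python
import hashlib
from collections import Counter


def feature_id(seq):
    """Hypothetical helper: derive a feature ID by hashing the sequence (MD5 assumed)."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


def merge_rep_seqs(*rep_seqs):
    """Union of {feature_id: sequence} maps; a repeated ID is stored only once."""
    merged = {}
    for rep in rep_seqs:
        merged.update(rep)
    return merged


def merge_tables(*tables):
    """Combine {sample_id: {feature_id: count}} maps, adding counts per feature."""
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            merged.setdefault(sample, Counter()).update(counts)
    return merged


# Made-up sequences: seq_a appears in both runs, seq_b only in run 2.
seq_a = "ACGTACGTACGGTTAACC"
seq_b = "TTGACCGTAAGGCTAGCT"

run1_seqs = {feature_id(seq_a): seq_a}
run2_seqs = {feature_id(seq_a): seq_a, feature_id(seq_b): seq_b}

run1_table = {"run1-sample": {feature_id(seq_a): 10}}
run2_table = {"run2-sample": {feature_id(seq_a): 5, feature_id(seq_b): 3}}

merged_seqs = merge_rep_seqs(run1_seqs, run2_seqs)
merged_table = merge_tables(run1_table, run2_table)

# Total count of the shared sequence across all samples of the merged table.
total_a = sum(counts.get(feature_id(seq_a), 0) for counts in merged_table.values())
print(len(merged_seqs), total_a)
```

The identical sequence shows up once in the merged sequences, and its counts from both runs sit under the same feature ID in the merged table.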

In this case they will be counted as independent ASVs, because one of your sequences is a different length and therefore produces a different hash, giving QIIME 2 a brand new ID. There is no "substring" handling of sequences in QIIME 2, so it's important to make sure your ASVs are trimmed the same way so that they are comparable. Otherwise they will look like entirely new kinds of ASVs to QIIME 2.
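A small illustration of why length matters, again assuming (for the sake of the sketch) that the feature ID is an MD5 hash of the sequence. The helper `feature_id` and the sequences are hypothetical.

```python
import hashlib


def feature_id(seq):
    """Hypothetical helper: derive a feature ID by hashing the sequence (MD5 assumed)."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


# Made-up example: sv1 is sv2 with the last 10 bases missing.
sv2 = "ACGTACGTACGGTTAACCGGTTAACCGGATCGATCG"
sv1 = sv2[:-10]

# A hash has no notion of "substring", so the two IDs are unrelated.
ids_differ = feature_id(sv1) != feature_id(sv2)

# Only when both sequences cover exactly the same bases do the IDs agree.
same_after_trim = feature_id(sv1) == feature_id(sv2[:len(sv1)])

print(ids_differ, same_after_trim)
```

Cutting sequences after the fact is only used here to show the hashing behaviour. In practice the point is to denoise both runs with the same trimming and truncation settings so the resulting ASVs span the same region.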

This isn't much different from having a shorter primer pair that is contained within a longer primer pair. There isn't enough sequence information on the short pair to tell it apart from potentially several unique ASVs on the longer pair. This is where using taxonomy can be helpful as it acts as a shared reference.


Hi, @ebolyen,

Thanks for the detailed answer. That makes much more sense to me.

Millions of thanks.

