Choosing qiime feature-table merge overlapping method

I have several collections of 16S data from different sequencing protocols. My data comprise both single-end and paired-end sequences. To include all of the data in taxonomy assignment, I denoised each paired-end dataset as two separate single-end datasets (one forward and one reverse). I used the following command with QIIME 2 2019.4:

qiime feature-table merge --i-tables ./*.qza --p-overlap-method sum --output-dir merged_table/

If there is noise in both feature tables (forward and reverse), I suspect that --p-overlap-method sum is doubling that noise.

If so, could you please advise me on my merging options?

Good morning @ashutosh

Great question!

How did you process your data into ASVs/OTUs? I ask because ASVs are resilient to noise across runs: denoising each run individually should remove the noise from that run, so you can merge your tables exactly as you describe without worrying about noise. :+1:
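In case it helps to see what a sum-style merge does to the counts, here is a minimal sketch in plain Python. It is a toy model, not QIIME 2's implementation: the nested-dict table layout, the `merge_sum` helper and the feature names are all made up for illustration, and it assumes forward and reverse reads denoise to different sequences and therefore different feature IDs.

```python
from collections import defaultdict


def merge_sum(*tables):
    """Merge {sample: {feature: count}} tables, summing cells that appear in several."""
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for sample, features in table.items():
            for feature, count in features.items():
                merged[sample][feature] += count
    return {sample: dict(features) for sample, features in merged.items()}


# Hypothetical toy tables: one sample denoised twice, once per read direction.
forward = {"sampleA": {"fwd_asv_1": 90, "fwd_asv_2": 10}}
reverse = {"sampleA": {"rev_asv_1": 88, "rev_asv_2": 12}}

merged = merge_sum(forward, reverse)
total = sum(merged["sampleA"].values())

print(merged)  # four separate features, no cell was added to another
print(total)   # 200: the sample total doubles even though no feature count does
```

Under that assumption no individual feature count is doubled, but each sample's total is, because every read pair is counted once per direction. Counts are only added together when the same sample ID and the same feature ID show up in more than one table.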

The bigger issue is the length of the amplified region in the paired versus single reads. If your single ends are 150 long and your paired reads are 250 after joining, these different lengths could cause issues… But if your single ends are 250 and your paired reads are 250 after joining, then the features should merge fine after making ASVs with DADA2!
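Here is a small sketch of why length matters for merging. As far as I know, the DADA2 plugin names each feature by the MD5 hash of its sequence by default; treat that as an assumption and check your own feature IDs. The `feature_id` helper and the reads below are hypothetical.

```python
import hashlib


def feature_id(sequence):
    """MD5 hex digest of a sequence (assumed here to mirror hashed feature IDs)."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


# Hypothetical reads from the same organism, one sequenced longer than the other.
long_read = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGG"
short_read = long_read[:30]

# Different lengths give different IDs, so these would never overlap in a merge.
print(feature_id(long_read) == feature_id(short_read))       # False

# Trimmed to the same length, the IDs match and the counts can be combined.
print(feature_id(long_read[:30]) == feature_id(short_read))  # True
```

So two tables built from reads of different lengths can merge without any error and still share almost no features, even when the underlying organisms are the same.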

Let me know about your data and I’m sure we can find a good option.



Thank you so much @colinbrislawn

Actually, I have 18 different datasets. The lengths for single-end data are 150, 250, 300 and 500 and for paired-end data, they are 300, 500 and 600. In this case, would it be fine to merge them as I did?

For context, I had earlier processed the single-end data as single-end and the paired-end data as paired-end. After plotting the OTUs I found a good correlation between the paired-end and single-end OTU calls; however, quite a few samples were called incorrectly between paired-end and single-end. I therefore processed each paired-end dataset as two separate datasets (forward and reverse).

Looking forward to hearing from you



Hello Ashutosh,

Thank you for telling me more about all your data sets!

I have 18 different datasets

WOW! Someone must really trust you with all their data sets!
(or maybe they gave you a lot of work!!! :smile::+1: )

If these 18 data sets are really different (like some come from people, some come from environmental samples, some come from syn-bio experiments), you should treat all 18 as separate data sets and process them all separately.

QIIME 2 :qiime2: gives you a lot of control over how you process and manage many samples. Choosing how you group your samples lets you tell a story with the data you have. This is a great opportunity and you should do your best to tell a good story for each data set, especially if you have 18 data sets!



Thanks @colinbrislawn

Actually, all of them are microbiome data for one host species from 18 different studies

Ah OK!

So while the initial studies are different, you plan to analyze them together in a single cohort. Cool.

I think so… but it would be good to check! Try running qiime feature-table heatmap and see if your samples cluster based on biology, or based on the read length. Once you verify that the read length is not introducing bias, you should be good to go!

The EMP :earth_americas: also had reads of different lengths across their samples, and their solution was to trim all reads to a consistent length. You could try that too.
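To illustrate the trimming idea, here is a rough sketch on a toy table. It trims after the fact, which is only for illustration; in practice you would trim the reads before denoising (for example with the `--p-trunc-len` option of `qiime dada2 denoise-single`). The `trim_and_collapse` helper and the sequences are hypothetical, and the sketch assumes all reads start at the same position in the amplicon.

```python
from collections import Counter


def trim_and_collapse(counts, length):
    """Truncate each sequence to `length` and sum counts of sequences that become identical.

    Sequences shorter than `length` are dropped, since they cannot be compared.
    """
    collapsed = Counter()
    for sequence, count in counts.items():
        if len(sequence) >= length:
            collapsed[sequence[:length]] += count
    return dict(collapsed)


# Hypothetical counts for one sample, keyed by sequence.
counts = {
    "ACGTACGTAC": 40,  # 10 nt
    "ACGTACGTTT": 25,  # 10 nt, differs from the first only after position 8
    "ACGTACGT": 10,    # 8 nt
    "ACGTA": 3,        # 5 nt, too short to keep
}

print(trim_and_collapse(counts, 8))  # {'ACGTACGT': 75}
```

The trade-off is visible in the toy data: trimming to the shortest usable length makes features comparable across studies, but it also merges sequences that only differ further along the read and discards reads that are too short.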

