Sum option of feature-table merge --p-overlap-method giving weird results

Hello

I am using QIIME 2 version 2019.4, installed with conda.

I am having issues interpreting the results from the following command:

qiime feature-table merge \
--p-overlap-method sum \
--i-tables run_1_table-dada2.qza \
--i-tables run_2_table-dada2.qza \
--o-merged-table merged_runs_table-dada2.qza 

Run 2 contains samples that were also sequenced in run 1, plus some new samples that were missed in run 1. Samples that appear in both runs therefore have the same sample IDs in both tables.

Run 1 has 95 samples and run 2 has 64 samples with a total of 111 unique samples.
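
For reference, here is a minimal sketch of what summing overlapping samples should amount to. It is a toy illustration in plain Python, not QIIME 2's actual implementation (which works on BIOM tables). The function name merge_sum and the sample and feature IDs are made up for the example.

```python
from collections import defaultdict


def merge_sum(*tables):
    """Merge {sample_id: {feature_id: count}} tables, summing counts
    wherever the same sample/feature pair occurs in more than one table."""
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for sample_id, features in table.items():
            for feature_id, count in features.items():
                merged[sample_id][feature_id] += count
    return {sample_id: dict(features) for sample_id, features in merged.items()}


def total_frequency(table):
    """Sum of all counts in a table."""
    return sum(sum(features.values()) for features in table.values())


# Hypothetical toy data: sampleA is present in both runs.
run_1 = {"sampleA": {"ASV1": 10, "ASV2": 0}, "sampleB": {"ASV1": 3}}
run_2 = {"sampleA": {"ASV1": 5, "ASV3": 7}, "sampleC": {"ASV2": 4}}

merged = merge_sum(run_1, run_2)

# The merged table holds the union of sample IDs, and its total frequency
# is the sum of the two input totals.
print(len(merged), total_frequency(merged))
```

Under these assumptions, the merged total frequency should always equal the sum of the input totals, and the merged sample count should equal the number of unique sample IDs across the inputs.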

I wanted to check that the output was correct so I ran:
qiime feature-table summarize \
--i-table merged_runs_table-dada2.qza \
--o-visualization merged_runs_table-dada2.qzv

Initially it looked fine with 111 samples.

The bit that confuses me is:
The merged sample table has a total frequency of 17,711,692
The Run 1 table has a total frequency of 123,450
The Run 2 table has a total frequency of 7,844,500

Additionally, the feature counts for each sample in the interactive sample detail section are confusing me.
In the merged table there are five samples with a feature count of 0.
In the Run 1 table there are three samples with a feature count of 0 (only one of these overlaps with the zero-count samples in the merged table).
In the Run 2 table there are no samples with a feature count of 0.

Any help with how the command works or if there is an issue here would be greatly appreciated.

If you have any other questions please don’t hesitate to reply.

Thanks for reading
Matthew Gemmell

Hi @Matthew_Gemmell,
Welcome to the forum!
That is strange indeed. Would you mind sharing your two feature tables with us to help us reproduce the problem? You can DM these if you would rather not share the tables publicly.


Hi @Matthew_Gemmell,
Thanks for sharing those tables. I was not able to reproduce the behavior you are describing.
Visualizing run 1 and run 2 gives me the following:
Run1 - Total sequences: 9,867,192; 96 samples, of which 5 have zero features
Run2 - Total sequences: 7,844,500; 65 samples, of which none have zero features
merged.qzv (722.7 KB) - Total sequences: 17,711,692; 111 samples, of which the same 5 samples have zero features
As you can see, the feature counts add up as expected, the number of samples matches what we expect, and the same 5 samples have zero counts.
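
As a quick sanity check on the arithmetic, here is a small sketch using only the totals quoted in this thread. The variable names are just for illustration, and the shared-sample figure is derived by inclusion-exclusion from the reported sample counts rather than read from the tables.

```python
# Total frequencies reported by `qiime feature-table summarize` in this thread.
run_1_total = 9_867_192
run_2_total = 7_844_500
merged_total = 17_711_692

# With --p-overlap-method sum, the merged total should be the sum of the inputs.
totals_match = (run_1_total + run_2_total == merged_total)

# Sample counts as reported for each visualization above.
run_1_samples = 96
run_2_samples = 65
merged_samples = 111

# Inclusion-exclusion: samples present in both runs.
shared_samples = run_1_samples + run_2_samples - merged_samples

print(totals_match, shared_samples)
```

If the Run 1 total were really 123,450, the merged total could not be 17,711,692, which is why a mismatched Run 1 visualization is the likeliest explanation.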

I’m wondering if you accidentally used the wrong tables in your merging command?

Oh, in addition, I noticed in your provenance that you set very strict --p-trunc-q values in your dada2 command. You may want to leave this at the default value, or at least use something much less strict. See this recent thread for a bit more discussion on this.

Hi @Mehrbod_Estaki, thanks for that. I must have used the wrong qzv for the run_1 data. Thanks!
