I have merged three different ITS datasets with the "qiime feature-table merge" and "qiime feature-table merge-seq-data" commands. However, the overall richness of one of the three datasets drops from 20,515 SVs to 185 SVs after merging. The other two datasets were not reduced. I am surprised by this behaviour, as I assumed any SVs that don't collapse into SVs in other datasets would simply be preserved post-merging. The data have all been trimmed to the same region (though ITS can vary in length, so that might be where things are getting weird). This also contrasts with the paired 16S datasets, which merge without any such loss. I've attached an image of two Venn diagrams showing the loss of SVs from the ITS 'watershed.liming' dataset.
You are correct about the default behavior, so this is indeed strange.
Are you trimming these sequences after denoising/clustering? If so, it is possible that SVs are being collapsed sensibly (though that is a large drop so probably not!).
I'm not seeing this image. Large images sometimes fail to attach, so please try again.
Could you please share:
the exact commands that you are running, from merging through alpha diversity testing
Each library has been processed up to step 7 using this workflow and then merged using the commands specified in Steps 2 and 3 in this workflow. After Step 3, only 185 rep sequences remain.
In response to your questions:
The individual libraries are not trimmed after denoising and clustering.
I've attached table and rep.seq files corresponding to the final merged dataset ("merged.") and one of the libraries from the 'watershed liming' ITS dataset that exhibits the loss of SVs ("woods."). I've also included a Venn diagram ("merging.weirdness.png") showing the total number of OTUs from each project group for 16S data (which worked as expected) and ITS (being affected). The Venn diagram was produced in R after importing the otu.table into a phyloseq object.
How are you determining the richness of what is effectively a subset after you have merged? I can think of a few ways to do this, but we don't directly expose that kind of functionality, so I am curious what your exact steps are for coming up with those numbers.
They should be preserved; in fact, we have a test suite to ensure that is the case (assuming, of course, that there are no bugs)!
Feature tables aren't aware of the kinds of data they contain (e.g. ITS vs 16S). They only know about sample IDs, feature IDs, and the observations of those features by sample.
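To illustrate the expected behavior, here is a minimal sketch of a union-style merge using plain Python dictionaries. This is not QIIME 2's implementation: `merge_tables`, `feature_ids`, and the toy sample and feature IDs are hypothetical, and the sketch assumes sample IDs do not overlap between tables. It only shows that every feature present in an input should still be present after merging, regardless of the marker gene.

```python
def merge_tables(*tables):
    """Merge any number of {sample_id: {feature_id: count}} tables.

    Hypothetical helper: assumes sample IDs are unique across tables
    and raises an error otherwise.
    """
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            if sample_id in merged:
                raise ValueError(f"duplicate sample ID: {sample_id}")
            merged[sample_id] = dict(counts)  # copy so inputs stay untouched
    return merged


def feature_ids(table):
    """Return the set of features observed at least once in the table."""
    return {f for counts in table.values() for f, n in counts.items() if n > 0}


# Toy tables standing in for three separately denoised datasets.
its_a = {"a1": {"sv1": 10, "sv2": 3}}
its_b = {"b1": {"sv2": 5, "sv3": 7}}
its_c = {"c1": {"sv4": 1}}

merged = merge_tables(its_a, its_b, its_c)

# Every feature from every input survives the merge.
for table in (its_a, its_b, its_c):
    assert feature_ids(table) <= feature_ids(merged)
```

If a check like the final loop fails on real data, the loss is happening somewhere other than in a plain union of the tables.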
Thanks, that was helpful! I prepared the following table from these artifacts:
filename | type | # samples | # features
-------------------------------|-------------------------|-----------|-----------
merged.rep.seqs.tree.final.qza | Phylogeny[Rooted] | NA | 81088
merged.table.final.qza | FeatureTable[Frequency] | 2888 | 80751
woods.rep.seqs.final.qza | FeatureData[Sequence] | NA | 5223
woods.table.qza | FeatureTable[Frequency] | 192 | 5223
You mentioned above that "only 185 rep sequences remain", but I am not seeing any numbers that support that (see my table above). Can you explain in more detail where you are seeing this value of 185?
Can you provide some details about the Venn diagrams, too? It sounds like you are expecting these numbers to be OTUs/SVs, but is it possible those are sample counts? I ask because in the ITS diagram, the watershed.liming circle adds to 193 (0+64+45+84), which is suspiciously close to the number of samples observed in woods.table.qza - just looking for some clarification.
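One way to tell the two apart is to count both samples and features for the subset in question. The sketch below is only an illustration with made-up IDs and a hypothetical `subset_summary` helper; it is not how the Venn diagram was actually produced. The region values and the sample count of 192 are the ones quoted above.

```python
def subset_summary(table, sample_ids):
    """Return (n_samples, n_features) for the listed samples of a
    {sample_id: {feature_id: count}} table (hypothetical helper)."""
    present = [s for s in sample_ids if s in table]
    features = {f for s in present for f, n in table[s].items() if n > 0}
    return len(present), len(features)


# Toy merged table; IDs are invented for illustration.
merged = {
    "woods.1": {"sv1": 4, "sv2": 0, "sv3": 9},
    "woods.2": {"sv3": 2, "sv4": 1},
    "other.1": {"sv5": 6},
}
n_samples, n_features = subset_summary(merged, ["woods.1", "woods.2"])

# Region counts of the watershed.liming circle, as quoted above.
venn_total = sum([0, 64, 45, 84])
woods_samples_reported = 192  # from woods.table.qza in the table above
print(n_samples, n_features, venn_total - woods_samples_reported)
```

If the circle total tracks the sample count rather than the feature count, the diagram is most likely being built from the wrong axis of the table.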
Lastly, I noticed in your attached script that you are using an old version of QIIME 2, and that you built a for loop to merge more than two tables at a time. This limitation was removed in QIIME 2 2017.12: you can now merge an arbitrary number of tables in one command. I would recommend upgrading to the latest release (2018.2), as we are only able to provide support for the latest release at any given time.
Besides answering the questions inline above, can you please provide us with the FeatureTable[Frequency] or FeatureData[Sequence] artifacts that aren't merging how you would expect? Ideally you could put together a minimum working example, like two or three tables. You can share a link to those in a direct message if you aren't able to share publicly. Please also provide the exact commands so that we can reproduce the issue.
OK, that was helpful advice! After updating to the newest code and merging in one batch, all of the sequences from that particular project are now present in the final phyloseq object.
(FYI: I produced the Venn in R with limma.)
I am very appreciative of your prompt responses and willingness to help.