This is a question about where the merged feature table and sequence names come from.
Let’s say I have 10 lanes of amplicon data. I have processed each lane of data separately to the point where I have a denoised frequency table and representative sequence set (feature table and feature sequences).
It looks like the FMT tutorial suggests that merging these tables and sequences is as simple as:
qiime feature-table merge \
--i-tables table-1.qza \
--i-tables table-2.qza \
... {more tables}
--i-tables table-10.qza \
--o-merged-table table.qza
qiime feature-table merge-seqs \
--i-data rep-seqs-1.qza \
--i-data rep-seqs-2.qza \
... {more sequences}
--i-data rep-seqs-10.qza \
--o-merged-data rep-seqs.qza
When merging these data, I’m wondering which name is preserved for each of the representative sequences. DADA2 used to name these iSeqs, and I think newer versions renamed them ASVs. What happens when two (or more) identical sequences from different datasets, each carrying a different ASV name, get merged?
Is the documentation for this command implying that the first feature ID is the one that is retained?
"If different feature data is present for the same feature id in the inputs, the data from the first will be propagated to the result."
Thus, if some representative sequence were present in feature tables 1, 2, and 8, would the ASV name assigned in feature table 1 be applied to that sequence in tables 2 and 8 as well?
It can’t be quite that simple though, because the same ASV name could be assigned in each feature table without those ASVs representing the same sequence variant. Given that you savvy QIIMErs have solved every problem I’ve ever thought up (and more!) I’m wondering if you can help me understand the relationship between the input ASV names and the resulting merged table and sequence names. Is there no relationship?
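To make the worry concrete, here is a toy Python sketch of what I imagine could happen. It is not QIIME 2's actual implementation: I'm assuming the merge is keyed purely on feature ID with the first input winning, which is only my reading of the help text quoted above. The lane contents, the ASV names, and the helpers `merge_first_wins` and `hash_ids` are all made up for illustration.

```python
# Toy model of merging representative-sequence sets keyed by feature ID.
# NOT QIIME 2's implementation; it only mimics the documented
# "data from the first will be propagated" behaviour using plain dicts.
import hashlib


def merge_first_wins(*seq_sets):
    """Merge {feature_id: sequence} dicts; the first input wins on ID clashes."""
    merged = {}
    for seqs in seq_sets:
        for feature_id, sequence in seqs.items():
            # setdefault keeps whatever an earlier input already stored.
            merged.setdefault(feature_id, sequence)
    return merged


def hash_ids(seqs):
    """Re-key a set by the MD5 of each sequence (hypothetical workaround)."""
    return {hashlib.md5(s.encode()).hexdigest(): s for s in seqs.values()}


# Hypothetical per-lane outputs with arbitrary, run-specific names.
lane_1 = {"ASV1": "ACGTACGT", "ASV2": "TTGGCCAA"}
lane_2 = {"ASV1": "GGGGCCCC", "ASV2": "ACGTACGT"}

# Merging on the arbitrary names: lane 2's "ASV1" sequence is silently lost,
# and the shared sequence ACGTACGT is never matched up across lanes.
by_name = merge_first_wins(lane_1, lane_2)

# Merging on sequence-derived IDs: identical sequences collapse to one ID
# and distinct sequences can no longer collide by name.
by_hash = merge_first_wins(hash_ids(lane_1), hash_ids(lane_2))
```

In this toy case `by_name` ends up with only two sequences (GGGGCCCC is dropped), while `by_hash` keeps all three distinct sequences. If the real merge behaves like the first case, IDs derived from the sequence itself would seem necessary for the merged table and sequences to stay consistent.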
Thanks!