I am completing a meta-analysis of studies that sequenced different hypervariable regions. I understand that guidelines require closed-reference OTU picking in this case, because sequences from different hypervariable regions cannot be compared directly to each other.
However, I am curious why researchers cannot run de novo OTU picking on each study separately, assign taxa to the OTUs (using QIIME or other means), and then combine the results after taxonomy assignment. It seems to me that this would lose less data than closed-reference OTU picking, but I don’t know what other implications it might have. Does this approach introduce some sort of bias? Or are there other problems that I don’t see?
Also, I have not seen how to merge results at any step after FeatureTable[Frequency] (table.qza) and FeatureData[Sequence] (rep-seqs.qza), at least in QIIME 2. If anyone knows how to merge taxa tables, I would like to try both approaches mentioned above and compare the results.
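To make the second approach concrete, here is a rough Python sketch of the kind of merge I have in mind. It is not a QIIME 2 command; it assumes each study's table has already been collapsed to genus-level taxon names, and the function name `merge_taxa_tables`, the sample IDs, and the counts are all hypothetical.

```python
def merge_taxa_tables(tables):
    """Merge per-study {sample: {taxon: count}} tables into one table.

    A taxon that was not observed in a sample is filled in with 0.
    """
    # Union of every taxon name seen in any study.
    all_taxa = sorted(
        {taxon for table in tables for counts in table.values() for taxon in counts}
    )
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample ID: {sample}")
            merged[sample] = {taxon: counts.get(taxon, 0) for taxon in all_taxa}
    return merged


# Hypothetical genus-level tables from two studies with different regions.
study_a = {"A1": {"g__Bacteroides": 120, "g__Prevotella": 30}}
study_b = {"B1": {"g__Bacteroides": 80, "g__Faecalibacterium": 45}}

merged = merge_taxa_tables([study_a, study_b])
for sample, counts in merged.items():
    print(sample, counts)
```

The merge only lines up taxa that received exactly the same name in both studies, so it relies on both studies being classified against the same reference taxonomy.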
I hope my explanation made sense. If required, I can provide a list of commands that I would use to perform the two separate analyses above to better explain what I am trying to say.
Thank you for your help. I look forward to your response.
The second piece is, again, a general issue with microbiome data, and it relates to naming. In both phylogenetic analyses and collapsed taxonomic evaluations, we often assume that things that are evolutionarily similar or clustered together behave the same way. That assumption is questionable at best, though it is sometimes a decent working hypothesis. I’d argue the quality of your aggregation gets weaker as you go up in taxonomic levels, but it can also be hard to classify at lower levels.
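As a small illustration of what collapsing does, here is a minimal Python sketch, assuming Greengenes-style lineage strings separated by semicolons. The `collapse` helper and the counts are made up for the example; in QIIME 2, `qiime taxa collapse` performs this kind of aggregation on real artifacts.

```python
from collections import Counter


def collapse(counts, level):
    """Sum counts over lineage strings truncated to the first `level` ranks."""
    collapsed = Counter()
    for lineage, count in counts.items():
        ranks = [rank.strip() for rank in lineage.split(";")]
        collapsed["; ".join(ranks[:level])] += count
    return dict(collapsed)


# Hypothetical counts for one sample, keyed by full lineage down to genus.
sample = {
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
    "f__Ruminococcaceae; g__Faecalibacterium": 45,
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__Lactobacillus": 10,
    "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; "
    "f__Bacteroidaceae; g__Bacteroides": 120,
}

genus_level = collapse(sample, 6)   # three separate genera remain
phylum_level = collapse(sample, 2)  # the two Firmicutes genera are summed

print(len(genus_level), "taxa at genus level")
print(phylum_level)
```

At phylum level, Faecalibacterium and Lactobacillus end up in a single Firmicutes bin, which is exactly the "behaves the same way" assumption getting stronger as you move up the ranks.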
Also, I suppose you are assuming that all your sequences are equally valuable and carry meaning. Given that de novo picking requires chimera slaying, you’re still discarding sequences. IMO, this assumption may be true for environmental samples. However, when you’re dealing with human data in a well-defined environment, it feels to me like a disservice to the community not to use a closed-reference method, since skipping it reduces the external validity of your study.