I have 16S rRNA data for bacteria, archaea, protists and fungi; the first three groups have 300-bp paired-end reads, whereas the fungi have 150-bp paired-end reads. I started with the whole dataset, and after completing quality control and feature table construction (using DADA2), I am in a dilemma. Would you suggest continuing with the whole dataset, or subsetting by microbial group and analysing each group separately?
Well, this is an excellent question, and I guess it depends on your research questions and the tools you want to use.
Anyway, the easiest approach is to do the analysis separately and then use other tools to check whether the communities vary in a similar way (e.g., Procrustes analysis) or to look for correlations between the different datasets (for example, Pearson correlations).
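As a rough illustration of these two comparisons, here is a minimal plain-Python sketch. `procrustes_disparity_2d` compares two ordinations of the same samples (for example, the first two PCoA axes computed separately from the bacterial and the fungal tables, with samples in the same order) after optimal translation, scaling, rotation and reflection. It is restricted to two axes for brevity and is intended to follow the conventions of `scipy.spatial.procrustes`, whose disparity lies between 0 and 1 (closer to 0 means more similar configurations). `pearson` computes a plain Pearson correlation between two per-sample vectors, such as the relative abundance of one bacterial and one fungal feature across the same samples. Both function names are hypothetical, and no significance testing is included.

```python
import math


def _standardize(points):
    """Center (x, y) points and scale them to unit Frobenius norm.

    Points are stored as complex numbers x + yj so that a 2-D rotation
    plus uniform scaling is a single complex multiplication.
    """
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    z = [p - mean for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    if norm == 0:
        raise ValueError("all points are identical")
    return [p / norm for p in z]


def procrustes_disparity_2d(coords_a, coords_b):
    """Sum of squared residuals after fitting one 2-D ordination onto the other.

    Both inputs must list the same samples in the same order.
    Translation, uniform scaling, rotation and reflection are allowed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("ordinations must contain the same samples")
    a = _standardize(coords_a)
    b = _standardize(coords_b)
    # Best rotation + scaling of a onto b leaves a residual of 1 - |<a, b>|^2.
    rotate = abs(sum(p.conjugate() * q for p, q in zip(a, b))) ** 2
    # Same fit after reflecting a across the x-axis (complex conjugation).
    reflect = abs(sum(p * q for p, q in zip(a, b))) ** 2
    return 1.0 - max(rotate, reflect)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Usage: a square and a rotated, enlarged copy of it have the same shape.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rotated = [(-3 * y, 3 * x) for x, y in square]
print(procrustes_disparity_2d(square, rotated))  # close to zero
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

For real analyses, an established implementation such as `scipy.spatial.procrustes` or `scipy.stats.pearsonr` is preferable.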
Just FYI, the problems with doing the analysis all together include how to assign taxonomy to everything (perhaps by merging multiple reference databases) and how to build a single tree for all these datasets if you want phylogenetic metrics, among others.
Hope this helps.
Thanks a lot for your comment.
You’re welcome! Now, if you do decide to run the full analysis together and come across tools that solve those issues, I’d love to hear about them…
I just want to add my two cents to @antgonza’s advice.
As @antgonza mentioned, taxonomic assignment and the clustering/denoising steps will be more complicated and time-consuming if you choose to analyze all data together. It would be easiest to separate these data during demultiplexing and process them separately by primer set.
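If the different amplicons were sequenced together, one way to separate them by primer set is to check which primer each read starts with. The sketch below is a simplified, hypothetical illustration: it looks at a single read rather than both reads of a pair, uses two example primers (515F for 16S and ITS1F for fungal ITS; substitute the primers you actually used), allows IUPAC degenerate bases with at most one mismatch, and trims the matched primer. Dedicated tools such as cutadapt handle this far more robustly.

```python
# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(read, primer, max_mismatches=1):
    """True if the read starts with the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    mismatches = 0
    for base, code in zip(read, primer.upper()):
        if base.upper() not in IUPAC[code]:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def split_by_primer(reads, primers, max_mismatches=1):
    """Bin (read_id, sequence) pairs by the single primer they start with.

    Reads matching no primer, or more than one, go to "unassigned".
    The matched primer is trimmed from assigned reads.
    """
    bins = {name: [] for name in primers}
    bins["unassigned"] = []
    for read_id, seq in reads:
        hits = [name for name, primer in primers.items()
                if primer_matches(seq, primer, max_mismatches)]
        if len(hits) == 1:
            name = hits[0]
            bins[name].append((read_id, seq[len(primers[name]):]))
        else:
            bins["unassigned"].append((read_id, seq))
    return bins


# Example primers only; replace with your own primer sequences.
PRIMERS = {"16S_515F": "GTGYCAGCMGCCGCGGTAA", "ITS1F": "CTTGGTCATTTAGAGGAAGTAA"}
```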
Normalization (e.g., by rarefying or calculating proportions) could also be complicated if you kept these data types merged. It would be best to keep these data separate for calculating alpha/beta diversity and taxonomic abundances. That said, for alpha/beta diversity you could rarefy/normalize each dataset separately and then merge the results (e.g., to calculate the “total number of features”, though “total 16S” and “total fungi” may be more useful anyway).
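A minimal sketch of the “rarefy each dataset separately, then merge” idea for alpha diversity, assuming each feature table is a plain dict of sample → {feature: count} (a hypothetical in-memory format, not a QIIME 2 artifact). Each dataset is rarefied (subsampled without replacement) at its own depth, observed features are counted per dataset, and the per-dataset values are summed into a “total” only for samples that survive rarefaction in every dataset.

```python
import random
from collections import Counter


def rarefy(table, depth, seed=0):
    """Subsample each sample to `depth` reads without replacement.

    Samples with fewer than `depth` reads are dropped.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in table.items():
        # One entry per read, labelled with the feature it belongs to.
        pool = [feature for feature, n in counts.items() for _ in range(n)]
        if len(pool) < depth:
            continue
        rarefied[sample] = dict(Counter(rng.sample(pool, depth)))
    return rarefied


def observed_features(table):
    """Number of features with a nonzero count in each sample."""
    return {sample: sum(1 for n in counts.values() if n > 0)
            for sample, counts in table.items()}


def merged_richness(datasets, seed=0):
    """Rarefy each dataset at its own depth, then combine per-sample richness.

    `datasets` maps a dataset name to (table, depth). Only samples that
    survive rarefaction in every dataset are reported.
    """
    richness = {
        name: observed_features(rarefy(table, depth, seed))
        for name, (table, depth) in datasets.items()
    }
    shared = set.intersection(*(set(r) for r in richness.values()))
    merged = {}
    for sample in sorted(shared):
        row = {name: r[sample] for name, r in richness.items()}
        row["total"] = sum(row.values())
        merged[sample] = row
    return merged
```

Keeping the per-dataset columns next to the total reflects the point above that “total 16S” and “total fungi” may be more informative than the combined number.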
However, even if you analyze your datasets separately (as I would), merging at later steps could be beneficial. For example, the methods in q2-sample-classifier could perform better if given multiple datasets to use as predictive features.
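To illustrate what using multiple datasets as predictive features could look like, this hypothetical sketch normalizes each table to proportions on its own, prefixes every feature ID with its dataset name so the origin of each feature stays traceable, and keeps only samples present in all datasets. Tables are plain dicts of sample → {feature: count} as a stand-in; q2-sample-classifier itself operates on QIIME 2 feature-table artifacts, so this is not its API, only the shape of the merged input.

```python
def to_proportions(table):
    """Convert raw counts to per-sample relative abundances (empty samples dropped)."""
    proportions = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        if total > 0:
            proportions[sample] = {f: n / total for f, n in counts.items()}
    return proportions


def merge_for_classifier(tables):
    """Build one sample-by-feature matrix from several separately processed tables.

    Each table is normalized on its own, feature labels are prefixed with
    the dataset name, and only samples present in every dataset are kept.
    Returns (sample_ids, feature_labels, matrix), rows in sample_ids order.
    """
    normalized = {name: to_proportions(t) for name, t in tables.items()}
    shared = sorted(set.intersection(*(set(t) for t in normalized.values())))
    # (dataset, feature) pairs observed in at least one shared sample.
    feature_keys = sorted(
        {(name, feature)
         for name, t in normalized.items()
         for sample in shared
         for feature in t[sample]}
    )
    labels = [f"{name}|{feature}" for name, feature in feature_keys]
    matrix = [
        [normalized[name][sample].get(feature, 0.0) for name, feature in feature_keys]
        for sample in shared
    ]
    return shared, labels, matrix
```

Because each dataset is normalized before merging, every dataset contributes a block of columns that sums to 1 per sample, so no single dataset dominates simply through sequencing depth.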
I agree 100% with @Nicholas_Bokulich’s advice; just note that the key word here is “could”, as this will depend on your dataset, question, effect size, etc.