In my project, samples were sequenced on both MiSeq (V4 region) and PacBio (full-length 16S), and I need to compare the microbiome profiles produced by the two technologies. I am using alpha diversity metrics to compare MiSeq and PacBio in terms of richness, diversity, evenness, and the ability to detect rare taxa (rarity-abundance).
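For reference, the standard formulas behind these metric families can be sketched in Python, each computed from one sample's count vector. This is a minimal illustration and not the code of any particular R package. `rare_fraction` and its 0.1% cutoff are hypothetical stand-ins for a rarity measure, not the rarity-abundance index mentioned above.

```python
import math
from typing import Sequence


def observed(counts: Sequence[int]) -> int:
    """Richness: number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over non-zero taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts: Sequence[int]) -> float:
    """Gini-Simpson diversity 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pielou(counts: Sequence[int]) -> float:
    """Pielou's evenness J = H' / ln(S); undefined for S < 2, returned as 0.0 here."""
    s = observed(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def rare_fraction(counts: Sequence[int], threshold: float = 0.001) -> float:
    """Hypothetical rarity measure: share of observed taxa below `threshold` relative abundance."""
    total = sum(counts)
    present = [c for c in counts if c > 0]
    if not present:
        return 0.0
    return sum(1 for c in present if c / total < threshold) / len(present)
```

A perfectly even sample such as `[10, 10, 10, 10]` gives 4 observed taxa, Shannon ln(4), Gini-Simpson 0.75 and Pielou evenness 1.0, which is a quick sanity check against whatever tool produced the real values.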
I use the Mann-Whitney U test to compare the alpha diversity metrics calculated from the two technologies, which should be fine with raw (unnormalized) reads because all samples are included. However, when I normalize with SRS (scaling with ranked subsampling), different samples are dropped from each technology, and I am not sure how to compare the remaining ones. Should I keep only the samples shared by both technologies after normalization? Any recommendations?
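If the same biological samples were sequenced on both platforms (an assumption here, matched by a shared sample ID), the samples retained on both sides after normalization form matched pairs. A paired test uses that structure, whereas Mann-Whitney U treats the two groups as independent. The sketch below assumes that the samples SRS discards are those with fewer reads than the chosen Cmin. It keeps only the shared samples and runs a paired sign-flip permutation test, swapped in here as a dependency-free stand-in for the Wilcoxon signed-rank test. All function names and toy values are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Sequence, Set, Tuple


def kept_after_cmin(depths: Dict[str, int], cmin: int) -> Set[str]:
    """Sample IDs that survive normalization to `cmin` reads (shallower samples are dropped)."""
    return {sample for sample, depth in depths.items() if depth >= cmin}


def paired_values(
    miseq: Dict[str, float], pacbio: Dict[str, float]
) -> Tuple[List[str], List[float], List[float]]:
    """Restrict two per-sample metric tables to their shared sample IDs, in matching order."""
    shared = sorted(set(miseq) & set(pacbio))
    return shared, [miseq[s] for s in shared], [pacbio[s] for s in shared]


def sign_flip_pvalue(
    x: Sequence[float], y: Sequence[float], n_perm: int = 10000, seed: int = 1
) -> float:
    """Two-sided paired permutation test on the mean of the differences x - y.

    Under H0 each difference is equally likely to be positive or negative, so the
    null distribution is built by flipping signs. All 2**n sign patterns are
    enumerated when n <= 15; otherwise `n_perm` random patterns are drawn.
    """
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    if n == 0:
        raise ValueError("no shared samples to compare")
    observed_stat = abs(sum(diffs)) / n
    tol = 1e-12  # guard against float round-off when comparing to the observed statistic
    if n <= 15:
        hits = total = 0
        for signs in itertools.product((1, -1), repeat=n):
            stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
            hits += stat >= observed_stat - tol
            total += 1
        return hits / total
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        stat = abs(sum(d if rng.random() < 0.5 else -d for d in diffs)) / n
        hits += stat >= observed_stat - tol
    return (hits + 1) / (n_perm + 1)
```

For example, `paired_values({"S1": 1.0, "S2": 2.0, "S3": 3.0}, {"S2": 2.5, "S3": 3.5, "S4": 9.0})` keeps only S2 and S3. With real data, the two paired vectors could equally be passed to `scipy.stats.wilcoxon(x, y)`. The sketch only makes a paired comparison possible; it does not settle whether dropping unshared samples is the right choice for this study.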
One more thing: I processed the reads from each technology with the DADA2 pipeline using the pool option and found that pooling is better suited to the PacBio reads, where it helped detect more rare ASVs. With the MiSeq reads, however, pooling removed more than 50% of the taxa, including a genus that is very important to the microbiome of this tissue. In the MiSeq data this genus is less abundant and less prevalent (2 taxa), whereas in the PacBio data it is highly abundant and prevalent (>200 taxa). When I searched for this issue, I found that it is due to primer bias against this genus. So I decided to use pool = TRUE for PacBio and pool = FALSE for MiSeq.
When I then calculated Chao1 for each dataset, the MiSeq values were identical to Observed_otu because I did not use the pool option; when the MiSeq samples had been pooled during processing, Chao1 differed from Observed_otu. For the PacBio samples, which were processed with pooling, Chao1 also gave estimates that differed from the observed richness. Since Chao1 estimates unobserved (rare) taxa, whose detection pooling should enhance, is it fair to compare Chao1 between the two technologies, given that singletons have already been removed from both?
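The Observed_otu/Chao1 coincidence follows from the Chao1 formula itself. The correction term depends only on the number of singleton (F1) and doubleton (F2) taxa in a sample, so a sample with no singletons has Chao1 equal to its observed richness. This is plausibly why the unpooled MiSeq samples show identical values, since per-sample DADA2 inference rarely yields singleton ASVs while pooled inference can. Below is a minimal Python sketch of both the classic and the bias-corrected forms; which form a given R package uses is not assumed here.

```python
from collections import Counter
from typing import Sequence


def chao1(counts: Sequence[int], bias_corrected: bool = True) -> float:
    """Chao1 richness estimate from one sample's integer ASV counts.

    Classic form:        S_obs + F1**2 / (2 * F2)
    Bias-corrected form: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
    F1 = number of singleton taxa, F2 = number of doubleton taxa.
    The classic form is undefined when F2 == 0, so the bias-corrected form is used then.
    """
    freq = Counter(c for c in counts if c > 0)  # abundance value -> number of taxa with it
    s_obs = sum(freq.values())                  # observed richness
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if bias_corrected or f2 == 0:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return s_obs + f1 * f1 / (2 * f2)
```

With no singletons, e.g. `[5, 3, 2, 2, 10]`, the estimate equals the 5 observed taxa; with three singletons and one doubleton, `[1, 1, 1, 2, 5]`, it rises to 6.5 (bias-corrected) or 9.5 (classic). Because the estimate is driven entirely by F1 and F2, differences in Chao1 between a pooled and a non-pooled dataset may partly reflect how each pipeline handled low-count ASVs rather than biology.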