Hi Everyone,
I am working on 16S rRNA data representing 25 samples collected from dead standing trees with either white- or brown-rot fungi growing on them.
I did not observe any significant differences in alpha diversity between the two categories. However, I did see some significance in the beta diversity, although the separation in the PCoA plot was not very clear. The same samples were used for ITS-based analyses by someone else, who found that 8 of the 25 samples do not contain any ITS sequences representing the above-mentioned brown- or white-rot fungi. I removed those 8 samples from my analyses altogether to see whether the 16S data would become less fuzzy and a bit more revealing.
I did the following:
1. Removed the raw reads of all 8 samples and re-ran the analyses on the remaining 17 samples.
I found that the alpha diversity differences did not change much in terms of statistical significance. However, to my surprise, the beta diversity changed: I could no longer see any statistical significance, and the points in the PCoA plot were scattered with no visible grouping.
2. Instead of removing the raw reads and redoing the demux and DADA2 steps, I made a new metadata file with only the 17 samples and performed the downstream analyses using the table.qza representing all 25 samples (i.e. I took the table.qza that was generated after the DADA2 step from all 25 samples and filtered it so that only the 17 samples are available for downstream analyses).
I found that the alpha-diversity significance did not change, but the beta diversity was significant. The PCoA showed a very clear separation of the groups. The data appeared good enough to show some clear differences.
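To make the filtering in step 2 concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not the command I actually ran (in QIIME 2 this kind of filtering would typically go through `qiime feature-table filter-samples` with the reduced metadata file). The sample IDs, ASV names, counts, and the `filter_samples` helper are all made up for the example.

```python
def filter_samples(table, keep_ids):
    """Keep only the listed samples, then drop features left with no counts.

    `table` maps sample ID -> {feature ID -> read count}.
    This helper is hypothetical; it only mimics the idea of filtering
    a feature table by a reduced metadata file.
    """
    # Keep the samples that appear in the reduced metadata.
    kept = {sid: dict(counts) for sid, counts in table.items() if sid in keep_ids}

    # Features that are still observed in at least one retained sample.
    observed = {
        feature
        for counts in kept.values()
        for feature, n in counts.items()
        if n > 0
    }

    # Drop features that only occurred in the removed samples.
    return {
        sid: {f: n for f, n in counts.items() if f in observed}
        for sid, counts in kept.items()
    }


# Toy table: 3 samples x 3 ASVs (made-up numbers).
table = {
    "S1": {"ASV_a": 10, "ASV_b": 0, "ASV_c": 3},
    "S2": {"ASV_a": 0, "ASV_b": 5, "ASV_c": 0},
    "S3": {"ASV_a": 7, "ASV_b": 0, "ASV_c": 1},
}

# Sample IDs listed in the reduced metadata file (assumed).
keep_ids = {"S1", "S3"}

filtered = filter_samples(table, keep_ids)
print(filtered)
```

In this approach the ASVs in the filtered table were still inferred by DADA2 from all 25 samples, whereas in step 1 DADA2 only ever saw the reads from the 17 retained samples.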
Note: The 25 samples comprised 12 brown-rot and 13 white-rot samples. After removal of the 8 samples, the remaining 17 samples comprise 10 brown-rot and 7 white-rot samples. DNA extractions, experimental design, sample collection, etc. were done by someone else; I only received the .fastq files of the sequences.
My questions:
Which method is scientifically correct? What would be your advice to make sense of the data in the best possible way?
Thank you all,