Hi, I have analysed the 16S rRNA V4 region from soil samples on an Illumina MiSeq, and I would like to compare my data to a published dataset that was sequenced with IonTorrent. The IonTorrent reads are noticeably longer than my Illumina reads (median ~290 bp vs ~250 bp), so I am not sure whether a fair comparison between the two datasets is possible, since longer reads will match reference taxonomy databases (e.g. SILVA) more precisely. Any opinions?
Furthermore, I obtained a much higher sequencing depth with Illumina (more reads/sample) than the IonTorrent data has, so I am also not sure how to go about comparing diversity metrics…
You are right that an out-of-the-box comparison between those two studies will be biased, since the difference in read length would dominate the signal. The way to go about this would be to first trim reads from both methods to the same length after demultiplexing. Then perform either closed-reference OTU picking or (recommended) DADA2 or Deblur on each of those studies.
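To make the trimming step concrete, here is a minimal sketch of what "trim to the same length" means. The sequences and the `common_length` value are made up for illustration, and `truncate_reads` is a hypothetical helper, not part of any pipeline; in a real workflow the truncation would normally be handled by the denoising tool itself.

```python
def truncate_reads(reads, length):
    """Keep only reads at least `length` bases long, cut to exactly `length`."""
    return [seq[:length] for seq in reads if len(seq) >= length]


# Toy reads standing in for the two platforms (made-up sequences).
illumina_reads = ["ACGTACGTAC", "ACGTACGT"]
iontorrent_reads = ["ACGTACGTACGTAC", "ACGTA"]

# Hypothetical value; in practice pick it from the read-length distributions
# of both studies so that few reads are discarded for being too short.
common_length = 8

illumina_trimmed = truncate_reads(illumina_reads, common_length)
iontorrent_trimmed = truncate_reads(iontorrent_reads, common_length)

print(illumina_trimmed)    # both reads kept, each 8 bases long
print(iontorrent_trimmed)  # the 5-base read is dropped
```

Reads shorter than the chosen length are discarded rather than padded, so that every remaining read covers the same stretch of the amplicon.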
Another thing to consider is whether both studies used the same set of primers. I would still expect some biases between Illumina and IonTorrent, but if you’re looking for strong effects you might be able to observe them in your meta-analysis.
Regarding your concern about sequencing depth: indeed, you will likely see depth-related effects if you don’t rarefy your data. So a way to mitigate this confounder would be to rarefy both datasets to the same depth.
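As a rough illustration of rarefying to a common depth, the sketch below subsamples each sample without replacement and drops samples that are too shallow. The sample names, counts, and the `rarefy` helper are hypothetical; a real analysis would use the rarefaction built into your pipeline.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {feature: count} dict.

    Returns None when the sample has fewer than `depth` reads.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per read, then draw `depth` of them.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    rarefied = {}
    for feature in rng.sample(pool, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


# Made-up samples from both platforms.
samples = {
    "illumina_1": {"taxonA": 30000, "taxonB": 5000, "taxonC": 40},
    "iontorrent_1": {"taxonA": 2500, "taxonB": 600},
    "iontorrent_2": {"taxonA": 900},
}

depth = 3000  # the same depth is applied to every sample in both studies
rarefied = {name: rarefy(counts, depth) for name, counts in samples.items()}

for name, table in rarefied.items():
    print(name, "dropped" if table is None else sum(table.values()))
```

Every retained sample ends up with exactly `depth` reads, which is what makes the diversity metrics comparable across the two studies.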
Depending on what insights you are hoping to gain from your comparison, a viable option might be to analyze each study separately and then see if your conclusions agree between studies.
Hi @tomasz, thank you for your detailed reply. I have followed your advice and truncated the reads from both methods to equalise lengths.
With regards to rarefaction of samples, the IonTorrent data ranges from ~3,000-7,000 reads/sample, whereas the Illumina data ranges from 35,000-200,000 reads/sample. As there is a huge discrepancy in sequencing depth, rarefying at 3,000 would exclude the majority of the Illumina data. How would you suggest I carry out the rarefaction? Or would normalizing the Illumina data be a better approach?
@marianacosta I understand that discarding the majority of your Illumina data might look wasteful, but that could be the price to pay for a fair comparison between techniques. A good way to assess whether you’re losing important information by rarefying your data to 3,000 sequences/sample would be to perform alpha rarefaction and see whether the diversity curves have leveled off by 3,000.
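The idea behind alpha rarefaction can be sketched as follows: subsample one deep sample at several depths and record how many distinct features are observed at each. The sample composition, depths, and the `observed_features_curve` helper are all made up for illustration; observed features is just one possible alpha diversity metric.

```python
import random


def observed_features_curve(counts, depths, iterations=10, seed=0):
    """Mean number of distinct features seen when subsampling to each depth."""
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    curve = {}
    for depth in depths:
        if depth > len(pool):
            continue  # cannot subsample deeper than the sample itself
        richness = [len(set(rng.sample(pool, depth))) for _ in range(iterations)]
        curve[depth] = sum(richness) / iterations
    return curve


# Made-up deep sample: 5 abundant features plus 45 rare ones (30,900 reads).
sample = {f"abundant_{i}": 6000 for i in range(5)}
sample.update({f"rare_{i}": 20 for i in range(45)})

curve = observed_features_curve(sample, depths=[500, 1000, 3000, 10000, 30000])
for depth, richness in curve.items():
    print(depth, richness)
```

If the curve is still climbing steeply at 3,000 reads, rarefying to that depth hides a meaningful share of the rare features; if it has flattened, little is lost.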
Hello, I want to compare diversity in two datasets that differ greatly in sequencing depth. The first dataset only has 3 samples, and sequencing depth ranges from 2,600-3,600 counts/sample. The second dataset has 30 samples, and sequencing depth ranges from 34,000-190,000 counts/sample. For alpha diversity, I could technically rarefy the larger dataset to 2,600 and carry out the analysis from there. However, I am not sure how to approach beta diversity analysis. Should I carry out a normalization step or just leave the data as is?
I was thinking of merging the OTU tables of both datasets, normalizing them with a log transformation, and then proceeding with beta diversity analysis. Any thoughts?
I merged your 2 posts into 1 topic as they relate to the same thing.
For beta-diversity calculations, we also recommend rarefying data beforehand.
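To illustrate why unequal depth matters for beta diversity, the sketch below compares two made-up samples with identical composition but very different depth, before and after rarefying. Bray-Curtis is used here only as an example metric (it is an assumption, not something specified in this thread), and `bray_curtis` and `rarefy` are hypothetical helpers.

```python
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {feature: count} dicts."""
    features = set(a) | set(b)
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in features)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {feature: count} dict."""
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rarefied = {}
    for feature in random.Random(seed).sample(pool, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


# Same 50/50 composition, very different depth (made-up numbers).
shallow = {"taxonA": 1500, "taxonB": 1500}
deep = {"taxonA": 50000, "taxonB": 50000}

bc_raw = bray_curtis(shallow, deep)
bc_rarefied = bray_curtis(shallow, rarefy(deep, 3000))

print(round(bc_raw, 3))       # large, driven purely by the depth difference
print(round(bc_rarefied, 3))  # close to zero once depths match
```

On raw counts the two samples look almost completely different even though their composition is the same; after rarefying, the dissimilarity reflects composition rather than depth.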
@Nicholas_Bokulich any thoughts on log-transforming the data?