I have 8 different studies (each with demultiplexed paired-end reads). My objective is to perform a meta-analysis to identify the microbial communities found in different geographic locations and the microbes that are unique to a particular location.
My question is: at what point in the analysis should I merge the data? Should I keep the sample sequences from all the studies in a single folder and run the analysis step by step (as given in the qiime2-2018.4 tutorial)? Or should I run each study separately, obtain the OTU tables for each study, and then merge them?
A couple of questions before we can make recommendations:
- Are the reads from these studies from the same hypervariable region, using the same primers?
- Are you planning on using denoising methods such as Deblur or DADA2 or OTU clustering methods? Or both?
Additionally, you might also want to consider using Qiita, which is specifically designed for meta-analyses, though it may not be as flexible as QIIME 2 with respect to the various options.
Thanks for replying!
Ans. 1: The reads are from different studies with different hypervariable regions and primers (not all different, but most of them are).
Ans. 2: Yes, I plan to use DADA2 for denoising as well as OTU clustering methods.
Comparing different variable regions is tricky at best: each region carries its own inherent bias, which introduces artificial patterns in your data that you need to account for. A tool like q2-fragment-insertion may be your best bet; another (less ideal, in my opinion) option is to collapse the reads to, say, the genus level and analyse all your data on that basis.
In the first scenario, you would denoise each study separately, merge the results (OTU clustering is possible here, though it really isn’t needed), then use fragment-insertion to place the merged sequences into a reference phylogeny. At this point you should stick with phylogenetic analyses, as the other methods will still be vulnerable to region-driven bias.
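A sketch of that first workflow with the QIIME 2 CLI might look like the following. The file names, truncation lengths, and sampling depth are placeholders, and flag names can differ slightly between releases (this is written against the 2018-era interface), so check `qiime <plugin> <action> --help` for your version:

```shell
# Denoise each study separately (repeat for study2 ... study8).
# Truncation lengths are placeholders -- choose them from your quality plots.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs study1-demux.qza \
  --p-trim-left-f 0 --p-trim-left-r 0 \
  --p-trunc-len-f 240 --p-trunc-len-r 200 \
  --o-table study1-table.qza \
  --o-representative-sequences study1-rep-seqs.qza

# Merge the per-study feature tables and representative sequences.
qiime feature-table merge \
  --i-tables study1-table.qza --i-tables study2-table.qza \
  --o-merged-table merged-table.qza
qiime feature-table merge-seqs \
  --i-data study1-rep-seqs.qza --i-data study2-rep-seqs.qza \
  --o-merged-data merged-rep-seqs.qza

# Insert the merged sequences into a reference phylogeny with SEPP.
qiime fragment-insertion sepp \
  --i-representative-sequences merged-rep-seqs.qza \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza

# Then stick to phylogenetic metrics, e.g. (sampling depth is a placeholder):
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny insertion-tree.qza \
  --i-table merged-table.qza \
  --p-sampling-depth 10000 \
  --m-metadata-file metadata.tsv \
  --output-dir core-metrics-results
```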
In the second scenario, you would denoise each study separately, merge, run OTU clustering (if you must), assign taxonomy, collapse to the genus level, and analyse.
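For the second workflow, the taxonomy and collapse steps might be sketched as below. `classifier.qza` is an assumed pre-trained classifier, not something provided in this thread; with mixed hypervariable regions you would want one trained on full-length sequences. Level 6 corresponds to genus in the standard 7-level Greengenes/SILVA taxonomy:

```shell
# Assign taxonomy to the merged representative sequences.
# classifier.qza is an assumed pre-trained feature classifier; for reads
# from mixed regions, use one trained on full-length 16S sequences.
qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads merged-rep-seqs.qza \
  --o-classification taxonomy.qza

# Collapse the merged feature table to genus level
# (level 6 in the standard 7-level taxonomy).
qiime taxa collapse \
  --i-table merged-table.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table genus-table.qza
```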
Keep in mind that, as far as I am aware, neither of these approaches has been thoroughly tested in the wild. I would lean toward the fragment-insertion approach, but others have other views.
I am trying the approach you suggested: denoising each study separately and merging the data later. However, while importing the data for one study (containing 240 samples), the program ran for an hour and then the output showed “Killed”, meaning no data was imported. I then reduced my data to 2 samples (4 fastq files) by editing my manifest file, and the data imported successfully. I don’t understand what the problem is.
Note: I am running the command in the same directory as my fastq files, and I am using the manifest-file approach to import the data (since I have paired-end demultiplexed reads).
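As an aside on the manifest approach: a manifest for this import step can be generated rather than hand-edited, which avoids typos when scaling from 2 samples back up to 240. The sketch below assumes files named like `sampleA_R1.fastq.gz` / `sampleA_R2.fastq.gz` (that naming pattern is an assumption; adjust it to your files):

```shell
# Build a paired-end manifest CSV (qiime2-2018.4 format:
# sample-id,absolute-filepath,direction) from demultiplexed fastq.gz files
# in the current directory.
printf 'sample-id,absolute-filepath,direction\n' > manifest.csv
for f in "$PWD"/*_R1.fastq.gz; do
    [ -e "$f" ] || continue                  # no matching files: skip loop
    sample=$(basename "$f" _R1.fastq.gz)     # strip suffix -> sample ID
    printf '%s,%s,forward\n' "$sample" "$f" >> manifest.csv
    printf '%s,%s,reverse\n' "$sample" "${f%_R1.fastq.gz}_R2.fastq.gz" >> manifest.csv
done

# Then import with (note: qiime2-2018.4 uses --source-format;
# later releases renamed this flag to --input-format):
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path manifest.csv \
#     --source-format PairedEndFastqManifestPhred33 \
#     --output-path demux.qza
```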
This importing issue seems unrelated to this thread, so it would be better discussed elsewhere. Usually I would suggest opening a new topic in these situations; however, the issue you are reporting seems similar to, and likely related to, the issue you have opened on another thread. Please continue troubleshooting on that thread with these additional details.
This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.