I am looking for advice on generating ASV tables for multiple studies. I have been using the q2-fondue package to aggregate 18S data from 12 different studies. The issue is that many Illumina runs were done across these studies. With dada2, I understand that it is important to process one run at a time, since each run has its own unique error rate (I am not an expert on this and am going off general advice I have been given, so correct me if I'm wrong!). My main question is whether anyone has advice on using dada2 efficiently across many Illumina runs, or whether the fondue data just needs to be subsetted and run one at a time.
Hi @hunter-powell,
Your information is correct! Because each Illumina run has its own unique error rates, you will need to subset the studies by sequencing run and run each subset through dada2 separately. At this time there is no workaround for this. However, I would suggest creating a script to streamline the process! Good luck!
--Hannah
Thanks so much for the feedback, Hannah! For the script you are suggesting, is there a way to tell from the data which samples came from a specific Illumina run? I could write a script that runs on each bioproject or experiment, but I believe some of the ones I am working with aggregate a lot of data sequenced over several Illumina runs under a single accession. Any suggestions on the general workflow of the script, or on how to have it subset automatically, would be amazing!!
Hi @hunter-powell,
Your metadata may or may not have a column that indicates the run. Because it is publicly available data, there are no guarantees, but if your metadata does have this information, you can use it to filter down the samples before each dada2 run. I hope that is helpful!
--Hannah
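As a rough illustration of the filtering step described above, here is a minimal Python sketch that groups sample IDs by a run column in a tab-separated metadata file. The column names (`ID`, `sequencing_run`), the sample IDs, and the helper `group_samples_by_run` are all hypothetical; the real metadata may name the run column differently or not include one at all. Samples with no run value are collected under `UNKNOWN` so they can be checked by hand.

```python
import csv
import io
from collections import defaultdict


def group_samples_by_run(metadata_file, id_column, run_column):
    """Return {run label: [sample IDs]} from a tab-separated metadata file.

    Both column names are assumptions supplied by the caller; rows with an
    empty or missing run value are collected under the label "UNKNOWN".
    """
    reader = csv.DictReader(metadata_file, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        run = (row.get(run_column) or "").strip() or "UNKNOWN"
        groups[run].append(row[id_column])
    return dict(groups)


# Made-up metadata for illustration only; real files would be opened with
# open("metadata.tsv", newline="") instead of io.StringIO.
example = io.StringIO(
    "ID\tsequencing_run\n"
    "sample1\trunA\n"
    "sample2\trunA\n"
    "sample3\trunB\n"
    "sample4\t\n"
)

groups = group_samples_by_run(example, "ID", "sequencing_run")
for run, sample_ids in groups.items():
    print(run, sample_ids)
```

Each list of sample IDs could then be used to subset the sequences before the corresponding dada2 run, so that every dada2 run only sees samples from one Illumina run.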