Generating ASV for meta-analysis study from q2-fondue data

hunter-powell · August 15, 2024, 8:56pm

Hello,

I am looking for some advice on generating ASV tables for multiple studies. I have been using the q2-fondue package to aggregate 18S data from 12 different studies. The issue I would like advice on is that many Illumina runs were done across these studies. With dada2, I know it is important to process one run at a time, since each run has its own unique error rate (I am not an expert on this and am going off general advice I have been given, so correct me if I'm wrong!). My main question is whether anyone has advice on using dada2 efficiently across many Illumina runs, or whether the fondue data just needs to be subset and run one run at a time.

Thank you!!

-Hunter

jphagen · August 15, 2024, 9:09pm

Hi @hunter-powell,
Your information is correct! Because each Illumina run has its own error rates, you will need to subset the studies by run and put each run through dada2 separately; at this time there is no workaround for this. However, I would suggest creating a script to streamline the process! Good luck!
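As a rough idea of what such a script could look like, here is a minimal Python sketch that only builds the command lines, one dada2 call per run and then a merge of the per-run outputs. It assumes paired-end reads and that you already have one demultiplexed artifact per run; the file naming (`demux-<run>.qza` and so on), the run labels, and the truncation lengths are placeholders I made up, so adjust them to your data.

```python
import shlex


def denoise_command(run_label, trunc_len_f, trunc_len_r):
    """Build (not run) the dada2 command for a single sequencing run.

    File names are hypothetical; the per-run demux artifact is assumed to exist.
    """
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", f"demux-{run_label}.qza",
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--o-table", f"table-{run_label}.qza",
        "--o-representative-sequences", f"rep-seqs-{run_label}.qza",
        "--o-denoising-stats", f"stats-{run_label}.qza",
    ]


def merge_commands(run_labels):
    """Build the two commands that merge per-run tables and sequences."""
    merge_tables = ["qiime", "feature-table", "merge"]
    merge_seqs = ["qiime", "feature-table", "merge-seqs"]
    for label in run_labels:
        merge_tables += ["--i-tables", f"table-{label}.qza"]
        merge_seqs += ["--i-data", f"rep-seqs-{label}.qza"]
    merge_tables += ["--o-merged-table", "table-merged.qza"]
    merge_seqs += ["--o-merged-data", "rep-seqs-merged.qza"]
    return [merge_tables, merge_seqs]


if __name__ == "__main__":
    runs = ["runA", "runB"]  # placeholder run labels
    # Placeholder truncation lengths, reused for every run in this sketch.
    commands = [denoise_command(r, 240, 200) for r in runs] + merge_commands(runs)
    for cmd in commands:
        print(shlex.join(cmd))
```

The sketch prints the commands instead of executing them so you can check them first; once they look right, each list could be passed to `subprocess.run(cmd, check=True)`.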
--Hannah

hunter-powell · August 16, 2024, 4:41pm

Thanks so much for the feedback, Hannah! So for the script you are suggesting, is there a way to know from the data which samples came from a specific Illumina run? I could write a script to run on each BioProject or experiment, but some of the ones I am working with are aggregations of lots of data, and I believe a single accession can span a few Illumina runs. Any suggestions on the general workflow of the script or how to have it subset automatically would be amazing!!

-Hunter

jphagen · August 16, 2024, 10:33pm

Hi @hunter-powell,
Your metadata may or may not have a column that indicates the run. Because it is publicly available data, there are no guarantees, but if your metadata does have this information, you can use it to filter the samples down before each dada2 run. I hope that is helpful!
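If such a column exists, grouping the sample IDs by it could look something like the sketch below. The column names (`ID`, `flowcell`) and the accessions in the example are made up for illustration; check what your own metadata file actually calls them. Samples with no value in that column are collected under `unassigned` so they are not silently dropped.

```python
import csv
from collections import defaultdict


def group_samples_by_run(lines, run_column, id_column="ID"):
    """Group sample IDs by the value of a run-identifying metadata column.

    `lines` is any iterable of tab-separated text lines (e.g. an open file).
    Both column names are assumptions and must match the actual header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if run_column not in (reader.fieldnames or []):
        raise KeyError(f"column {run_column!r} not found in metadata header")
    groups = defaultdict(list)
    for row in reader:
        sample_id = (row.get(id_column) or "").strip()
        # Skip blank rows and comment/directive rows such as '#q2:types'.
        if not sample_id or sample_id.startswith("#"):
            continue
        run = (row.get(run_column) or "").strip() or "unassigned"
        groups[run].append(sample_id)
    return dict(groups)


# Made-up example rows; real accessions and column names will differ.
example = [
    "ID\tflowcell\n",
    "SRR0000001\tFC_A\n",
    "SRR0000002\tFC_B\n",
    "SRR0000003\tFC_A\n",
    "SRR0000004\t\n",
]

if __name__ == "__main__":
    for run, ids in group_samples_by_run(example, "flowcell").items():
        print(run, ids)
```

With a real file you would pass an open handle instead of the list, for example `with open("metadata.tsv", newline="") as fh: groups = group_samples_by_run(fh, "flowcell")`, and then use each group's IDs to subset the samples before that group's dada2 run.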
--Hannah