I recently got some sequence data from a MiSeq run in PE300 mode.
The quality is rather poor, so I had to set --p-max-ee-f and --p-max-ee-r to 10 to get more sequences through the q2-dada2 filtering step. Of course, I followed this with a BLAST step to further remove non-target amplicons.
Even so, some samples did not reach a satisfactory sequencing depth, so I may have to resequence part of the samples.
I would use the same DNA, run the PCR again with the same primers, prepare the library with the same protocol, and resequence the amplicons in the same MiSeq PE300 mode.
So how should I handle the second run with q2-dada2? I can imagine several strategies:
1. Denoise the two runs separately, then merge them.
2. Discard the unsatisfactory samples from the first run, then follow strategy 1.
3. Combine each sample's data from both runs before analysis and run the q2-dada2 denoising step only once.
I understand that DADA2 requires denoising each sequencing run separately and merging the feature tables and representative sequences afterwards, because batch effects may affect the error-rate model learned for each run.
But what about this special case? I hope somebody can help me with this.
Yikes! That is a really permissive setting and may lead to errors creeping into your data. If the data are that bad you may want to consider resequencing the entire dataset… it is worth discussing low-quality runs with your sequencing core/service provider.
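To make "permissive" concrete, here is a rough sketch of how the expected-error count of a read follows from its Phred scores (each base contributes an error probability of 10^(-Q/10)). The read below is made up for illustration, and the cutoff of 2 is the usual default for the max-ee parameters; a read is kept only when its expected errors do not exceed the cutoff.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities: EE = sum(10 ** (-Q / 10))."""
    return sum(10 ** (-q / 10) for q in phred_scores)


# Hypothetical 300-nt read: 250 good bases (Q30) followed by 50 poor ones (Q10).
read_quals = [30] * 250 + [10] * 50

ee = expected_errors(read_quals)   # roughly 5 expected errors in this read
passes_default = ee <= 2.0         # usual default cutoff
passes_permissive = ee <= 10.0     # the setting used in the question

print(f"expected errors: {ee:.2f}")
print(f"kept with max-ee 2:  {passes_default}")
print(f"kept with max-ee 10: {passes_permissive}")
```

A read like this, carrying about five expected errors, is rejected at the default cutoff but kept at 10, which is how errors can creep into the denoised output.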
Yes, your merging options 1 or 2 are the way to proceed with merging separate runs.
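As a rough illustration of what merging does to the counts under options 1 and 2, here is a small pandas sketch. The sample IDs, ASV names, and counts are all made up, and in practice you would use QIIME 2's own feature-table merge action instead of doing this by hand. It also assumes both runs were denoised with the same trimming and truncation settings, so that identical ASVs get identical IDs in both tables.

```python
import pandas as pd

# Hypothetical per-run feature tables (rows: samples, columns: ASVs).
run1 = pd.DataFrame({"ASV_a": [120, 15], "ASV_b": [30, 2]}, index=["S1", "S2"])
run2 = pd.DataFrame({"ASV_a": [200], "ASV_c": [40]}, index=["S2"])  # S2 resequenced


def merge_sum(t1, t2):
    """Merge two tables, summing counts where a sample appears in both."""
    # fill_value=0 covers cells present in only one table; fillna(0) covers
    # cells present in neither (e.g. S1 / ASV_c).
    return t1.add(t2, fill_value=0).fillna(0).astype(int)


# Option 1: keep the low-depth S2 data from run 1 and add the run 2 counts.
option1 = merge_sum(run1, run2)

# Option 2: drop the unsatisfactory S2 data from run 1, then merge.
option2 = merge_sum(run1.drop(index=["S2"]), run2)

print(option1)
print(option2)
```

Option 1 gives the resequenced sample more depth but mixes reads from two runs within one sample; option 2 keeps each sample tied to a single run.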
If you do resequence only select samples beware you might run into issues with batch effects. Keep track of which samples were sequenced in each run (add this to your sample metadata!) and make sure this is totally random, i.e., that run does not covary with any sample metadata variables. Otherwise what looks like an effect from, say, “Treatment” could really be an effect of sequencing run!
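A quick way to check this is to cross-tabulate the run column against each metadata variable. The sketch below uses a made-up metadata table; the column names "treatment" and "run" are only placeholders for whatever your own metadata uses.

```python
import pandas as pd

# Hypothetical sample metadata with a column recording the sequencing run.
metadata = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4", "S5", "S6"],
        "treatment": ["control", "control", "control", "drug", "drug", "drug"],
        "run": ["run1", "run1", "run2", "run1", "run2", "run2"],
    }
).set_index("sample_id")

# Count samples for every treatment/run combination.
counts = pd.crosstab(metadata["treatment"], metadata["run"])

# An empty cell means some treatment level was sequenced in only one run,
# so run and treatment cannot be separated for that level.
has_empty_cell = bool((counts == 0).any().any())

print(counts)
print("some treatment level missing from a run:", has_empty_cell)
```

If one treatment group ends up almost entirely in the second run, any difference between groups could just as well be a run effect.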
The whole dataset is 144 samples and only 20 of them are unsatisfactory, so resequencing all of them seems a little wasteful.
PE300 is always a bad choice, but I have to use it because of the long amplicon length. I suspect my service provider pooled too many samples onto one chip to save cost. I feel helpless!
I will try it and see what happens!
Thanks for your valuable advice!