I am processing 25 fairly large samples, and unfortunately my computer is unable to complete the "qiime phylogeny align-to-tree-mafft-fasttree" step. It appears that the computer runs out of memory while creating the distance matrix.
Because of this, I thought I could run the samples in batches. The 25 samples come from 5 sites, with 5 samples collected at each site. I now know that my computer can complete all the desired steps on 10 of the 25 samples. For context, my workflow starts with importing the data in .fastq format and goes all the way to a taxa bar plot. Essentially, my question is: what are the drawbacks of running samples in batches like this?
First of all, running your samples in batches is fine if:
each batch has roughly 1M reads or more for DADA2
every batch is processed with identical parameters in all steps up to and including DADA2.
In that case you can merge your output files after DADA2.
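To make the merge step concrete, here is a minimal sketch that only assembles the two command lines (it does not run QIIME 2). It assumes the `qiime feature-table merge` and `qiime feature-table merge-seqs` actions are used for the feature tables and representative sequences; the helper name `build_merge_commands` and all file names are hypothetical placeholders.

```python
import shlex


def build_merge_commands(batch_tables, batch_rep_seqs,
                         merged_table="merged-table.qza",
                         merged_rep_seqs="merged-rep-seqs.qza"):
    """Assemble (but do not run) the commands that merge per-batch outputs.

    All file names are hypothetical; adjust them to your own artifacts.
    """
    # One --i-tables argument per batch feature table.
    table_cmd = ["qiime", "feature-table", "merge"]
    for path in batch_tables:
        table_cmd += ["--i-tables", path]
    table_cmd += ["--o-merged-table", merged_table]

    # One --i-data argument per batch of representative sequences.
    seqs_cmd = ["qiime", "feature-table", "merge-seqs"]
    for path in batch_rep_seqs:
        seqs_cmd += ["--i-data", path]
    seqs_cmd += ["--o-merged-data", merged_rep_seqs]

    return table_cmd, seqs_cmd


# Hypothetical batch names, one per DADA2 run.
batches = ["batch1", "batch2", "batch3"]
table_cmd, seqs_cmd = build_merge_commands(
    [f"{b}-table.qza" for b in batches],
    [f"{b}-rep-seqs.qza" for b in batches],
)

# Print the commands so they can be reviewed and pasted into a terminal.
print(shlex.join(table_cmd))
print(shlex.join(seqs_cmd))
```

The printed commands can be checked by eye before running them in your QIIME 2 environment.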
However, you are getting the error at the tree construction step, and to create one tree to rule them all (a single tree for all batches) you still need to merge the representative sequences from all batches, which brings you back to the same memory problem. So I guess running in batches is not an option for you.
Instead, you may consider the following steps:
Use the "parttree" option at the tree construction step.
Remove rare features from the table and representative sequences.
I prefer removing rare features, for example ASVs counted fewer than 10 times overall and detected in fewer than 2-3 samples.
This step reduces the number of unique ASVs by removing those that are not important for the analyses. It will also significantly speed up the tree construction step and reduce its memory requirements.
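As an illustration of what such a filter does, here is a toy sketch in plain Python. It is not QIIME 2 code: the table layout, the function name `filter_rare_features`, and the example counts are made up. It also assumes a feature is kept only when it passes both thresholds (at least 10 counts overall and present in at least 2 samples), which is stricter than dropping only ASVs that fail both.

```python
def filter_rare_features(table, min_frequency=10, min_samples=2):
    """Keep features that are both abundant enough and seen in enough samples.

    table maps feature ID -> {sample ID: count}. This layout is a toy
    stand-in for a feature table, not the QIIME 2 artifact format.
    """
    kept = {}
    for feature_id, counts in table.items():
        total = sum(counts.values())
        n_samples = sum(1 for c in counts.values() if c > 0)
        # Assumption: a feature must pass BOTH thresholds to be kept.
        if total >= min_frequency and n_samples >= min_samples:
            kept[feature_id] = counts
    return kept


# Made-up counts for four ASVs across a few samples.
toy_table = {
    "ASV_1": {"site1_a": 120, "site1_b": 95, "site2_a": 40},  # abundant, 3 samples
    "ASV_2": {"site1_a": 3, "site2_a": 0},                    # total 3, 1 sample
    "ASV_3": {"site3_a": 25},                                 # total 25, 1 sample
    "ASV_4": {"site1_a": 6, "site4_b": 7},                    # total 13, 2 samples
}

filtered = filter_rare_features(toy_table)
kept_ids = sorted(filtered)
print(kept_ids)  # ['ASV_1', 'ASV_4']
```

In QIIME 2 itself, as far as I know, the same thresholds correspond to the `--p-min-frequency` and `--p-min-samples` parameters of `qiime feature-table filter-features`. The representative sequences can then be restricted to the remaining features with `qiime feature-table filter-seqs` and its `--i-table` input, and the filtered sequences passed to the tree step with `--p-parttree`.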