I am trying to assign taxonomy and am at the read-extraction stage. I am not sure what trimming/truncation lengths I should set. This time I am using paired-end Illumina sequences that were merged by DADA2, but I do not know the resulting sequence lengths (the initial trimming/truncation lengths varied anyway, because the quality varied and the sequences came from different runs which were then merged together). I only know the forward and reverse primers. So my question is whether I should leave those parameters at their default value (0) and extract reads based on the primers alone. Would that negatively affect the taxonomy assignment?
I believe those parameters are mainly useful for single-end data, where you don’t have a primer pair defining the positions. In your case, you should just use the primers, since those same primers bound your input reads.
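To illustrate the idea of primer-bounded extraction, here is a minimal Python sketch. It is not how the extract-reads step is actually implemented: the function names and the toy sequences are made up for this example, it only does exact string matching, and it assumes primers without degenerate bases (real primers often have them, and real tools tolerate mismatches).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def extract_amplicon(reference: str, fwd_primer: str, rev_primer: str):
    """Return the region between the two primer sites, or None if not found.

    Hypothetical helper: exact matching only, primers themselves excluded.
    """
    start = reference.find(fwd_primer)
    if start == -1:
        return None
    start += len(fwd_primer)  # begin just after the forward primer
    # The reverse primer binds the opposite strand, so on the reference
    # strand we look for its reverse complement.
    end = reference.find(reverse_complement(rev_primer), start)
    if end == -1:
        return None
    return reference[start:end]


# Toy reference: flank + forward primer site + insert + reverse site + flank
reference = "GGGG" + "ACGTAC" + "TTTTCCCCAAAA" + "GATTACA" + "CCCC"
amplicon = extract_amplicon(reference, "ACGTAC", "TGTAATC")
print(amplicon)
```

Because both ends are fixed by the primer sites, no separate trim or truncation length is needed to define the region in this sketch.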
Quick question on this step: did your trim-left parameters vary between the merged runs? If so, you’ve merged ASVs that can’t be compared with each other, since different runs will start at different positions (giving you a doozy of a batch effect). trunc-len still doesn’t matter, because the forward and reverse reads are always merged.
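A small Python sketch of why the starting position matters, assuming features are matched across runs by exact sequence. The template sequence and the two offsets are made up for illustration, and `trim_left` here is a hypothetical helper, not a DADA2 function.

```python
def trim_left(read: str, n: int) -> str:
    """Drop the first n bases of a read (hypothetical helper)."""
    return read[n:]


# Made-up template standing in for the same organism sequenced in two runs
template = "ACGTTGCAGGCTAACCGTTAGGCATCGATCGGATCCAT"

asv_run_a = trim_left(template, 6)   # run A trimmed 6 bases
asv_run_b = trim_left(template, 20)  # run B trimmed 20 bases

# Exact-sequence comparison treats these as two different features,
# even though run B's sequence is just a suffix of run A's.
same_feature = asv_run_a == asv_run_b
is_suffix = asv_run_a.endswith(asv_run_b)
print(same_feature, is_suffix)
```

With identical offsets in both runs the two strings would be equal and the counts would land on the same feature when the tables are merged.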
Thanks for pointing this out; I had no idea. They did vary slightly (6 vs. 20) because in two of the runs I didn’t remove the primers completely, whilst in the one I had to rerun I did. I am currently running DADA2 again on those two runs to remove the primers completely, but I wanted to go ahead with the analysis because of time constraints. If those finish in time, I can merge them all together and redo it. Trunc-len was different because of differences in quality scores.
Just to update you: my runs finished today and I merged the tables as you suggested, with the same trim-left parameters for all runs. Thanks again!