I am looking to merge 3 rep-seqs files from 3 different Illumina runs so I can build a tree that includes all of these data. One of the runs was trimmed to 125 bp, while the others are still 150 bp. Will I still be able to merge these rep-seqs files? Looking at the sequence files, I see that a sequence in the dataset trimmed by 25 bp gets a different feature ID than the same sequence in the untrimmed datasets, which makes me think I will need to either trim my longer sequences or reprocess my trimmed sequences at 150 bp.
Any advice on this topic?
Thank you so much for your time. I appreciate you, QIIME 2 support team!
Just like you said, the recommended method is to use reads from the same region of the same length. But even then, it can be tricky to merge if the rep-seqs files were created separately, depending on the program used for clustering / denoising. Was dada2 used to process these three runs?
Of course, there’s always the method guaranteed to work: start at the beginning and fully reprocess these three datasets together. But you might be able to start with the rep-seqs files, if they were made with dada2.
Thank you for getting back to me so quickly! Each of the runs was processed using deblur, since they were all preprocessed in Qiita and then imported into QIIME 2. Is there a reason that we wouldn’t be able to merge them, and would instead need to reprocess with dada2?
It sounds like we will need to start from the very beginning. Thanks for your advice!
That's fine. Everything @colinbrislawn wrote about dada2 applies to deblur as well. You do not need to start again, except to process the third dataset with the same trim length as the others.
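
To illustrate why the trim length matters for merging, here is a minimal Python sketch. It assumes feature IDs are MD5 hashes of the sequence itself, which as far as I know is the default for both the dada2 and deblur plugins. The `feature_id` helper, the made-up read, and the dict-based "merge" are hypothetical stand-ins for illustration, not QIIME 2 code.

```python
import hashlib


def feature_id(seq: str) -> str:
    """Hypothetical helper: MD5 hex digest of the sequence (assumed ID scheme)."""
    return hashlib.md5(seq.encode("utf-8")).hexdigest()


# Made-up 150 bp read, for illustration only.
full = "ACGT" * 37 + "AC"
trimmed = full[:125]  # the same read, cut to 125 bp

# Stand-ins for two rep-seqs files: {feature ID: sequence}.
run_150 = {feature_id(full): full}
run_125 = {feature_id(trimmed): trimmed}

# Mixed lengths: the IDs differ, so a merge keeps two separate features
# for what is biologically the same sequence.
mixed = {**run_150, **run_125}
print(len(mixed))

# Same length everywhere: the IDs agree, so the merge collapses them into one.
retrimmed = {feature_id(s[:125]): s[:125] for s in run_150.values()}
merged = {**retrimmed, **run_125}
print(len(merged))
```

Cutting the strings here only shows how the IDs behave. For real data, rerun the denoiser with a consistent trim length, as suggested above, rather than trimming the representative sequences by hand.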