I am running qiime2-2021.8 in an Ubuntu VirtualBox VM. The files I am using are the forward, reverse, barcodes, and sample-metadata files from the Atacama soils tutorial. I am attempting to first merge the reads, then remove adapters, then trim the reads using NGmerge. After this step I would like to import the result into QIIME 2 and demultiplex it.
The 'mismatched sequence IDs' error is thrown when the sequence IDs don't match between your forward, reverse, and/or index files. Some software for joining paired-end reads is careful to keep all your reads in order. Other software jumbles them up, and I think that's what happened here.
Yes! If you import your data into QIIME 2 before joining, then join using a QIIME 2 plugin, this problem will be solved. Bonus: some plugins like DADA2 will trim, denoise, and join your reads all with one command.
I think this is the easiest way forward, unless you wanted to use NGmerge. (We can get that working too, if you would like!)
(This also avoids some spookiness, like importing EMPSingleEndSequences that are secretly JoinedSequencesWithQuality.)
Colin, thanks for the quick response! For the sake of simplicity, I agree that QIIME 2 is the easier solution. However, I am trying to compare various pipelines to determine which one I'd like to use going forward, so being able to run through a pipeline with NGmerge is one of my priorities. Also, would it be feasible to implement NGmerge as a plugin for QIIME 2, or is it simpler to keep the process outside of the QIIME 2 environment?
How would I confirm that NGmerge jumbles the reads and, if it does, how should I go about unscrambling them?
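One way to check this outside of QIIME 2 is to compare the read IDs of two FASTQ files record by record. The following is only a rough sketch, not something from NGmerge or QIIME 2: the function names are made up for illustration, and it assumes plain 4-line FASTQ records, that the read ID is the first whitespace-delimited token of the header (with any trailing `/1` or `/2` removed), and that the merging tool keeps that token unchanged in its output. If the merged file turns out to be a reordered or reduced set of the original reads, `reorder_to_match` rewrites the barcodes file so it lists the same reads in the same order.

```python
import gzip
from itertools import islice, zip_longest


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def read_fastq(handle):
    """Yield (read_id, record_lines) for each 4-line FASTQ record."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        # Assumption: the ID is the first token of the header line.
        read_id = record[0][1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id, record


def first_mismatch(path_a, path_b):
    """Return (record_number, id_a, id_b) for the first record whose IDs
    differ, or None if both files list the same IDs in the same order.
    A missing record (files of different length) shows up as None."""
    with open_fastq(path_a) as a, open_fastq(path_b) as b:
        pairs = zip_longest(read_fastq(a), read_fastq(b), fillvalue=(None, None))
        for number, ((id_a, _), (id_b, _)) in enumerate(pairs, start=1):
            if id_a != id_b:
                return number, id_a, id_b
    return None


def reorder_to_match(reference_path, other_path, out_path):
    """Write the records of other_path to out_path (uncompressed), keeping
    only the reads present in reference_path and in the reference's order.
    Loads other_path into memory, so this suits modest file sizes only."""
    with open_fastq(other_path) as handle:
        records = dict(read_fastq(handle))
    with open_fastq(reference_path) as ref, open(out_path, "w") as out:
        for read_id, _ in read_fastq(ref):
            # Raises KeyError if the reference has a read the other file lacks.
            out.writelines(records[read_id])
```

For example, `first_mismatch("merged.fastq", "barcodes.fastq.gz")` (hypothetical filenames) returning `None` would mean the IDs line up; anything else points at the first record where they diverge.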
@colinbrislawn and I had a brief chat and we think you should be able to do the following, somewhat roundabout, approach:
Import the raw paired-end reads into QIIME 2, demultiplex them, then export the demuxed paired reads. From here you can merge the paired reads on a per-sample basis with NGmerge. Finally, you can re-import these merged reads as the JoinedSequencesWithQuality type using the Manifest format, or another format that assumes the data are already demuxed. Of course, you'd lose provenance in between the import/export steps.
This would also limit you to using deblur within QIIME 2 to analyze your merged reads. Nothing would stop you from running dada2 denoise-single (assuming you import the merged sequences as SequencesWithQuality), but doing so would violate the assumptions of dada2 denoise-single and may return spurious ASVs.
Thanks for the help and suggestions so far. So I'm actually working with jmlayton on this.
I have a question about using the Manifest format to re-import the merged reads back into QIIME 2. I was able to put together the sample-id and absolute path columns, but the specifics of how to call qiime import with a manifest of merged samples are eluding me. Is there a way to specify that the import is for merged sequences, or must it be done with forward and reverse sequences?
Note, we need to trick QIIME 2 by importing your merged data as a SingleEndFastqManifest... format. You may need to change Phred33V2 to either Phred33 or Phred64V2 if the import does not work.
From here you can run deblur (not DADA2, as mentioned previously), and/or OTU clustering.
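If it helps, here is a small sketch for generating the manifest file itself from a folder of per-sample merged reads. It rests on a few assumptions: each merged file is named `<sample-id>.fastq.gz` (adjust `suffix` if your NGmerge output is named differently), and the manifest is a tab-separated file with a `sample-id` column and an `absolute-filepath` column, which is my understanding of the V2 single-end manifest layout. Please check the importing documentation for your QIIME 2 release before relying on it. The `write_manifest` function is a made-up helper, not part of QIIME 2.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, manifest_path, suffix=".fastq.gz"):
    """Write a tab-separated single-end manifest for every file in
    fastq_dir ending in suffix; return the number of samples listed.

    Assumption: the sample ID is the filename with the suffix removed.
    """
    fastq_dir = Path(fastq_dir).resolve()  # manifest paths must be absolute
    rows = []
    for path in sorted(fastq_dir.glob("*" + suffix)):
        sample_id = path.name[: -len(suffix)]
        rows.append((sample_id, str(path)))
    if not rows:
        raise FileNotFoundError("no files ending in %r in %s" % (suffix, fastq_dir))

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Assumed column names for the single-end manifest.
        writer.writerow(["sample-id", "absolute-filepath"])
        writer.writerows(rows)
    return len(rows)
```

The resulting file is what you would pass as the input path when importing with the single-end manifest format mentioned above.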