Mismatched IDs

jmlayton · November 2, 2021, 9:21pm

I am running qiime2-2021.8 in an Ubuntu VirtualBox VM. The files I am using are the forward, reverse, barcodes, and sample-metadata files from the Atacama soils tutorial. I am attempting to first merge the reads, then remove adapters, then trim the file using NGmerge. After this step, I would like to import the result into QIIME 2 and demultiplex it.

# After installing the required files, I run:

NGmerge/NGmerge -1 forward.fastq.gz -2 reverse.fastq.gz -o NGmerge/sequenced -a -m 20 -e 50
NGmerge/NGmerge -1 NGmerge/sequenced_1.fastq.gz -2 NGmerge/sequenced_2.fastq.gz -o qiime/multi-sequences
cd qiime
mv multi-sequences.gz sequences.fastq.gz
cd "$HOME"
qiime tools import \
  --type EMPSingleEndSequences \
  --input-path emp-single-end-sequences/qiime \
  --output-path emp-single-end-sequences/emp-single-end-sequences.qza

# Up to now everything runs smoothly. Then I run:

qiime demux emp-single \
  --i-seqs emp-single-end-sequences.qza \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-column barcode-sequence \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza \
  --verbose

The error message I get is below.

Anyone have ideas for how I can fix this?

colinbrislawn · November 2, 2021, 11:22pm

Hello @jmlayton,

Welcome to the forums!

The 'mismatched sequence IDs' error is thrown when the sequence IDs don't match between your forward, reverse, and/or index (barcode) files. Some software for joining paired-end reads is careful to keep all your reads in order. Other software jumbles them up, and I think that's what happened here.
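One way to see whether two FASTQ files still line up is to compare their read IDs record by record. The bash function below is a minimal sketch, not a QIIME 2 tool: `compare_fastq_ids` is a hypothetical name, and it assumes gzip-compressed FASTQ with strict four-line records, treating the first whitespace-delimited token of each header as the read ID. It reports the first position where the IDs differ, which also catches the case where one file simply has fewer reads (for example, if pairs that failed to merge were dropped).

```bash
# Hypothetical helper: compare read IDs of two gzipped FASTQ files, in order.
# Assumes 4-line records and that the read ID is the first token of the header.
compare_fastq_ids() {
  local tmp result
  tmp=$(mktemp -d)
  # Extract the first token of every header line (line 1 of each record)
  gzip -cd "$1" | awk 'NR % 4 == 1 { print $1 }' > "$tmp/a.txt"
  gzip -cd "$2" | awk 'NR % 4 == 1 { print $1 }' > "$tmp/b.txt"
  # Pair IDs line by line; an empty column means one file ran out of reads
  result=$(paste "$tmp/a.txt" "$tmp/b.txt" | awk -F'\t' '
    $1 != $2 { print "first mismatch at read " NR ": " $1 " vs " $2; found = 1; exit }
    END { if (!found) print "IDs match" }')
  rm -rf "$tmp"
  echo "$result"
}
```

For example, running `compare_fastq_ids sequences.fastq.gz barcodes.fastq.gz` inside the import directory would show whether the merged reads still correspond to the barcode reads before `qiime demux emp-single` is attempted.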

Yes! If you import your data into Qiime2 before joining, then join using a Qiime2 plugin, this problem will be solved. Bonus: some plugins, like DADA2, will trim, denoise, and join your reads, all with one command.

I think this is the easiest way forward, unless you want to use NGmerge. (We can get that working too, if you would like!)

(This also avoids some spookiness, like importing EMPSingleEndSequences that are secretly JoinedSequencesWithQuality.)

jmlayton · November 3, 2021, 2:50pm

Colin, thanks for the quick response! For the sake of simplicity, I agree that QIIME2 is the easier solution. However, I am trying to compare various pipelines to determine which one I'd like to use going forward, so being able to run through a pipeline with NGmerge is one of my priorities. Also, would it be feasible to implement NGmerge as a QIIME2 plugin, or is it simpler to keep that step outside of the QIIME2 environment?

How would I confirm that NGmerge jumbles the reads and if it does, how should I go about unscrambling them?
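If the reads turn out to be reordered or dropped rather than renamed, one possible workaround is to rebuild the barcodes file so that it follows the merged reads: keep only the barcode records whose IDs appear in the merged file, in the merged file's order. The sketch below assumes NGmerge keeps the forward read's ID as the first header token, that headers contain no tab characters, and that both inputs are gzipped FASTQ with four-line records. `resync_barcodes` is a hypothetical name, awk holds the whole barcodes file in memory, and merged reads with no matching barcode record are silently skipped, so re-checking the IDs afterwards is still worthwhile.

```bash
# Hypothetical helper: reorder/subset a barcodes FASTQ to match merged reads.
# Usage: resync_barcodes merged.fastq.gz barcodes.fastq.gz out_barcodes.fastq.gz
resync_barcodes() {
  local merged=$1 barcodes=$2 out=$3 tmp
  tmp=$(mktemp -d)
  # One barcode record per line: header<TAB>sequence<TAB>+<TAB>quality
  gzip -cd "$barcodes" | paste - - - - > "$tmp/bc.tsv"
  # Read IDs (first header token) of the merged reads, in their output order
  gzip -cd "$merged" | awk 'NR % 4 == 1 { print $1 }' > "$tmp/ids.txt"
  # Index barcode records by ID, then emit them in merged-read order;
  # merged IDs with no barcode record are skipped
  awk -F'\t' '
    NR == FNR { split($1, h, " "); bc[h[1]] = $0; next }
    ($1 in bc) { print bc[$1] }
  ' "$tmp/bc.tsv" "$tmp/ids.txt" | tr '\t' '\n' | gzip -c > "$out"
  rm -rf "$tmp"
}
```

The output would then take the place of `barcodes.fastq.gz` next to the merged `sequences.fastq.gz` in the EMP import directory.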

SoilRotifer · November 3, 2021, 4:11pm

Hi @jmlayton,

@colinbrislawn and I had a brief chat, and we think you should be able to use the following, somewhat roundabout, approach:

Import the raw paired-end reads into QIIME 2, demultiplex them, then export the demultiplexed paired reads. From there, you can merge the paired reads on a per-sample basis with NGmerge. Finally, you can re-import these merged reads as the JoinedSequencesWithQuality type using the Manifest format, or another format that assumes the data are already demultiplexed. Of course, you'd lose provenance between the export and re-import steps.

This would also limit you to using deblur within QIIME 2 to analyze your merged reads. That said, nothing would stop you from running dada2 denoise-single (assuming you import the merged sequences as SequencesWithQuality), but doing so would violate the assumptions of dada2 denoise-single and may return spurious ASVs.
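The per-sample NGmerge step of the export/merge/re-import route could be scripted as a loop over the exported files. This is a sketch under assumptions: the directory produced by exporting the demultiplexed artifact (e.g. with `qiime tools export`) is assumed to contain Casava-style names ending in `_R1_001.fastq.gz` / `_R2_001.fastq.gz`, and sample IDs are assumed to contain no underscores, because the sample name is taken as everything before the first underscore. `merge_exported_pairs`, the `merged_reads_<sample>.fastq.gz` output names, and the `DRY_RUN` switch are hypothetical. Only NGmerge's `-1`, `-2`, and `-o` options from the first post are used; other options (such as `-m`) and whether the output is gzip-compressed should be checked against NGmerge's own documentation.

```bash
# Hypothetical helper: run NGmerge once per exported sample (R1/R2 pair).
# Usage: merge_exported_pairs exported-demux-dir merged-output-dir
# Set DRY_RUN=1 to print the NGmerge commands instead of running them.
merge_exported_pairs() {
  local indir=$1 outdir=$2 r1 r2 sample
  mkdir -p "$outdir"   # created even in dry-run mode
  for r1 in "$indir"/*_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue   # glob matched nothing
    r2=${r1%_R1_001.fastq.gz}_R2_001.fastq.gz
    if [ ! -e "$r2" ]; then
      echo "missing mate for $r1" >&2
      continue
    fi
    sample=$(basename "$r1")
    sample=${sample%%_*}   # assumes no underscores in sample IDs
    # With DRY_RUN set, "echo" is prepended so the command is only printed
    ${DRY_RUN:+echo} NGmerge -1 "$r1" -2 "$r2" -o "$outdir/merged_reads_${sample}.fastq.gz"
  done
}
```

Running it once with `DRY_RUN=1` is a cheap way to confirm the R1/R2 pairing and sample names before spending time on the actual merges.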

Hope this helps!

colinbrislawn · November 3, 2021, 6:42pm

I agree with @SoilRotifer, and also wanted to 'qiime-in'.

Benchmarking third-party software is a great use case for working 'outside' of the Qiime2 ecosystem. Keep-It-Super-Simple!

And of course, you could eventually bring the best-performing program into the Qiime2 ecosystem with a plugin!

Yes! When you are ready, check this out https://dev.qiime2.org/latest/tutorials/first-plugin-tutorial/

Connor_Herron · November 23, 2021, 9:49pm

Thanks for the help and suggestions so far. I'm actually working with jmlayton on this.
I'm trying to figure out what format the Manifest file needs when re-importing the merged reads into QIIME 2. I was able to include the sample-id and the absolute path, but the specifics of calling qiime tools import with a manifest on merged samples are eluding me. Is there a way to specify the import for merged sequences, or must it be done with forward and reverse sequences?

SoilRotifer · November 25, 2021, 5:13pm

Hi @Connor_Herron, you just need to make a tab-delimited manifest file as outlined here. Specifically, you'd make your manifest file look like:

sample-id absolute-filepath
sample-01 $PWD/some/filepath/merged_reads_sample_01.fastq
sample-02 $PWD/some/filepath/merged_reads_sample_02.fastq
sample-n ...
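Rather than typing the manifest by hand, it can be generated from a directory of merged files. This is a sketch under assumptions: the files are named `merged_reads_<sample-id>.fastq` or `merged_reads_<sample-id>.fastq.gz` (mirroring the example paths above), the text between `merged_reads_` and the extension is taken as the sample ID, and absolute paths are written out instead of `$PWD`. `make_merged_manifest` is a hypothetical name.

```bash
# Hypothetical helper: print a tab-delimited single-end manifest for merged reads.
# Usage: make_merged_manifest some/filepath > merged-reads-manifest-file.tsv
make_merged_manifest() {
  local dir=$1 f name sample
  printf 'sample-id\tabsolute-filepath\n'
  for f in "$dir"/merged_reads_*.fastq*; do
    [ -e "$f" ] || continue   # glob matched nothing
    name=$(basename "$f")
    sample=${name#merged_reads_}
    sample=${sample%.gz}
    sample=${sample%.fastq}
    # Resolve the directory to an absolute path for the manifest
    printf '%s\t%s\n' "$sample" "$(cd "$(dirname "$f")" && pwd)/$name"
  done
}
```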

Assuming your manifest file is named merged-reads-manifest-file.tsv, you'd then run:

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path merged-reads-manifest-file.tsv \
  --output-path merged-demux.qza \
  --input-format SingleEndFastqManifestPhred33V2

Note that we need to trick QIIME 2 by importing your merged data in a SingleEndFastqManifest... format. If the import does not work, you may need to change Phred33V2 to either Phred33 or Phred64V2.
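Whether Phred64V2 is needed depends on the quality encoding, which can be checked directly. For Illumina-style data, Phred+33 quality characters typically span ASCII 33 to 74 (`!` to `J`) and Phred+64 roughly 64 to 104 (`@` to `h`), so any character below 64 points to Phred+33 and any above 74 suggests Phred+64; anything in between is ambiguous. The function below is a minimal heuristic sketch, not a QIIME 2 check: `guess_phred_offset` is a hypothetical name, it assumes four-line records, and by default it only inspects the first 1000 records.

```bash
# Hypothetical helper: guess the FASTQ quality offset from the first N records.
# Usage: guess_phred_offset reads.fastq[.gz] [N]   (N defaults to 1000)
guess_phred_offset() {
  local file=$1 n=${2:-1000}
  case $file in
    *.gz) gzip -cd "$file" ;;
    *) cat "$file" ;;
  esac | awk -v n="$n" '
    BEGIN {
      # Build a character -> ASCII code lookup table (awk has no ord())
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
      min = 999; max = 0
    }
    NR % 4 == 0 {
      # Line 4 of each record is the quality string
      for (j = 1; j <= length($0); j++) {
        c = ord[substr($0, j, 1)]
        if (c < min) min = c
        if (c > max) max = c
      }
      if (NR / 4 >= n) exit
    }
    END {
      if (min < 64) print "Phred+33"        # codes below "@" (64) only occur in Phred+33
      else if (max > 74) print "Phred+64"   # codes above "J" (74) are unusual for Illumina Phred+33
      else print "ambiguous"
    }'
}
```

A "Phred+33" result would point to keeping Phred33V2, while "Phred+64" would point to trying Phred64V2.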

From here you can run deblur (not DADA2, as mentioned previously) and/or OTU clustering.

-Cheers!
-Mike