Demux single-end Mismatched sequence ids

PatoUru · January 30, 2019, 9:41pm

Hi Matthew,
Hope you're well. There is a similar question, but that topic is already closed (New to Qiime2 Mismatched sequence ids).
I am a new user of QIIME 2, and I'm having some problems analyzing Ion Torrent sequences.

$ qiime demux emp-single --i-seqs chipA_RAW_DATA.qza --m-barcodes-file mapping_file_ChipA_Bacteria --m-barcodes-column BarcodeSequence --output-dir Demux_ChipA

Plugin error from demux:
Mismatched sequence ids: HWI-EAS440_0386: 1: 23: 17547: 1423 # 0 and YIKQF: 00004: 00005
Debug info has been saved to /tmp/qiime2-q2cli-err-cfeopbh4.log

In the topic I linked, I read that the mapping file must have the same number of lines as the sequence file. My mapping file has only 20 lines, one for each of the 20 barcodes. Is that the cause of the error?
Many thanks!!

thermokarst · February 1, 2019, 2:43pm

Hi @PatoUru -
I moved this question out of your DM to me and into the public-facing portion of the forum - we are unable to provide private support at this time.

This is not quite correct - the sequence file needs to have the same number of lines as the barcodes file, not the metadata (mapping) file. The metadata file will have one line for each sample.

Nope - the problem is that your sequences and barcodes files aren't in the same order. Taking a step back - what files do you have from the sequencing instrument? I don't think I have ever seen EMP-formatted data from an Ion Torrent instrument before.
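
To illustrate the ordering requirement described above, here is a minimal Python sketch (not part of QIIME 2) that walks a sequences file and a barcodes file in parallel and reports the first record whose IDs disagree. It assumes well-formed four-line FASTQ records with no blank lines; the names `fastq_ids` and `first_mismatch` are made up for this example.

```python
from itertools import zip_longest


def fastq_ids(lines):
    """Yield the read ID from each 4-line FASTQ record (assumes no blank lines)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # Header line: drop the leading '@' and anything after the first space.
            yield line.strip()[1:].split(" ")[0]


def first_mismatch(seq_lines, barcode_lines):
    """Return (record_index, seq_id, barcode_id) for the first disagreement, or None.

    A missing record on either side (files of different length) shows up as None
    in place of the ID.
    """
    pairs = zip_longest(fastq_ids(seq_lines), fastq_ids(barcode_lines))
    for n, (seq_id, bc_id) in enumerate(pairs):
        if seq_id != bc_id:
            return n, seq_id, bc_id
    return None


# Toy data for illustration only; real input would be the two FASTQ files.
seqs = ["@read1", "ACGTACGT", "+", "IIIIIIII", "@read2", "ACGTACGT", "+", "IIIIIIII"]
barcodes = ["@read1", "ACGT", "+", "IIII", "@read9", "ACGT", "+", "IIII"]
print(first_mismatch(seqs, barcodes))
```

With real files you would pass open file handles instead of lists (for gzipped files, `gzip.open(path, "rt")`). A result of `None` means both files have the same IDs in the same order.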

PatoUru · February 1, 2019, 6:43pm

Hi Matthew,
I eventually found a QIIME 1 script and was able to extract the barcode list from the multiplexed sequence file. I imported the data into QIIME 2 and started denoising, but the job has now been running for 12 hours and has not finished. Do you think something might be wrong?

extract_barcodes.py -f R_2018_07_19_20_09_31_user_SN2-76-Chile_chipA_Etchebehere_19_07_2018.fastq --bc1_len 10 -o parsed_barcodes/ --input_type barcode_single_end

qiime tools import --type EMPSingleEndSequences --input-path fastq_chipA/ --output-path chipA_RAW_DATA.qza

Instead of EMP, what would be the correct method to analyze sequences obtained from Ion Torrent?

Here below are some lines of the sequences.fastq:

@7M5T2:00004:00071
TGAGCAGAACGATACTGGGTGTAAAGTGTGCGTAGGCTGCACGGTAAGTCAGATGTGAAAGCCCGGAGCTCAACTTCGGAATTGCATCCGATACTGCCGTGCTTGAGGACTGGAGAGGAGATTGGAATTCACAGTGGAGCAGTGAAATGCGTAGATATTGTGAGGAACACTAGTGGCGAAGGCGAATCTCTTGACAGTTCCTGACGCTGAGGCACGAAGGCCAGGGGAGCAAACGGGATTAGATACCCGCGTA
+
CB;;;;;@=@;;>>>>CD>CCDCC;B;<<<;;<<CAAB????>ACC:==A<<<<;>BCC?CCC=C@?@?>;;6>>;>A@C<?<;;;;;0;;;;>?CCACCCCCAB>><>;;B@??@DAEECCACACAC;;;>>;;?@CCB?>>??C>CC@@@C@>@CC<<<B;;7;7<;;;C@;;7;;;7;?BBCACDCCD???CCCC@CACCCD?@@CCC@@?;BD7;<?>?;EF;@@@CC?CCD>??<;AAA<<<1<BBAA
@7M5T2:00004:00074
CTGCAAGTTCGATACTGGGTTTAAAGGGCATGCAGGCGGTTATACAAGTAGGATGTGAAAGCCTGGGGCTCAACCTCAGAACTGCATTCTAAACTGTGTGACTAGAGTATTAGCAGGGGGAGACGGAATTTCAGGTGTAGCGGTGGAATGCCTAGATATCTGAAAGAACACCAAAGGCGAAGGCAGTCTCCTGGGCAAATACATGACGCTCATATGCGAAAGCGTGGGTAGCAAAACAGGATTAGATACCCCAGTA
+
;;;;;05506@C;;;;B=1;;,66,6=66;;;;;;7;;7;06666<7<A;;7;;;;;;B1;;:5;BB/;;;;7;=555A<A99;;;;7;;@@9@;;;@@;;;;;;@;0//--7CDFEF6C5;;;0;7==4;;6;<;00--)-3)-)-24649;;4442222222.58C@C>-/./-)-..-;::22922:9......---------33:DCC?;;;;CD/9::::::.8887<?=88811333)1---
@7M5T2:00004:00076
CAGAAGGAACGATACTGGGTTTAAAGCGTGCGTAGGCGGTATGGTAAGACTTGGGTGAAATCTCCATGCTTAACGTGGAGGGAGCCTGGGAGACTGCCGTGCTAGAGGATTGGAGGGGAGACTGGAATTCTTGGAGTAGCAGTGAAATGCGTAGATATCAAGAGGAACACCAGTGGCGAAGGCGAGTCTCTGGACAATTCCTGACGCTGAGGCACGAAGGCCAGGGGATCAAACGGGATTAGATACCCTGGTA
+
CC@CAC???CBBBBBB=CC>CE?@@@CCCC??=:?@A@@>CACCAB@BB6;;1;A@@9>>;D<>?CCC@CACCCDACCE=@@@@BCC=?@A>>BD??@CCCCCCCACCACACCDC:????CC@C@C>?>AC@AA;<<;<?CDDD@@@???;BBC@B?@<CA@?>7>??><>BCAD@@@FCCBAB;;7;;;7;?B7;@CC@?>CCC@CCBCB>=6=6;;;;/;@@CC?;;;1;;7;;;;BCC=CC>??

I really appreciate your assistance,
Patricia

thermokarst · February 1, 2019, 7:06pm

Can you provide some details about what the original data itself looks like? Is it multiplexed? If so, are the barcodes in the reads? Are they in the headers?

PatoUru · February 1, 2019, 8:21pm

Thanks for your quick response!! I have the sequences.fastq file, which contains 20 samples. The sequences are multiplexed, and each read starts with its 10 bp barcode.
The barcode is the first 10 bases of the read (TGAGCAGAAC in the example below):

@7M5T2:00004:00071
TGAGCAGAACGATACTGGGTGTAAAGTGTGCGTAGGCTGCACGGTAAGTCAGATGTGAAAGCCCGGAGCTCAACTTCGGAATTGCATCCGATACTGCCGTGCTTGAGGACTGGAGAGGAGATTGGAATTCACAGTGGAGCAGTGAAATGCGTAGATATTGTGAGGAACACTAGTGGCGAAGGCGAATCTCTTGACAGTTCCTGACGCTGAGGCACGAAGGCCAGGGGAGCAAACGGGATTAGATACCCGCGTA
+
CB;;;;;@=@;;>>>>CD>CCDCC;B;<<<;;<<CAAB???>ACC:==A<<<<;>BCC?CCC=C@?@?>;;6>>;>A@C<?<;;;;;0;;;;>?CCACCCCCAB>><>;;
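
As a rough illustration of what "barcode at the beginning of the sequence" means for processing, the following Python sketch splits one FASTQ record into a barcode part and a trimmed read. It is not the QIIME 1 script's implementation; the function name `split_barcode` and the default `bc_len=10` (taken from the 10 bp barcode length stated above) are assumptions for this example, and the record is shortened to the first 20 bases of the read shown above.

```python
def split_barcode(header, seq, qual, bc_len=10):
    """Split one FASTQ record into (barcode_record, trimmed_read_record).

    Each returned record is a (header, sequence, quality) tuple. The barcode is
    assumed to be the first `bc_len` bases of the read.
    """
    if len(seq) != len(qual):
        raise ValueError(f"sequence/quality length mismatch in {header}")
    if len(seq) < bc_len:
        raise ValueError(f"read shorter than barcode length in {header}")
    barcode = (header, seq[:bc_len], qual[:bc_len])
    read = (header, seq[bc_len:], qual[bc_len:])
    return barcode, read


# First 20 bases and quality characters of the example read, for illustration.
barcode, read = split_barcode(
    "@7M5T2:00004:00071",
    "TGAGCAGAACGATACTGGGT",
    "CB;;;;;@=@;;>>>>CD>C",
)
print(barcode[1])  # TGAGCAGAAC
print(read[1])     # GATACTGGGT
```

Writing the barcode records and the trimmed reads to two files in the same order is what keeps their sequence IDs matched record by record.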

thermokarst · February 4, 2019, 8:28pm

Thanks for the info @PatoUru! Good news: I think we can use an easier import/demux workflow that doesn't require you to run QIIME 1 scripts. Your data appears to be a good candidate for q2-cutadapt.

You can import and demux using cutadapt, which will be able to search your original reads for your barcodes. Please give that a shot and let us know how it goes!
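
The idea of searching each read for a known barcode can be sketched in a few lines of Python. This is only a conceptual illustration, not how cutadapt or q2-cutadapt is implemented: it matches barcodes exactly at the start of the read, whereas the real tool can tolerate sequencing errors. The function name `demux_by_prefix` and the sample names are hypothetical; the two barcodes are the first 10 bases of reads shown earlier in this thread.

```python
from collections import defaultdict


def demux_by_prefix(reads, barcode_to_sample):
    """Group (header, seq) reads by the sample whose barcode starts the read.

    Returns (per_sample, unassigned). Matched reads are stored with the
    barcode removed; reads matching no barcode are returned unchanged.
    """
    per_sample = defaultdict(list)
    unassigned = []
    for header, seq in reads:
        for barcode, sample in barcode_to_sample.items():
            if seq.startswith(barcode):  # exact match only (a simplification)
                per_sample[sample].append((header, seq[len(barcode):]))
                break
        else:
            # No barcode matched this read.
            unassigned.append((header, seq))
    return dict(per_sample), unassigned


# Hypothetical sample names; reads shortened for illustration.
barcode_to_sample = {"TGAGCAGAAC": "sampleA", "CTGCAAGTTC": "sampleB"}
reads = [
    ("7M5T2:00004:00071", "TGAGCAGAACGATACTGGGT"),
    ("7M5T2:00004:00074", "CTGCAAGTTCGATACTGGGT"),
    ("unknown", "AAAAAAAAAAGATACTGGGT"),
]
per_sample, unassigned = demux_by_prefix(reads, barcode_to_sample)
print(sorted(per_sample), len(unassigned))
```

In practice the barcode-to-sample table would come from the metadata (mapping) file, with one row per sample.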

system · March 8, 2019, 2:32am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.