It is a 16S MiSeq run, 300 bp paired-end. I have done these analyses before, but I only ran into problems with this run and cannot figure out why.
I was following the tutorial to import the I1 and R1 read fastq files by:
The metadata mapping file, CIMP_RO1.keemei.tsv (21.8 KB),
was validated by Keemei. I ran the commands (I tried both the reverse-complement and non-reverse-complement options).
As the demux.qzv shows, quite a few samples from the metadata mapping file are missing. All of them are from the bottom of the mapping file (fact: the missing ones use newly ordered barcodes; all of the barcodes, old and new, are taken from the EMP protocol). BTW, I also did the demultiplexing in QIIME 1 and got exactly the same output as from QIIME 2.
We thought the new barcodes/indexes might be the cause: perhaps the R1/R2 reads associated with the missing barcodes were of low quality and did not pass the demultiplexing filter(?). So I checked the index read fastq and extracted the sequence IDs of the reads carrying the barcodes of the missing samples. I did find a significant number (~200,000) for these barcodes, so I pulled the corresponding reads out of R1 and R2. I ran those missing-sample R1/R2 reads through FastQC, and their quality turned out to be as good as that of the other reads.
I am confused that I can find the missing samples in the index/R1/R2 files, yet they cannot be demultiplexed. I could not find any relevant post in the forum either. Could you please help see if there is a way to still salvage the data for those missing samples?
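The check described above (finding index reads that carry a given barcode, then pulling the matching reads out of R1/R2) can be sketched roughly as follows. This is a minimal Python illustration, not the commands actually used; the function names are hypothetical, and it assumes exact barcode matches, four-line FASTQ records, and read IDs that are identical across the index and read files up to the first space in the header.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, quality) for each four-line FASTQ record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))
            if not record:
                break
            if len(record) < 4:
                raise ValueError("truncated FASTQ record in %s" % path)
            header, seq, _plus, qual = (line.rstrip("\n") for line in record)
            yield header, seq, qual


def read_id(header):
    """Return the read ID: the header up to the first space, without '@'."""
    return header[1:].split(" ", 1)[0]


def ids_with_barcode(index_path, barcode):
    """Collect IDs of index reads whose sequence equals the barcode exactly."""
    return {read_id(h) for h, seq, _ in read_fastq(index_path) if seq == barcode}


def extract_reads(reads_path, wanted_ids, out_path):
    """Write records from reads_path whose ID is in wanted_ids; return the count."""
    kept = 0
    with open(out_path, "w") as out:
        for header, seq, qual in read_fastq(reads_path):
            if read_id(header) in wanted_ids:
                out.write("%s\n%s\n+\n%s\n" % (header, seq, qual))
                kept += 1
    return kept
```

If the barcodes in the mapping file are stored in the opposite orientation to the index reads, the barcode would need to be reverse-complemented before calling `ids_with_barcode`.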
Hey there @gc26762524! Would you be able to share your multiplexed seqs artifact with us? That might make diagnosis a bit easier (feel free to send a link in a private message to me).
Otherwise, I am not quite sure - it sounds like you are observing your barcode reads visibly in your index file, but for some reason they aren't being extracted out? Since you mentioned these are MiSeq reads, have you tried using Illumina's tools for demultiplexing? If so, did you have similar results?
Hi @thermokarst, thanks for working on it. I misunderstood you. I have now put the qza files from before and after demux in the Google Drive directory. Best regards,
Cheng.
Hmm - I am seeing a handful of samples from all over the metadata file - 26 samples missing in all.
I searched in the barcodes.fastq.gz file for a few of the barcodes for the missing samples (and their reverse complement) and didn't get a single hit (this is with manual searching...).
Any chance this is a clerical issue?
Currently, EMP-based demuxing doesn't support any form of error correction - if your reads still had the barcodes in the read, you could use q2-cutadapt to demux. Otherwise, you could use an external tool to demux, then import the demuxed reads into QIIME 2.
Hi @thermokarst and @gc26762524. First post, so I apologize if I'm doing anything wrong here. I believe I am having the same problem. To summarize, I have Illumina paired-end data, multiplexed in 3 files (R1, R2, R3 (index)). When demultiplexing with demux emp-paired, I lose a lot of my samples (21 or so). Interestingly, when I grep and manually search the index file, these missing samples have TONS of hits in the index fastq.
My question is: why would there be tons of barcodes in the barcodes.fastq (R3) file for a sample that does not get demultiplexed (i.e., I lose it)? I would love to get those samples back if I could haha. Here's my command:
qiime demux emp-paired \
  --m-barcodes-file qiime2_sample_mani.txt \
  --m-barcodes-column BarcodeSequence \
  --i-seqs emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux
PS. I got the same results using Illumina's demultiplexer, QIIME 1, and QIIME 2. I'll put the QIIME 1 split_library_log.txt up there too; it shows almost 7 million unassigned reads.
I figured out my problem, and it was a stupid mistake on my part. Thanks to @thermokarst's help, I realized that I was using the wrong files. The story: when generating emp-single-end-sequences.qza for QIIME 2, I put the wrong files (renamed from a previous run) into the emp-single-end-sequences directory as barcodes.fastq.gz and sequences.fastq.gz, while I was checking the correct index file for the barcodes. So of course I found the barcodes there, but I was running the QIIME 2 pipeline on the wrong emp-single-end-sequences.qza. I don't know if that is your case, but I hope it helps.
Thanks for sharing your data, @aoliver2! I whipped up a quick check, to give us an idea of barcode counts (note, ag is just like grep, only a lot faster):
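For readers without ag, a comparable brute-force count can be sketched in Python. This is an illustration under stated assumptions rather than the one-liner used here: the function names are hypothetical, the metadata is assumed to be a tab-separated file whose ID and barcode column names are passed in by the caller, and only exact matches are counted (in both orientations), since EMP-based demuxing does no error correction.

```python
import csv
import gzip
from collections import Counter
from itertools import islice

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_index_reads(index_path):
    """Count how often each exact sequence occurs in the index FASTQ."""
    opener = gzip.open if str(index_path).endswith(".gz") else open
    with opener(index_path, "rt") as handle:
        # The second line of every four-line record is the sequence.
        return Counter(line.strip() for line in islice(handle, 1, None, 4))


def barcode_report(metadata_path, index_path, id_column, barcode_column):
    """Return (sample, barcode, forward_count, revcomp_count) per sample."""
    counts = count_index_reads(index_path)
    rows = []
    with open(metadata_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            sample, barcode = row[id_column], row[barcode_column]
            # Skip comment or directive rows such as "#q2:types".
            if sample.startswith("#") or not barcode:
                continue
            rows.append((sample, barcode, counts[barcode],
                         counts[reverse_complement(barcode)]))
    return rows
```

Samples whose forward and reverse-complement counts are both zero are the ones that cannot be recovered by exact-match demultiplexing from that index file.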
Hmm, I count 17 samples that are completely unobserved in this list of barcodes, plus another sizeable chunk with single-digit counts. Where are you getting 21 from?
Strange! Maybe this is a clerical issue? At least as far as the data you sent me goes, that brute-force search yielded results pretty similar to what you experienced in q2-demux, although it doesn't sound consistent with your manual searches! I think I did the same basic thing here as you did in your manual searches, but I didn't find the same results. Maybe I am looking at the wrong file? Maybe your grep command did some fuzzy matching? What do you think?
You are a champion. I dig that ag tool, definitely going to get that going. Thanks a ton for the transparency and help.
You are totally right and your numbers match up really nicely with our demultiplexing data.
My best guess is a horrible file conversion problem. I converted the barcodes.fq to a fasta and then grepped for the barcodes in that fasta. Going back to the raw fastq... I got the same answer you got. I can't explain it other than a wonky perl script for the conversion.
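For anyone who does need a FASTA copy of the index file, a stricter conversion can be sketched as below. This is only a minimal Python sketch, not a reconstruction of the perl script; the function name is hypothetical, and it assumes plain four-line FASTQ records (no wrapped sequences). It reads records strictly four lines at a time, so a quality line that happens to begin with "@" is never mistaken for a header, and it raises an error as soon as the record structure looks wrong.

```python
import gzip
from itertools import islice


def fastq_to_fasta(fastq_path, fasta_path):
    """Convert FASTQ to FASTA record by record; return the number written."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    written = 0
    with opener(fastq_path, "rt") as src, open(fasta_path, "w") as dst:
        while True:
            record = [line.rstrip("\n") for line in islice(src, 4)]
            if not record:
                break
            # Validate the structure instead of guessing from line content.
            if (len(record) != 4 or not record[0].startswith("@")
                    or not record[2].startswith("+")):
                raise ValueError("malformed record after %d reads" % written)
            if len(record[1]) != len(record[3]):
                raise ValueError("sequence/quality length mismatch: %s" % record[0])
            dst.write(">%s\n%s\n" % (record[0][1:], record[1]))
            written += 1
    return written
```

A quick sanity check is that the returned count equals the number of lines in the FASTQ divided by four, and that barcode counts from the FASTA match counts taken directly from the raw FASTQ.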