Hi all, I'm new to Qiime2 and bioinformatics in general.
I'm not sure how to troubleshoot this issue! About 75% of total reads are assigned to one sample after de-multiplexing. I haven't found a post with my exact issue, but maybe I'm using the wrong keywords. In an earlier post I thought the problem was DADA2; since it now looks like a demultiplexing problem, I thought I should make a new post.
I'm using qiime2-2023.5 in conda. The data is 300 bp single-end multiplexed Illumina MiSeq data, custom barcodes included, forward reads only. The data does not include primers or adapters. I get the same issue with an ITS library and a custom amplicon library. I know there isn't an issue with the run data itself, since this issue didn't arise when we used UPARSE on the same data.
Troubleshooting so far:
Barcodes: I've checked that the barcodes in my mapping file match the barcodes in the original fastq file. (used grep to search the headers in the fastq with ':barcode-sequence'. Each one got thousands of hits, so the mapping file is correct).
I'm not sure whether this also helps explain what happened, but the DADA2 de-noising step filters out almost all of my sample reads, except for the one sample with ~10 million reads (~60% pass the filter). For most of the 96 samples, fewer than 1% of reads pass the filter.
Many thanks in advance for any ideas on how to troubleshoot further!
I know there isn't an issue with the run data itself, since this issue didn't arise when we used UPARSE on the same data.
The reads were more evenly spread across samples? Can you show a screenshot or something similar?
Barcodes: I've checked that the barcodes in my mapping file match the barcodes in the original fastq file. (used grep to search the headers in the fastq with ':barcode-sequence'. Each one got thousands of hits, so the mapping file is correct).
Did the number of hits for a barcode match the number of reads allotted to the sample that that barcode represents?
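One way to make that comparison, as a rough sketch: tally every barcode found in the fastq headers and set the totals next to the per-sample read counts from the demux summary. The function names here are made up for illustration, and the code assumes the barcode is whatever follows the last ':' in each header line, which may not match your sequencer's header layout.

```python
import gzip
from collections import Counter


def barcode_from_header(header: str) -> str:
    """Return the text after the last ':' in a FASTQ header (assumed layout)."""
    return header.strip().rsplit(":", 1)[-1]


def count_header_barcodes(fastq_path: str) -> Counter:
    """Count how many reads carry each header barcode in a 4-line-per-record FASTQ."""
    counts = Counter()
    # Handle both gzipped and plain-text FASTQ files.
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # every fourth line is a record header
                counts[barcode_from_header(line)] += 1
    return counts
```

Calling `count_header_barcodes("JB155_FC1577L1P1.fastq.gz").most_common(10)` would list the ten most frequent header barcodes, which should also show whether one barcode really accounts for most of the run.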
You used qiime cutadapt demux-single, the help text for which says:
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes
are expected to be located within the sequence data (versus the header, or a
separate barcode file).
You said that your barcodes are in the header, so that's probably your problem.
Thanks so much for helping Colin! I think you're right that I used the wrong demultiplexing option.
I think we found a fix (below with notes), because the reads per barcode now exactly match their counts in the original fastq file.
Notes & code:
I think I misunderstood which demultiplexing command to use, because our sequencer does not send a separate barcode.fastq file but instead embeds the barcodes in a single fastq file. We switched from the 'barcodes in sequence' option to the demux emp-single option for demultiplexing.
We also have custom barcodes which may not be Earth Microbiome Protocol, so we turned off the Golay option.
For anyone who has a similar issue, here are the commands I used:
We originally used:
qiime tools import \
  --type MultiplexedSingleEndBarcodeInSequence \
  --input-path JB155_FC1577L1P1.fastq.gz \
  --output-path multiplexed-seqs.qza
The fix: We used a different import and demux command, and turned Golay error correction off. This import option required us to first use a QIIME 1 command to extract a barcodes.fastq.gz file from the fastq file.
mkdir emp-paired-end-sequences
#Need a command to separate the barcodes.fastq.gz from the fastq.gz file. Follow directions for making the directory and file names exactly, with no extra files in the directory.
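For anyone without a QIIME 1 install, the barcode-extraction step could be sketched in plain Python along these lines. This is not the QIIME 1 command we ran, only an illustration. It assumes the barcode is the text after the last ':' in each header, and it writes a placeholder quality string for the barcode reads because the header carries no per-base qualities. `split_header_barcodes` is a made-up name. The output file names are sequences.fastq.gz and barcodes.fastq.gz, which, as far as I know, are the names the EMP single-end import expects.

```python
import gzip
import os


def split_header_barcodes(fastq_gz: str, out_dir: str) -> int:
    """Write sequences.fastq.gz and barcodes.fastq.gz into out_dir.

    Returns the number of records processed.
    """
    os.makedirs(out_dir, exist_ok=True)
    seq_path = os.path.join(out_dir, "sequences.fastq.gz")
    bc_path = os.path.join(out_dir, "barcodes.fastq.gz")
    n_records = 0
    with gzip.open(fastq_gz, "rt") as reads, \
            gzip.open(seq_path, "wt") as seq_out, \
            gzip.open(bc_path, "wt") as bc_out:
        while True:
            header = reads.readline()
            if not header:  # end of file
                break
            header = header.rstrip("\n")
            seq = reads.readline().rstrip("\n")
            plus = reads.readline().rstrip("\n")
            qual = reads.readline().rstrip("\n")
            # Assumption: the barcode is the last ':'-separated header field.
            barcode = header.rsplit(":", 1)[-1]
            seq_out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            # Placeholder qualities ('I'), since none exist for header barcodes.
            bc_out.write(f"{header}\n{barcode}\n+\n{'I' * len(barcode)}\n")
            n_records += 1
    return n_records
```

With our file and directory names this would be called as `split_header_barcodes("JB155_FC1577L1P1.fastq.gz", "emp-paired-end-sequences")`.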
This is an interesting workaround. I think you're right that qiime2 doesn't really support your use case--non EMP sequences with barcodes in the headers. If this is working for you then great.
What is the reasoning behind disabling golay error correction? I believe this just helps to account for sequencing errors in the barcode sequences and isn't EMP specific. It makes sense that your grep counts line up with the golay-disabled counts because these are both exact matches only. However, you'll probably want to account for the mismatches too.
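To get a feel for how many reads exact matching leaves behind, a simple mismatch-tolerant lookup can be sketched as below. To be clear, this is plain Hamming-distance matching, not Golay decoding, and it is not what any QIIME 2 plugin does internally. The function names and the one-mismatch default are assumptions for illustration.

```python
from typing import Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str, expected: Iterable[str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the single closest expected barcode, or None.

    None is returned when nothing is within max_mismatches or when
    two expected barcodes tie for closest (ambiguous assignment).
    """
    distances = sorted(
        (hamming(observed, bc), bc)
        for bc in expected
        if len(bc) == len(observed)
    )
    if not distances or distances[0][0] > max_mismatches:
        return None
    if len(distances) > 1 and distances[1][0] == distances[0][0]:
        return None  # tie between two barcodes, so do not guess
    return distances[0][1]
```

Running every header barcode through `assign_barcode` with the mapping-file barcodes as `expected` and counting the non-None results that are not exact matches would estimate how many reads a one-mismatch allowance recovers.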
Hi Colin,
We did try this with Golay on. Here's the error it threw:
Plugin error from demux:
No sequences were mapped to samples. Check that your barcodes are in the correct orientation (see the rev_comp_barcodes and/or rev_comp_mapping_barcodes options). If barcodes are NOT Golay format set golay_error_correction to False.
See above for debug info.
We believe the barcodes are in the correct orientation. So we think the barcodes are not in the Golay format required for EMP, since they are not EMP barcodes.