demux removing all reads

davidlab-duke-tm · November 10, 2022, 6:56pm

I've read previous posts where people had similar problems and still don't have an answer as to what is going on, so TIA!

I'm running qiime2 (q2cli version 2020.8.0) on a computing cluster. I installed it by converting the Docker container release (Docker Hub) into a Singularity container, which is compatible with computing clusters.

I'm running the following command:

qiime demux emp-paired \
  --m-barcodes-file mapping.txt \
  --m-barcodes-column barcode-sequence \
  --p-rev-comp-mapping-barcodes \
  --i-seqs $OUTPUT/emp-paired-end-sequences.qza \
  --o-per-sample-sequences $OUTPUT/demux-full.qza \
  --o-error-correction-details $OUTPUT/demux-details.qza \
  --p-no-golay-error-correction

Sample of barcodes file:

sample-id	barcode-sequence
#q2:types	categorical
sample1	CGTACCAGATCC
sample2	ATGTTTAGACGG
sample3	ACATGTCACGTG
sample4	CTTTAGCGCTGG
sample5	CTGGTCTTACGG
sample6	CAAGTCGAATAC
sample7	GCAAGTGTGAGG
sample8	CTCGGTCAACCA
sample9	ACCCTATTGCGG
sample10	TCCGTTCGTTTA
sample11	ACCACCGTAACC
sample12	CATTTCGCACTT
sample13	TTAAGCGCCTGA

However, when I look at the output, I'm only getting 10-50 reads per sample. The barcodes are in the correct orientation, since if I remove the --p-rev-comp-mapping-barcodes flag then I get the error "No sequences were mapped to samples. Check that your barcodes are in the correct orientation."

This dataset was previously processed using a different analysis pipeline by a bioinformatics core and the median reads per sample was 45k. There are some issues with the pipeline used by the core, which is why I'm looking into processing these samples myself, but at the very least I know that there is sequence data and I am somehow losing it. I am unfamiliar with qiime2 and not sure where to start with figuring out what is going on, so thank you in advance for your help!

colinbrislawn · November 12, 2022, 2:39pm

Hello David,

Welcome to the forums!

Thank you for the detailed explanation and your example code. I think you are on the right track and there are a few more things you can try.

Just to cover all our bases (pun intended), you could also try variations of the barcodes that have been

reversed (but not complemented)
complemented (but not reversed)

using a tool like this or this.
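If you would rather script the variations than use a web tool, here is a minimal Python sketch. The function names are my own for illustration (they are not part of QIIME 2), and it assumes the barcodes contain only A, C, G, and T.

```python
# Hypothetical helpers for generating barcode variants; not part of QIIME 2.
# Assumes barcodes contain only the characters A, C, G, T (either case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    """Reverse the barcode without complementing it."""
    return seq[::-1]


def complement(seq):
    """Complement each base without changing the order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Complement each base, then reverse the result."""
    return complement(seq)[::-1]


def barcode_variants(seq):
    """Return all four orientations of a barcode, keyed by name."""
    return {
        "original": seq,
        "reversed": reverse(seq),
        "complemented": complement(seq),
        "reverse_complemented": reverse_complement(seq),
    }


if __name__ == "__main__":
    # First barcode from the mapping file sample above.
    for name, variant in barcode_variants("CGTACCAGATCC").items():
        print(f"{name}\t{variant}")
```

Writing each variant into its own copy of the mapping file (same sample-id column, new barcode-sequence column) lets you rerun demux once per orientation and compare the read counts.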

One of these different variations should get you the ~45k reads per sample reported by the sequencing core.

(You could also import from their pipeline after demultiplexing, but let's see if we can get this running totally within Qiime2's tracking system.)

davidlab-duke-tm · November 17, 2022, 12:41am

Thank you for your help! I tried demultiplexing with the reverse complement/complement sequences of my barcodes and no sequences came out, so that doesn't appear to be the problem. Do you know what else could be going wrong?

colinbrislawn · November 17, 2022, 2:05am

That's all I can think of.

Have you reached out to the bioinformatics core to see how they were able to demultiplex this data set? Perhaps you are using the wrong barcode file or something, or there is some special pre-processing they are doing to make this work.
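One way to check whether the barcode file matches the data at all is to tally the barcodes that actually appear in the barcode reads and compare them with the mapping file. The sketch below is only an illustration: the helper names are hypothetical, the toy records are made up, and it assumes a standard four-line-per-record FASTQ. For a real run you would pass an open file handle instead of the toy list, for example one from gzip.open(path, "rt").

```python
from collections import Counter


def observed_barcodes(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) from a FASTQ stream."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def tally_matches(fastq_lines, expected):
    """Count every observed barcode and report hits for the expected ones."""
    counts = Counter(observed_barcodes(fastq_lines))
    matched = {barcode: counts.get(barcode, 0) for barcode in expected}
    return counts, matched


# Toy barcode reads for illustration only (not real data).
toy_fastq = [
    "@read1", "GGATCTGGTACG", "+", "IIIIIIIIIIII",
    "@read2", "GGATCTGGTACG", "+", "IIIIIIIIIIII",
    "@read3", "NNNNNNNNNNNN", "+", "############",
]

# Expected barcodes: sample1 from the mapping file and its reverse complement.
expected = ["CGTACCAGATCC", "GGATCTGGTACG"]

counts, matched = tally_matches(toy_fastq, expected)

if __name__ == "__main__":
    print("Most common observed barcodes:", counts.most_common(5))
    print("Hits per expected barcode:", matched)
```

If the most common observed barcodes do not resemble any orientation of the barcodes in the mapping file, the problem is more likely in the input files or an upstream step than in the demux parameters.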

If they can send you demultiplexed fastq files (with separate samples in separate fastq files), you can import those into Qiime2 too, as I mentioned.

davidlab-duke-tm · November 23, 2022, 9:38pm

Turns out it was an upstream data preprocessing issue--thanks so much for your help!
