Questions on demultiplexing using

Hello, I'm new to bioinformatics and this is my first time using QIIME 2. I'm trying to demultiplex a raw paired-end data set. Here is an example record from my barcodes.fastq.gz:

@M01522:221:000000000-BRWK5:1:1101:20002:1874 1:N:0:CCTTGA
CAGTTCAT
+
8-8C@<@F

From my understanding of FASTQ files, the CCTTGA (6 nucleotides) in the header is where the barcode should be. But in my sample-metadata.tsv file, the barcode is CAGTTCAT (8 nucleotides). Also, when I exported demultiplexed-seqs.qza after demultiplexing to see what my sequences look like, I found that the files were named like this (the barcode goes directly into the name):

10_CATGTTGT_L001_R1_001.fastq.gz

And inside the file, it looked like this:

@M01522:221:000000000-BRWK5:1:1101:15205:1891 1:N:0:CCTTGA
ACACCCCTTTCAGTTGGGACTCTTTTGTCGTTACCCCCTTAAGAAGCCCCTCCCAACTACGTTCCAGCAGCCGCTGTTACACGTTGTTGTCCCTCTTTTTCCTTATTTATTGTTCGTAAAGTGCTCGTCGTCGGTTCGTTAATTCGTGTTTTAAACCTCCAGGCTCTTCCTTCATTCTCCCCTCCTTTCTTCTGTGACTTGTTT
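
To see the two sequences side by side, one option is to print, for each record, the last colon-separated field of the header next to the record's sequence line. In the Illumina (CASAVA 1.8+) header format, that last field is the index sequence (or sample number) recorded by the instrument, which is separate from the sequence stored in the barcode read itself; note that both records quoted above carry the same CCTTGA. This is a minimal sketch assuming standard four-line FASTQ records; the function name header_index_vs_read is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each 4-line FASTQ record on stdin, print the
# last ':'-separated field of the header comment (e.g. "1:N:0:CCTTGA" -> CCTTGA)
# followed by a tab and the record's sequence line.
header_index_vs_read() {
  awk 'NR % 4 == 1 { n = split($2, f, ":"); idx = f[n] }
       NR % 4 == 2 { print idx "\t" $0 }'
}

# Demo on the record quoted above. On the real file, use:
#   zcat barcodes.fastq.gz | header_index_vs_read | head
printf '%s\n' \
  '@M01522:221:000000000-BRWK5:1:1101:20002:1874 1:N:0:CCTTGA' \
  'CAGTTCAT' \
  '+' \
  '8-8C@<@F' | header_index_vs_read
```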

The per-sample sequence counts in the visualization look very uneven: some samples have a very large number of reads (S17 has 600,000+), while others have very few (S6 has 5,000+).
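
One way to check whether this imbalance is already present in the raw data, rather than introduced by demultiplexing, is to count the distinct sequences in the barcode read and compare the most frequent ones with the BarcodeSequence column (or its reverse complement). A minimal sketch, assuming four-line FASTQ records; the function name count_barcode_reads and the toy records in the demo are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how often each sequence line occurs in a
# 4-line FASTQ stream on stdin, most frequent first ("count sequence").
count_barcode_reads() {
  awk 'NR % 4 == 2' | sort | uniq -c | sort -rn
}

# Demo on three made-up records. On the real file, use:
#   zcat barcodes.fastq.gz | count_barcode_reads | head -n 20
printf '%s\n' \
  '@r1 1:N:0:CCTTGA' 'CAGTTCAT' '+' 'IIIIIIII' \
  '@r2 1:N:0:CCTTGA' 'CAGTTCAT' '+' 'IIIIIIII' \
  '@r3 1:N:0:CCTTGA' 'GGATCCAA' '+' 'IIIIIIII' | count_barcode_reads
```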

I don't understand what exactly this CCTTGA is, and I don't know whether something is wrong with my barcodes or sample-metadata files. Please enlighten me! Thank you!

Hello @Minh_Tr_n,

Did you demultiplex using qiime2? If so, can you post the command you used?

Thanks for the quick response!
Here are the commands and files that I used for my demux:

Commands:

mkdir muxed-pe-barcode-in-seq

qiime tools import \
  --type MultiplexedPairedEndBarcodeInSequence \
  --input-path muxed-pe-barcode-in-seq \
  --output-path multiplexed-seqs.qza

qiime cutadapt demux-paired \
  --i-seqs multiplexed-seqs.qza \
  --m-forward-barcodes-file sample-metadata.tsv \
  --m-forward-barcodes-column BarcodeSequence \
  --p-error-rate 0.125 \
  --o-per-sample-sequences demultiplexed-seqs.qza \
  --o-untrimmed-sequences untrimmed.qza \
  --verbose

qiime demux summarize \
  --i-data demultiplexed-seqs.qza \
  --o-visualization demultiplexed-seqs.qzv

Files:

Reads and Barcodes:

https://drive.google.com/drive/folders/182KpgtJGGrjBPV3fMbACB9wva_D2uzdj?usp=sharing

Visualization after demux

demultiplexed-seqs.qzv (321.3 KB)

Hello @Minh_Tr_n,

From the command line help text for cutadapt demux-paired:

Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes
are expected to be located within the sequence data (versus the header, or a
separate barcode file).

I believe that if you have a separate barcode file, demux emp-paired is the recommended command.

Sorry for not mentioning it: my data are not in EMP format. I used these two primers, which are common non-EMP primers:

Forward: 341F: CCTACGGGNGGCWGCAG
Reverse: 785R: GACTACHVGGGTATCTAATCC

That's why I used cutadapt demux-paired rather than demux emp-paired.

Hello @Minh_Tr_n,

Although you may not have used the primers published in the protocol, I believe your data are in a format compatible with the demux emp-paired action, so give it a try and see if it works. You may have to turn off Golay error correction. You can follow along with the first step of this tutorial to do so.
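
Following that suggestion, the EMP-style workflow could look roughly like this sketch. It rests on several assumptions: the three files are placed in one directory under the names forward.fastq.gz, reverse.fastq.gz and barcodes.fastq.gz, which is what the EMPPairedEndSequences import type expects; the directory and output file names are placeholders; and the flag names assume a recent QIIME 2 release (check qiime demux emp-paired --help for your version). Golay error correction is switched off because it assumes 12-nt Golay barcodes, whereas the barcodes here are 8 nt. The wrapper function name demux_emp_paired is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the EMP-style import + demultiplexing steps.
# $1: directory containing forward.fastq.gz, reverse.fastq.gz, barcodes.fastq.gz
demux_emp_paired() {
  in_dir=$1

  # Import the three files as one EMP paired-end artifact
  qiime tools import \
    --type EMPPairedEndSequences \
    --input-path "$in_dir" \
    --output-path emp-paired-end-sequences.qza

  # Match the barcode read against the BarcodeSequence column of the metadata;
  # Golay correction is disabled because these are 8-nt, non-Golay barcodes
  qiime demux emp-paired \
    --i-seqs emp-paired-end-sequences.qza \
    --m-barcodes-file sample-metadata.tsv \
    --m-barcodes-column BarcodeSequence \
    --p-no-golay-error-correction \
    --o-per-sample-sequences demultiplexed-seqs.qza \
    --o-error-correction-details demux-details.qza
}

# Usage:
#   demux_emp_paired emp-paired-end-sequences
```

If the barcodes in sample-metadata.tsv turn out to be the reverse complement of what appears in the barcode read, adding --p-rev-comp-mapping-barcodes to the demux emp-paired call may also be needed.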
