Greetings, QIIME 2 team!
I used QIIME 2, version 2020.8, to analyze my 16S data. The data are paired-end reads, imported with the type 'MultiplexedPairedEndBarcodeInSequence'.
I then demultiplexed the data into samples according to the metadata, using 'qiime cutadapt demux-paired' as follows:
qiime cutadapt demux-paired
qiime demux summarize \
At the same time, I also demultiplexed all samples with fastq-multx:
fastq-multx -B -m 0 -b forward.fastq reverse.fastq -o %.R1.fastq %.R2.fastq
fastq-multx -B -m 0 -b reverse.fastq forward.fastq -o %.R1.fastq %.R2.fastq
# manually combine the two fastq-multx results
Then I compared the two demultiplexing results:
'qiime cutadapt demux-paired' retained 8611978 sequences, while 'fastq-multx' retained only 4164473 sequences.
Furthermore, one of my samples, 'TA_Blank_3_lib2', shows strikingly different results: 2165635 sequences after 'qiime cutadapt demux-paired', but only 4881 sequences after 'fastq-multx'.
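For anyone who wants to check counts like these independently, reads per file can be counted directly from the demultiplexed FASTQ files. This is only a minimal sketch: it assumes standard four-line FASTQ records (optionally gzipped), the function name count_fastq_reads is my own, and it is not how either tool reports its numbers.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    path = Path(path)
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path} does not contain a multiple of four lines")
    return n_lines // 4
```

Running this on the per-sample files from both tools gives numbers that can be compared sample by sample.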
To find the reason for these large differences, I randomly selected one sequence that 'qiime cutadapt demux-paired' assigned to 'TA_Blank_3_lib2':
I grepped for this sequence in the unmatched output file of 'fastq-multx':
And I then grepped for this sequence in the raw data:
As shown above, 'qiime cutadapt demux-paired' assigned this sequence to the sample based on only a 3 nt match, 'GGA', which is the last 3 nt of the 12 nt barcode of 'TA_Blank_3_lib2' (GAACACTTTGGA).
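To make the suspected behavior concrete, here is a minimal sketch of partial 5' matching. It assumes (I have not confirmed this for the plugin) that the barcode is treated as a non-anchored 5' adapter, so that a suffix of the barcode matching the start of the read is enough once it reaches a minimum overlap; 3 is cutadapt's documented default for its minimum overlap option. The function names and the example reads are hypothetical, and the real cutadapt algorithm also tolerates errors, which this sketch ignores.

```python
def suffix_prefix_overlap(barcode, read):
    """Return the length of the longest barcode suffix that equals the read start."""
    for k in range(min(len(barcode), len(read)), 0, -1):
        if barcode[-k:] == read[:k]:
            return k
    return 0


def partial_5prime_match(barcode, read, min_overlap=3):
    """Assumed model: accept the read if the overlap reaches min_overlap."""
    return suffix_prefix_overlap(barcode, read) >= min_overlap


BARCODE = "GAACACTTTGGA"  # barcode of 'TA_Blank_3_lib2'

# Hypothetical reads: full barcode, only 'GGA', and no meaningful barcode.
for read in ("GAACACTTTGGAACGT", "GGATTACCGG", "ACGTACGT"):
    overlap = suffix_prefix_overlap(BARCODE, read)
    print(read, overlap, partial_5prime_match(BARCODE, read))
```

Under this model the read that starts with only 'GGA' is accepted with an overlap of 3, which would be consistent with what I see for 'TA_Blank_3_lib2'.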
Therefore, I would like to understand what causes this behavior.
For example, how does 'qiime cutadapt demux-paired' handle reads whose barcodes look like these:
GAACACTTTGGA - sequence (Sample1)
NNNCACTTTGGA - sequence (Sample2)
NNNNNNTTTGGA - sequence (Sample3)
NNNNNNNNNGGA - sequence (Sample4)
NNNNNNNNNGGA - sequence (Sample5)
Are these sequences all assigned to the same sample? In our current results, sequences like Sample4 and Sample5 are demultiplexed into one single sample, which suggests that the plugin assigns sequences with incomplete barcodes to the same sample.
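For contrast, the behavior I expected is anchored, exact matching, where a read is assigned only if it starts with the complete barcode. The sketch below is an illustration under that assumption, not a description of how fastq-multx or any QIIME 2 plugin is implemented; the function name demux_anchored_exact and the read payload 'ACGTACGT' are hypothetical.

```python
def demux_anchored_exact(reads, barcodes):
    """Assign each read to the sample whose complete barcode it starts with."""
    assigned = {sample: [] for sample in barcodes}
    assigned["unmatched"] = []
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode and keep the rest of the read.
                assigned[sample].append(read[len(barcode):])
                break
        else:
            # No complete barcode at the 5' end: leave the read unassigned.
            assigned["unmatched"].append(read)
    return assigned


barcodes = {"TA_Blank_3_lib2": "GAACACTTTGGA"}
payload = "ACGTACGT"  # hypothetical sequence following the barcode
reads = [
    "GAACACTTTGGA" + payload,  # complete barcode
    "CACTTTGGA" + payload,     # first 3 nt missing
    "TTTGGA" + payload,        # first 6 nt missing
    "GGA" + payload,           # first 9 nt missing
]
result = demux_anchored_exact(reads, barcodes)
print({name: len(seqs) for name, seqs in result.items()})
```

With this stricter rule only the read carrying the complete 12 nt barcode is assigned to the sample, and the three truncated ones end up unmatched.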
In addition, should I switch to a different QIIME 2 tool for demultiplexing, such as 'qiime demux emp-paired'?