These two files have not been demultiplexed yet, so I want to use the plugin qiime cutadapt demux-paired to demultiplex all samples. The code I used is listed below.
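For reference, a typical invocation of this plugin looks roughly like the following. The filenames and the metadata column name here are placeholders for illustration, not the actual values I used; this sketch assumes the barcodes sit at the 5' end of the forward reads.

```shell
# Demultiplex paired-end reads whose barcodes are in-sequence (hypothetical filenames)
qiime cutadapt demux-paired \
  --i-seqs multiplexed-seqs.qza \
  --m-forward-barcodes-file metadata.tsv \
  --m-forward-barcodes-column barcode-sequence \
  --p-error-rate 0 \
  --o-per-sample-sequences demultiplexed-seqs.qza \
  --o-untrimmed-sequences untrimmed.qza \
  --verbose
```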
However, a problem occurred: although the log did not indicate any error or warning, the output shows that most of the reads (~80%) were discarded. I repeatedly confirmed that the barcodes were correct.
Then, I tried a second method for comparison: the two files from the paired-end sequencing were treated as two single-end sequencing files, and the corresponding barcodes (forward or reverse) were used to demultiplex each file separately. The code is listed below. The results showed that most of the reads were retained, approximately 80%–90%.
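A rough sketch of this second approach, run once per direction (again, filenames and column names are placeholders, not my exact code):

```shell
# Treat the forward file as single-end and demultiplex with the forward barcodes
qiime cutadapt demux-single \
  --i-seqs forward-multiplexed.qza \
  --m-barcodes-file metadata.tsv \
  --m-barcodes-column barcode-sequence \
  --o-per-sample-sequences forward-demux.qza \
  --o-untrimmed-sequences forward-untrimmed.qza

# Repeat for the reverse file with the reverse barcodes
qiime cutadapt demux-single \
  --i-seqs reverse-multiplexed.qza \
  --m-barcodes-file metadata.tsv \
  --m-barcodes-column reverse-barcode-sequence \
  --o-per-sample-sequences reverse-demux.qza \
  --o-untrimmed-sequences reverse-untrimmed.qza
```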
So, I would like to know: what may cause this problem? Is there a problem with my parameters that led to this error? In addition, is the second approach feasible?
Thank you for bringing this interesting question to the forums!
This issue is surprising: you have confirmed that the barcodes are correct and work with single-end reads, so now we have to figure out what is going wrong with the paired reads!
Have you (pre)processed your reads outside of QIIME before running qiime cutadapt demux-paired? I wonder if cutadapt needs the reads to be in the same order, and if some sort of processing changed their order or labels…
Thank you so much.
I cut the first 12 bp from the 5' end of every read with a Python script, then removed duplicate reads. I found more than 1,000,000 distinct barcodes across all these reads; in addition to the target barcodes, many other barcode sequences were also present at very high abundance. Considering that barcodes may contain errors, and given the result of the second method, I think the most likely cause is, as you said, that cutadapt needs the reads to be in the same order and some processing step changed their order or labels.
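The trimming part of my script did essentially the following (a simplified sketch, not the exact code; it assumes standard 4-line FASTQ records):

```python
def trim_fastq_records(lines, n=12):
    """Trim the first n bases (and quality scores) from each 4-line FASTQ record."""
    out = []
    for i, line in enumerate(lines):
        # Lines 1 (sequence) and 3 (quality) of each record get trimmed;
        # the header (0) and separator (2) lines are kept as-is.
        if i % 4 in (1, 3):
            out.append(line[n:])
        else:
            out.append(line)
    return out
```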
So, as you suggested, how can I pre-process these data before running cutadapt so that the reads and their labels are in the right order? I am not good at programming. The data come from the sequencing company without any processing.
Based on the process you have described, I think the issue is the duplicate removal. Deduplication discards many reads and is not recommended this early in the pipeline for 16S amplicon reads.
Try demultiplexing with your raw reads, then trimming inside of QIIME 2, as shown in this part of the Atacama soil microbiome tutorial.
I cut the first 12 bp
You can do that here too, when you run the qiime dada2 denoise-paired command, by passing these settings: --p-trim-left-f 12 --p-trim-left-r 12
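Putting it together, the command would look something like this. The truncation lengths here are placeholders, not a recommendation; choose them based on your own interactive quality plots.

```shell
# Denoise paired-end reads, trimming the 12 bp barcode region from both reads
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demultiplexed-seqs.qza \
  --p-trim-left-f 12 \
  --p-trim-left-r 12 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```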