Hello, I'm new to bioinformatics and this is the first time I'm using QIIME 2. I'm working on demultiplexing a paired-end raw data set. Here is an example of what my barcodes.fastq.gz contains:
As I understand FASTQ files, CCTTGA (6 nucleotides) is where the barcode should be. But in my sample-metadata.tsv file, the barcode is CAGTTCAT (8 nucleotides). Also, when I exported the demultiplex-seqs.qza file after demux to see what my sequences look like, I found that the sequence names had this form (the barcode went directly into the name):
The visualization of per-sample sequence counts looks very odd: some samples have a very large number of reads (S17 with 600,000+ reads) while others have very few (S6 with 5,000+).
I don't understand exactly what this CCTTGA is, and I don't know whether something is wrong with my barcodes or sample-metadata files. Please enlighten me! Thank you!
From the command line help text for cutadapt demux-paired:
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes
are expected to be located within the sequence data (versus the header, or a
separate barcode file).
I believe that if you have a separate barcode file, demux emp-paired is the recommended command.
Although you may not have used the primers published in the protocol, I believe that your data is in a compatible format for the demux emp-paired action, so give it a try and see if it works. You may have to turn off Golay error correction. You can follow along with the first step from this tutorial to do so.
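Before re-running the demultiplexing, it may also help to check whether the reads in barcodes.fastq.gz actually match the barcodes listed in sample-metadata.tsv, since a length mismatch like 6 vs. 8 nucleotides would explain odd per-sample counts. Below is a rough Python sketch of such a check, not part of QIIME 2. The function names are made up for illustration, and it assumes a plain four-line-per-record FASTQ file and that you have already pulled the barcode column out of your metadata into a list.

```python
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line (2nd of every 4 lines) from a FASTQ text handle."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def summarize_barcodes(barcode_reads, metadata_barcodes):
    """Compare barcode reads with the barcodes expected from the metadata.

    Hypothetical helper: reports the length distribution on both sides and
    how many reads match a metadata barcode exactly or as a reverse complement.
    """
    reads = list(barcode_reads)
    expected = set(metadata_barcodes)
    expected_rc = {reverse_complement(b) for b in expected}
    return {
        "read_lengths": Counter(len(r) for r in reads),
        "metadata_lengths": Counter(len(b) for b in expected),
        "exact_matches": sum(r in expected for r in reads),
        "revcomp_matches": sum(r in expected_rc for r in reads),
    }


if __name__ == "__main__":
    import io

    # Tiny in-memory stand-in for a barcodes FASTQ file (two made-up records).
    demo = io.StringIO("@r1\nCCTTGA\n+\nIIIIII\n@r2\nCAGTTCAT\n+\nIIIIIIII\n")
    print(summarize_barcodes(read_fastq_sequences(demo), ["CAGTTCAT"]))
```

For the real file you would open it with gzip.open("barcodes.fastq.gz", "rt") and pass that handle to read_fastq_sequences. If almost no reads match exactly but many match as reverse complements, or if the read lengths differ from the metadata barcode lengths, that points at the barcode file or metadata rather than at the demux step itself.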