Pardon my terminology, as I am brand new to the microbiome sequencing bioinformatics space. Any help or advice is appreciated!
I have 16S V4-V5 amplicon data, amplified with the 563F and 926R primers and sequenced on a MiSeq. I received paired-end multiplexed FASTQ files (R1.fastq.gz and R2.fastq.gz), as well as a file containing the primer and barcode mapping for each sample. For example:
|primer|AYTGGGYDTAAAGNG|CCGTCAATTYHTTTRAGT|
|primer|CCGTCAATTYHTTTRAGT|AYTGGGYDTAAAGNG|
|BARCODE|ACGCAATGTCTG|ACGCAATGTCTG|Sample1|
My understanding so far is that AYTGGGYDTAAAGNG is the forward primer and CCGTCAATTYHTTTRAGT is the reverse primer. The data also seem to be "dual-barcoded", meaning that a barcode sequence exists in both the R1 and R2 reads. Here's an example:
R1 (note line number precedes FASTQ content):
201-@M00430:5389:000000000-J2L69:1:2106:19124:2126 1:N:0:ATAGTCTG+GCATTGCA
202:ACGCAATGTACGCAATGTCTGATTGGGTGTAAAGTG...read continues to length of 251
R2 (line number in front):
201-@M00430:5389:000000000-J2L69:1:2106:19124:2126 2:N:0:ATAGTCTG+GCATTGCA
202:CACGCAATGTCTGCCGTCAATTTTTTTTAGT...read continues to length of 251
So I am dealing with "barcode in-sequence" data, and the primer seems to be immediately following the barcode, so far so good.
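To make the "barcode immediately followed by primer" check concrete, here is a minimal sketch of how it could be done by hand. It expands the degenerate IUPAC primer into a regular expression and searches the start of the R1 read shown above. The function names (`iupac_to_regex`, `find_barcode_primer`) are made up for illustration and are not part of QIIME 2 or cutadapt.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(seq):
    """Turn a degenerate primer into a regex pattern string."""
    return "".join(IUPAC[code] for code in seq.upper())

def find_barcode_primer(read, barcode, primer):
    """Return (barcode_start, primer_end) for the first place where the
    barcode is immediately followed by the primer, or None if absent."""
    match = re.search(barcode + iupac_to_regex(primer), read)
    return (match.start(), match.end()) if match else None

FWD_PRIMER = "AYTGGGYDTAAAGNG"
BARCODE = "ACGCAATGTCTG"
# Start of the R1 read from the example above
r1 = "ACGCAATGTACGCAATGTCTGATTGGGTGTAAAGTG"

print(find_barcode_primer(r1, BARCODE, FWD_PRIMER))  # prints (9, 36)
```

For this read the barcode starts at position 9 (0-based), so there are 9 bases in front of it, and the forward primer follows directly after the barcode.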
And then I realized that, for the same adaptor, I'm seeing reads in R1 (and correspondingly in R2) with sequences of two different lengths preceding them. So for the same adaptor as above I also have reads like:
R1 (line number precedes FASTQ content):
337-@M00430:5389:000000000-J2L69:1:2106:12161:2165 1:N:0:ATAGTCTG+GCATTGCA
338:AACGCAATGTCTGCCGTCAATTCCTTTAAGT...read continues to length of 251
R2 (line number):
337-@M00430:5389:000000000-J2L69:1:2106:12161:2165 2:N:0:ATAGTCTG+GCATTGCA
338:CCGCAATGTACGCAATGTCTGACTGGGTTTAAAGGG...read continues to length of 251
Since I'm seeing barcodes after the adaptors and what should be the reverse primer in R1 and the forward primer in R2 (all NOT in reverse complement), I believe this means that I have "mixed-orientation reads" as well...
Taken together, I believe I have multiplexed paired-end dual-barcode reads with barcode-in-sequence and mixed-orientation, is that correct?
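As a sanity check on that diagnosis, here is a rough sketch that labels a read pair by which primer follows the barcode in R1 and in R2. The helper names, the `max_offset` window of 12 start positions, and the tolerance of one mismatch are all my own assumptions, not anything taken from QIIME 2 or cutadapt. I allow one mismatch because the first R2 read, as pasted above, has a T at the position where the reverse primer has R (A or G), so an exact match would miss it.

```python
# IUPAC code mapped to the set of bases it stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mismatches(pattern, window):
    """Count positions where the read base is not allowed by the pattern."""
    return sum(base not in IUPAC[code] for code, base in zip(pattern, window))

def has_primer(read, barcode, primer, max_offset=12, max_mismatch=1):
    """True if barcode + primer starts within the first max_offset positions
    of the read with at most max_mismatch mismatches (both are assumptions)."""
    target = barcode + primer
    for start in range(max_offset + 1):
        window = read[start:start + len(target)]
        if len(window) == len(target) and mismatches(target, window) <= max_mismatch:
            return True
    return False

def orientation(r1, r2, barcode, fwd, rev):
    """Label a pair 'forward', 'reverse', or 'unknown'."""
    if has_primer(r1, barcode, fwd) and has_primer(r2, barcode, rev):
        return "forward"
    if has_primer(r1, barcode, rev) and has_primer(r2, barcode, fwd):
        return "reverse"
    return "unknown"

FWD, REV, BARCODE = "AYTGGGYDTAAAGNG", "CCGTCAATTYHTTTRAGT", "ACGCAATGTCTG"

# Read starts copied from the two example pairs in this post
pair_a = ("ACGCAATGTACGCAATGTCTGATTGGGTGTAAAGTG", "CACGCAATGTCTGCCGTCAATTTTTTTTAGT")
pair_b = ("AACGCAATGTCTGCCGTCAATTCCTTTAAGT", "CCGCAATGTACGCAATGTCTGACTGGGTTTAAAGGG")

print(orientation(*pair_a, BARCODE, FWD, REV))  # prints forward
print(orientation(*pair_b, BARCODE, FWD, REV))  # prints reverse
```

On the two example pairs this gives one pair in each orientation, which is what makes me think the run is mixed-orientation.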
I was hoping that the --p-mixed-orientation option in qiime cutadapt demux-paired could solve my problems, but I got the error that "Dual-indexed barcodes for mixed orientation reads are not supported."
So now I am at a loss again...
I will be going through the rest of the forum to comb through the terminology and find a solution. Apologies in advance if this is a duplicate question, but even just getting confirmation that my reads are indeed what I have diagnosed them to be would be fantastic.
Thank you!