Illumina demultiplexed data with inline index sequences (index5+index7)

afrinaad · December 30, 2019, 3:49pm

Hi, I am very new to bioinformatics and I'm learning to process my raw data by myself right now. QIIME 2 seems like a suitable tool for me to start with.

However, I have stumbled upon an issue with my data. I believe my data are already demultiplexed (I received R1 and R2 fastq files and did not receive a third file from the sequencer), and I am trying to follow the "Atacama soil microbiome" tutorial. A researcher suggested that I have to demultiplex again, as there are index sequences (dual-index?) in the headers of my fastq.

But I notice that the indexes are not the same for all reads in the file. In the first two records, shown below, they differ at the first base of each index:

@M04129:83:000000000-CHCFJ:1:1101:14877:1164 1:N:0:NGAGGCTG+NTCTAGCT ATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGAAGGGATCAGCGTGA

@M04129:83:000000000-CHCFJ:1:1101:21711:1452 1:N:0:CGAGGCTG+TTCTAGCT ATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGAAGGGATCAGCGTGT
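To see how many distinct index strings a file really contains, the headers can be tallied. The following is a minimal sketch, assuming the header layout shown above (read ID, a space, then `read:filtered:control:index`) and ordinary four-line FASTQ records. The function names are made up for illustration, and the quality strings in the example are placeholders, not real data.

```python
from collections import Counter


def index_from_header(header: str) -> str:
    """Return the index field, e.g. 'NGAGGCTG+NTCTAGCT', from a FASTQ header.

    Assumes the layout '@<read id> <read>:<filtered>:<control>:<index>'.
    """
    comment = header.split()[1]          # the part after the first whitespace
    return comment.split(":", 3)[3]      # fourth colon-separated field


def count_indexes(fastq_lines) -> Counter:
    """Count index strings over the header line of each 4-line record."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:                   # lines 0, 4, 8, ... are headers
            counts[index_from_header(line)] += 1
    return counts


# The two headers from the post; sequences shortened, qualities are placeholders.
example = [
    "@M04129:83:000000000-CHCFJ:1:1101:14877:1164 1:N:0:NGAGGCTG+NTCTAGCT",
    "ATAACAGGTC",
    "+",
    "IIIIIIIIII",
    "@M04129:83:000000000-CHCFJ:1:1101:21711:1452 1:N:0:CGAGGCTG+TTCTAGCT",
    "ATAACAGGTC",
    "+",
    "IIIIIIIIII",
]

for index, n in count_indexes(example).items():
    print(index, n)
```

If nearly all reads share one index pair and the rest differ only by `N` at a few positions, that would be consistent with a file that is already demultiplexed.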

Therefore, after hours of reading tutorials and questions, my next plan is either:

1. use q2-cutadapt to demux/trim the index out; but I would have to do it multiple times as there are different indexes used in the fastq (like this or this)
2. the file is actually already demultiplexed, so I can just proceed to the "Moving Pictures" tutorial

My main problem right now is that there are different index sequences in the headers of the fastq file, and I am not sure what to do about them.

Thank you so much for all the help in advance and happy new year!

Nicholas_Bokulich · January 8, 2020, 5:15pm

Hi @afrinaad,
Welcome to the QIIME 2 forum, and I apologize for the delayed response.

QIIME 2 does support dual-index read demultiplexing (via the q2-cutadapt demux-paired method), but only when the index sequences are in-line with the sequences, not in the header line.

Since the reads are already demultiplexed, yes, I recommend just proceeding.

I am not sure what to say about the differing index sequences in the headers; you may want to consult the method that you used for demultiplexing. It looks like the issue is that index sequences with ambiguous bases (N) are being demultiplexed, whereas they should probably be thrown out. A sequencing error in the index read can be bad news, and it is probably only occurring in a fraction of your reads, so I would advise "playing it safe" and throwing out those reads. This is not something that QIIME 2 can do, however, so it should be done prior to importing reads into QIIME 2 (ideally, whatever method you used for demultiplexing has an option to do this automatically).
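As a rough illustration of throwing out those reads before import, here is a minimal sketch in plain Python. It assumes four-line FASTQ records, R1 and R2 files in the same read order, and the header layout shown earlier in the thread (index as the fourth colon-separated field after the space). The function names and file paths are hypothetical, and this is not a QIIME 2 feature.

```python
import gzip
from itertools import islice


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    path = str(path)
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_records(handle):
    """Yield FASTQ records as lists of four lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def has_ambiguous_index(header: str) -> bool:
    """True if the index field of the header contains an N."""
    index = header.split()[1].split(":", 3)[3]
    return "N" in index.upper()


def filter_pair(r1_in, r2_in, r1_out, r2_out):
    """Write only read pairs whose index has no N; return (kept, dropped)."""
    kept = dropped = 0
    with open_text(r1_in) as f1, open_text(r2_in) as f2, \
            open_text(r1_out, "wt") as o1, open_text(r2_out, "wt") as o2:
        for rec1, rec2 in zip(read_records(f1), read_records(f2)):
            # Both mates must describe the same read, or the files are out of sync.
            if rec1[0].split()[0] != rec2[0].split()[0]:
                raise ValueError("R1 and R2 are not in the same order")
            if has_ambiguous_index(rec1[0]) or has_ambiguous_index(rec2[0]):
                dropped += 1
                continue
            o1.writelines(rec1)
            o2.writelines(rec2)
            kept += 1
    return kept, dropped
```

A call such as `filter_pair("sample_R1.fastq.gz", "sample_R2.fastq.gz", "clean_R1.fastq.gz", "clean_R2.fastq.gz")` (hypothetical file names) would drop a pair whenever either mate's index contains an `N`, so the two output files stay in sync.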

Good luck!

afrinaad · January 13, 2020, 12:40pm

I guess I missed this post. I didn't know dual-index demultiplexing was already available. Thank you!

I will proceed with this, then, in case the indexes are still in the sequences. I ask because, if the reads were already demultiplexed, what is the purpose of the index sequences in the header line?

afrinaad · January 14, 2020, 3:07am

Following the Moving Pictures tutorial, I ran demux summarize to view the quality plots of my merged reads, but this is what I get:

It looks weird compared to the ones in the tutorials, as there is no black bar (the main data) shown.

Nicholas_Bokulich · January 14, 2020, 3:13am

What sequencing platform did you use? See here:

afrinaad · January 14, 2020, 8:56am

The sequencing platform is NovaSeq6000 250PE.