Hi, I am very new to bioinformatics and am learning to process my raw data by myself. QIIME 2 seems like a suitable tool for me to start with.
However, I have run into an issue with my data. My data appear to be demultiplexed already (I received R1 and R2 FASTQ files and no third file from the sequencer), and I am trying to follow the “Atacama soil microbiome” tutorial. A researcher suggested that I have to demultiplex again because there are index sequences (dual-index?) in the headers of my FASTQ files.
But I noticed that the indexes are not the same for all reads in the file. In the first two reads shown below, the indexes differ at the first base of each index:
@M04129:83:000000000-CHCFJ:1:1101:14877:1164 1:N:0:NGAGGCTG+NTCTAGCT
ATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGAAGGGATCAGCGTGA
@M04129:83:000000000-CHCFJ:1:1101:21711:1452 1:N:0:CGAGGCTG+TTCTAGCT
ATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGAAGGGATCAGCGTGT
Therefore, after hours of reading tutorials and questions, I see two possible next steps:
- use q2-cutadapt to demux/trim the index out; but I would have to do it multiple times, as different indexes are used in the FASTQ file (like this or this)
- assume the file is actually already demultiplexed, so I can just proceed to the “Moving Pictures” tutorial
My main problem right now is that the headers in my FASTQ file contain different index sequences, and I am not sure what to do about that.
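To see how many distinct indexes the file really contains, the index field of each header could be tallied. Below is a minimal Python sketch, not a QIIME 2 feature. The function names are my own (hypothetical), and it assumes the standard Illumina header layout seen above, where the index is the fourth colon-separated field after the space. Because Illumina reports a base it could not call as `N`, `same_index` treats `N` as matching any base; that comparison rule is an assumption on my part.

```python
from collections import Counter


def index_from_header(header):
    """Return the index field of an Illumina-style FASTQ header line.

    Assumed layout: '@<read id> <read>:<filtered>:<control>:<index>'.
    """
    comment = header.strip().split(" ", 1)[1]
    return comment.split(":")[3]


def count_indexes(lines):
    """Tally the index strings found on FASTQ header lines.

    Assumes plain 4-line FASTQ records, so headers sit on lines 0, 4, 8, ...
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:
            counts[index_from_header(line)] += 1
    return counts


def same_index(a, b):
    """True if two index strings agree wherever neither has an N (no-call)."""
    if len(a) != len(b):
        return False
    return all(x == y or "N" in (x, y) for x, y in zip(a, b))


# Possible usage on a gzipped file (the path is a placeholder):
#   import gzip
#   with gzip.open("sample_R1.fastq.gz", "rt") as handle:
#       for index, n in count_indexes(handle).most_common(10):
#           print(index, n)
```

If nearly all reads share one index and the rest differ from it only at `N` positions, that would suggest a single sample per file rather than several samples mixed together.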
Thank you so much for all the help in advance and happy new year!