Hello,
I have a question about the demultiplexing workflow I used in QIIME 2 with my sequencing data. I know similar questions have been asked before (e.g. "Importing and Demultiplex process for 4 Fastq Files: R1, R2, Index1 and Index2" and "Importing multiplexed paired-end data with separated barcode sequence files"), but I decided to post anyway, partly because I know other users might run into the same issue.
My situation:
I have soil/leaf/root microbiome sequencing data from an Illumina MiSeq. I prepared my libraries using standard primers and barcodes, and my paired-end reads (2x250) come as four files:
I1
I2
R1
R2
Following the forum posts mentioned above, I gave extract_barcodes.py (from QIIME 1) a try, which worked and produced a single barcodes.fastq file.
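As I understand it, the idea of this step is to turn the two index reads into one barcode read per cluster. A minimal sketch, assuming the whole I1 read is simply followed by the whole I2 read (extract_barcodes.py has its own options, e.g. for barcode lengths, so this only illustrates the idea; `stitch_index_reads` and `read_fastq` are hypothetical names, not part of QIIME):

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def stitch_index_reads(i1: TextIO, i2: TextIO, out: TextIO) -> int:
    """Write one barcode record per cluster: I1 sequence followed by I2 sequence."""
    n = 0
    # zip stops at the shorter file, so both index files must have the same reads.
    for (h1, s1, q1), (h2, s2, q2) in zip(read_fastq(i1), read_fastq(i2)):
        # Both records must describe the same cluster: compare IDs up to the first space.
        if h1.split()[0] != h2.split()[0]:
            raise ValueError(f"read IDs differ: {h1} vs {h2}")
        out.write(f"{h1}\n{s1 + s2}\n+\n{q1 + q2}\n")
        n += 1
    return n

if __name__ == "__main__":
    i1 = io.StringIO("@r1 1:N:0\nACGTACGT\n+\nIIIIIIII\n")
    i2 = io.StringIO("@r1 2:N:0\nTTGGCCAA\n+\nHHHHHHHH\n")
    out = io.StringIO()
    stitch_index_reads(i1, i2, out)
    print(out.getvalue(), end="")  # barcode read is ACGTACGTTTGGCCAA
```

The consequence is that each record in barcodes.fastq carries both indices back to back, in the same order as the forward and reverse reads.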
I then imported this into QIIME 2 as EMPPairedEndSequences, with the following three files in one folder:
- forward.fastq.gz
- reverse.fastq.gz
- barcodes.fastq.gz
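A quick sanity check before importing could look like the sketch below. It assumes the EMP paired-end layout needs exactly these three file names and that the barcode reads are in the same order as the forward and reverse reads; `check_emp_paired_dir` and `first_ids` are hypothetical helpers, not part of QIIME 2, and only the first `n` records are compared.

```python
import gzip
import os
from typing import List

EXPECTED = ("forward.fastq.gz", "reverse.fastq.gz", "barcodes.fastq.gz")

def first_ids(path: str, n: int) -> List[str]:
    """Return the read IDs (header without '@', up to the first space) of the first n records."""
    ids: List[str] = []
    with gzip.open(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # every 4th line is a FASTQ header
                ids.append(line[1:].split()[0])
                if len(ids) == n:
                    break
    return ids

def check_emp_paired_dir(folder: str, n: int = 1000) -> None:
    """Raise if a file is missing or the three files do not list reads in the same order."""
    missing = [f for f in EXPECTED if not os.path.exists(os.path.join(folder, f))]
    if missing:
        raise FileNotFoundError(f"missing: {missing}")
    fwd, rev, bc = (first_ids(os.path.join(folder, f), n) for f in EXPECTED)
    if not (fwd == rev == bc):
        raise ValueError("read IDs differ between files; records are not in the same order")
```

If this passes, the barcode read at position k most likely belongs to the same cluster as forward and reverse read k, which is what lets demux use the barcode file alone to assign the paired reads.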
The import worked, so I moved on to demultiplexing and gave demux emp-paired a try, because why not.
... Surprisingly, it worked! Or at least I think so.
What I wonder now is: HOW? All I did was tell demux to take the barcodes from my metadata .csv file, in which I had concatenated my forward and reverse barcodes (i.e. the D701 and D501 indices). How can demux handle this when I never specified the barcode length it should look for (some of my barcodes are longer than 8 bp)? And how would it know where to split the concatenated barcodes?
My guess is that it is smart enough to get this information from the barcodes.fastq.gz file I provided, which is also merged. Still, couldn't parts of the concatenated barcodes match my insert sequences somewhere else?
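One way to picture why this can work is a simplified model in which each whole barcode read (I1 followed by I2) is looked up, by exact match, against the whole barcode string in the metadata column. This is a sketch under that assumption, not q2-demux's actual code: the real method has extra options (such as --p-rev-comp-barcodes and --p-rev-comp-mapping-barcodes), and the function names below are hypothetical. In this model no barcode length or split point is needed, and the forward/reverse insert sequences are never searched, so a barcode cannot match inside an insert.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def build_barcode_map(metadata: Dict[str, str], rev_comp_mapping: bool = False) -> Dict[str, str]:
    """Map each (concatenated) metadata barcode to its sample ID."""
    barcode_map: Dict[str, str] = {}
    for sample_id, barcode in metadata.items():
        key = reverse_complement(barcode) if rev_comp_mapping else barcode
        if key in barcode_map:
            raise ValueError(f"duplicate barcode {key}")
        barcode_map[key] = sample_id
    return barcode_map

def assign(barcode_read: str, barcode_map: Dict[str, str]) -> Optional[str]:
    """Exact whole-string lookup: only the barcode read is consulted, never the inserts."""
    return barcode_map.get(barcode_read)

def count_per_sample(barcode_reads: Iterable[str], barcode_map: Dict[str, str]) -> Counter:
    """Count reads per sample; reads whose barcode matches nothing are 'unassigned'."""
    counts: Counter = Counter()
    for bc in barcode_reads:
        counts[assign(bc, barcode_map) or "unassigned"] += 1
    return counts
```

Under this model, a read is only assigned when its barcode read has the same length and orientation as the metadata string; a single mismatch (or a reversed index) leaves it unassigned rather than assigning it somewhere wrong.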
I know paired-end demultiplexing isn't implemented yet, but somehow the results I got look okay. I hope you'll be able to see this in the screenshot of my demux.qzv.
And my read counts are:
So basically I'm really confused, especially since I don't come from a bioinformatics background at all. I apologize if this is complete gibberish, but I still hope someone can help me out.
I'd really love for this to work in QIIME 2: all the other options are much more complicated, and apart from the demultiplexing part it is so straightforward! I've worked through all my other analyses of this data in QIIME 2, so I've learned a lot in the end, even if someone now tells me that what I did was completely wrong.
Thanks a lot for your help in advance!