When I try to import data from a fastq.gz file, I am confused about which type I need to choose. For example, if I have paired-end sequences, should I choose SampleData[PairedEndSequencesWithQuality] or EMPPairedEndSequences? The “Atacama soil microbiome” tutorial uses the latter, but I guess I need the former. What are the differences between these two types, and how do I decide which one to choose?
Hi @Marine_Microbiology,
Good question — it all depends on whether your data are demultiplexed or not.
SampleData[PairedEndSequencesWithQuality] is a demultiplexed format (i.e., each sample has its own forward/reverse sequence files).
EMPPairedEndSequences is a multiplexed format (i.e., all sequences are contained in a single forward and a single reverse file).
You should check out this tutorial and also the flowchart in the importing tutorial to get a better idea of which type is right for you, and to see how to process these.
I hope that helps!
Thanks for your reply. I read some more material and now understand the difference between these two types, but I have more questions.
1. If each sample has its own forward/reverse sequence files, but the barcodes are still in the reads, do I still consider the data demultiplexed?
2. If such data count as demultiplexed, should I remove the barcodes in the same way I would trim low-quality bases? Does it also mean that there is no need to use the ‘demux’ plugin?
3. I guess the ‘MultiplexedPairedEndBarcodeInSequence’ type is also for demultiplexed data. What are the differences between ‘MultiplexedPairedEndBarcodeInSequence’ and ‘EMPPairedEndSequences’?
No, MultiplexedPairedEndBarcodeInSequence data are not demultiplexed. See this tutorial for how to demultiplex sequences that have the barcodes inside the reads.
The difference is essentially just whether the barcodes are part of the sequences (MultiplexedPairedEndBarcodeInSequence) or in their own separate sequence file (EMPPairedEndSequences).
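To make the distinction between the three types concrete, here is a rough Python sketch that guesses the import type from the file names in an input directory. The function name `guess_import_type` is hypothetical and not part of QIIME 2. It assumes the usual file-name conventions: `forward.fastq.gz`, `reverse.fastq.gz` and `barcodes.fastq.gz` for EMP-style multiplexed data, and Casava 1.8 style per-sample names for demultiplexed data. It only looks at names, so it cannot tell whether per-sample files still contain barcodes inside the reads.

```python
import re

# Casava 1.8 style per-sample file name, e.g. SampleA_S1_L001_R1_001.fastq.gz
CASAVA = re.compile(r"^.+_[^_]+_L\d{3}_R([12])_001\.fastq\.gz$")


def guess_import_type(filenames):
    """Guess a QIIME 2 semantic type from file names alone (heuristic sketch)."""
    names = set(filenames)

    # Multiplexed layouts: one forward file and one reverse file for all samples.
    if {"forward.fastq.gz", "reverse.fastq.gz"} <= names:
        if "barcodes.fastq.gz" in names:
            # Barcodes live in their own separate sequence file.
            return "EMPPairedEndSequences"
        # No barcode file, so the barcodes are assumed to be inside the reads.
        return "MultiplexedPairedEndBarcodeInSequence"

    # Demultiplexed layout: every file follows the per-sample naming pattern
    # and both read directions (R1 and R2) are present.
    matches = [CASAVA.match(name) for name in names]
    if names and all(matches):
        directions = {m.group(1) for m in matches}
        if directions == {"1", "2"}:
            return "SampleData[PairedEndSequencesWithQuality]"

    # Anything else needs a closer look (or a manifest file).
    return None
```

The returned string is what you would pass to `qiime tools import --type`. Treat the result as a starting point and confirm it against the flowchart in the importing tutorial.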