I have a merged FASTQ file with the sample name in each header, as shown below. The sample name in the header is the only identifier that distinguishes the sequences. How can I demultiplex this file and import it into QIIME 2? I want to use DADA2 to pick representative sequences (rep_seqs) later. Thanks in advance.
Sorry for the delayed response and thanks for the sample reads!
Based on those samples, it looks like the sample ID is delimited with a -- in the header? We don’t have anything in QIIME 2 that recognizes that, or anything that can demultiplex based on matching something in a FASTQ header.
Where did you get your data from? Does it exist in a different format by any chance?
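For anyone in the same situation, splitting on the header can be done outside QIIME 2. The following is only a minimal Python sketch: it assumes the sample ID is the text between `@` and the first `--` in each header and that every record is exactly four lines. The function names and the example headers are made up for illustration, so adjust the parsing to match your real headers.

```python
import io
from collections import defaultdict


def sample_id_from_header(header, delimiter="--"):
    """Return the text between '@' and the first delimiter (assumed layout)."""
    name = header[1:] if header.startswith("@") else header
    return name.split(delimiter, 1)[0].strip()


def split_fastq_by_sample(lines, delimiter="--"):
    """Group 4-line FASTQ records by the sample ID parsed from each header."""
    by_sample = defaultdict(list)
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not record:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            sample = sample_id_from_header(record[0], delimiter)
            by_sample[sample].append(tuple(record))
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")
    return dict(by_sample)


# Hypothetical headers, used only to show the call.
example = io.StringIO(
    "@sampleA--read1\nACGT\n+\nIIII\n"
    "@sampleB--read1\nTTGA\n+\nIIII\n"
    "@sampleA--read2\nGGCC\n+\nIIII\n"
)
groups = split_fastq_by_sample(example)
for sample, records in sorted(groups.items()):
    print(sample, len(records))
```

Each group can then be written to its own per-sample FASTQ file. For large files it would be better to stream records straight to per-sample output files rather than hold everything in memory.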
Happy Thanksgiving! Thanks for your reply!
This is data merged with FLASH. I do not have the original samples. Over the past few days I have demultiplexed all of the samples, so the merged reads for each sample are now in separate fastq.gz/fastq files.
Is there any way to import these demultiplexed samples and then process them with DADA2 or Deblur? And is QIIME 2 appropriate for this kind of data?
Thanks for your quick response.
My samples are 16S Illumina data. Based on your suggestion, could you give more detail about which format I can use? My data are demultiplexed, merged paired-end reads. Should I treat them as “Casava 1.8 single-end demultiplexed fastq”?
And would all of my later data processing then be treated as single-end? Is that appropriate? I am worried it is not.
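For reference, the Casava 1.8 demultiplexed format expects one fastq.gz file per sample, named like `L2S357_15_L001_R1_001.fastq.gz` (sample ID, barcode identifier, lane number, read direction, set number). Below is a minimal Python sketch for planning such names from existing per-sample files. It assumes the sample ID is simply the file name without its FASTQ extension, and it uses `0` as a placeholder barcode identifier. Both helper functions are hypothetical and are not part of QIIME 2.

```python
from pathlib import Path

FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def casava_name(sample_id, barcode_id="0", lane=1, read=1, set_number=1):
    """Build a Casava 1.8 style file name for one sample."""
    return f"{sample_id}_{barcode_id}_L{lane:03d}_R{read}_{set_number:03d}.fastq.gz"


def plan_renames(paths):
    """Map each existing path to a Casava-style target file name.

    Assumption: the sample ID is the file name minus its FASTQ extension.
    Only names are planned here; plain .fastq files still have to be
    gzip-compressed before they match the .fastq.gz target.
    """
    plan = {}
    for original in paths:
        name = Path(original).name
        for suffix in FASTQ_SUFFIXES:
            if name.endswith(suffix):
                sample_id = name[: -len(suffix)]
                break
        else:
            raise ValueError(f"not a FASTQ file name: {name}")
        plan[original] = casava_name(sample_id)
    return plan


# Hypothetical file names, used only to show the call.
for src, dst in plan_renames(["reads/s1.fastq", "reads/s2.fastq.gz"]).items():
    print(src, "->", dst)
```

Joined reads go in as a single file per sample, which is why the read direction is fixed at `R1` in this sketch.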
I have tried importing the data as single-end and running all of the DADA2 processing as single-end, but the result does not seem right.
Is there a way around this?
We actually have a bunch of features for joined data that will be in the next release. But basically, yes, that will work soon; you’ll just be specifying a slightly different semantic type (SampleData[JoinedSequencesWithQuality]).
DADA2 works best if it can denoise the forward and reverse reads independently before merging. It can kind of work with joined data, but it requires the overlapping quality scores to basically match the profile of the quality scores around it and that really depends on what read-joiner you used. (Then there’s also the typical trimming and removal of non-biological sequence requirements.)
I would recommend waiting for the 2017.11 release where there should be a few other analysis options that you can use for this data.