Demultiplex merged sequences with the sample name in the header

Lu_Yang · November 20, 2017, 3:40am

Hi,
I have a merged FASTQ file with the sample name in the header, as shown below. The sample name in the header is the only identifier that distinguishes the sequences. How can I demultiplex this file and import it into QIIME 2? I want to use DADA2 to pick representative sequences (rep_seqs) later. Thanks in advance.

@M01056:153:000000000-AFD83:1:1101:16123:1874--Sample1
TACGTAGGGGGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCACGTAGGCGGTCCTTCAAGTCGGAAGTGAAATCTCAAGGCTCAACCTTGAAATTGCTTTCGATACTGGGGGACTTGAGGCAGGTAGGGGAGTGTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCAGTGGCGAAGGCGGCACTCTGGGCCTGTACTGACGCTTAGGTGCGAAAGCGTGGGGAGCAAACAGG
+
GGHHDGHGGGGGGGGGGEEGFHHGGGGGHHHHHHHHGEEFHHHHEGHGGHFGHGGGGGHHHHHHFHGGCGGH7HHHHHGGHHHHHHGGHHHHGEGHHHGGHHHEHGGFHGGHCC?DGGHGGGGGGGHHGFFGGGHFFHHHHFGHGGHHFHFCEEDFFHHFGGGGGFDFFHFHGFGFGHGGGD0FFGHHGGGDHHCGEEEF4HGEG1E?FGGFGFD@EGDGGGFF3FGGGGGHCGEEE?HGGGHGHHHFGGBGB
@M01056:153:000000000-AFD83:1:1101:13352:1907--Sample2
TACGTAGGGTGCAAGCATTATCCGGATTTATTGGGCGTAAAGCGTCCGTCGGCGTTTTATCAAGTTTTGACTTTAATACTGGAGCTTAACTCCAGCTACAGGTTGAAATACTGATAGAATTGAGTTTACTAGGGGGAGCTGGAATTCTCGGTGGAGGAGTGAAATCCGTAGATATCGAGAGGAACACCATTCGCGAAGGCGGGCTCCTGGAGTATAACTGACGCTCAGGGACGAAAGTTTGGGTAGCAAAAGGG
+
GGHHHHHGGHGGHHHHHHHHHHHGGGGGHHHHHHHHGGGGHHHGGGGGHH;GGGGGGGHHHGHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHGHHHGHHHHHHHHHHHHHHGHHGHHHHHHHHHHHHHHGHHHHHHHHHHHGGGGGHGHHGHGGGHHHHHHGGFGHGGGGFGHHGGGGGGGHFHHGHHHHHHHHHHGGGGGGHHGHGFGHGHHHHHHHHHGHGHHHHHHHHHH

ebolyen · November 22, 2017, 8:56pm

Hey @Lu_Yang,

Sorry for the delayed response and thanks for the sample reads!

Based on those samples, it looks like the sample ID is delimited with a -- in the header? We don't have anything in QIIME 2 that recognizes that, or anything that can demultiplex based on matching something in a FASTQ header.
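Since QIIME 2 does not demultiplex on FASTQ headers, one option is to split the file before importing it. The following is a minimal Python sketch, assuming every header ends in `--SampleName` exactly as in the reads above and that each record occupies four lines (no wrapped sequences). The function names and the output layout (one `<sample>.fastq.gz` per sample) are illustrative choices, not part of QIIME 2.

```python
import gzip
import os
from collections import defaultdict


def split_header(header):
    """Split '@READ_ID--SAMPLE' into (read_id, sample); assumes a '--' delimiter."""
    # rpartition splits on the LAST '--', so a read ID containing '--' is tolerated
    read_id, sep, sample = header.rstrip("\n").rpartition("--")
    if not sep or not sample:
        raise ValueError(f"no '--SampleName' suffix in header: {header!r}")
    return read_id, sample


def read_fastq(handle):
    """Yield (header, seq, plus, qual) from unwrapped 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def demultiplex(in_path, out_dir):
    """Write one gzipped FASTQ per sample and return read counts per sample."""
    os.makedirs(out_dir, exist_ok=True)
    opener = gzip.open if in_path.endswith(".gz") else open
    handles = {}
    counts = defaultdict(int)
    try:
        with opener(in_path, "rt") as fh:
            for header, seq, _plus, qual in read_fastq(fh):
                read_id, sample = split_header(header)
                if sample not in handles:
                    path = os.path.join(out_dir, f"{sample}.fastq.gz")
                    handles[sample] = gzip.open(path, "wt")
                # Drop the '--Sample' suffix so the header is a plain read ID
                handles[sample].write(f"{read_id}\n{seq}\n+\n{qual}\n")
                counts[sample] += 1
    finally:
        for h in handles.values():
            h.close()
    return dict(counts)
```

For example, `demultiplex("merged.fastq", "per_sample")` would return a dict of read counts per sample. The `--SampleName` suffix is removed from each header so that the per-sample files carry plain Illumina read IDs.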

Where did you get your data from? Does it exist in a different format by any chance?

Thanks!

Lu_Yang · November 22, 2017, 9:17pm

Hi, @ebolyen,
Happy Thanksgiving! Thanks for your reply!
These are reads merged with FLASH. I do not have the original, unmerged reads. In the meantime, I have demultiplexed all of these samples, so the merged reads for each sample are now in separate fastq.gz/fastq files.
Is there any way to import these demultiplexed samples and then process them with DADA2 or Deblur? And is it appropriate to analyze this data in QIIME 2?
Again, thanks.

ebolyen · November 22, 2017, 9:46pm

You too!

Perfect! You should be able to follow this section of the Importing Tutorial to get your data into a QIIME 2 artifact!
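As a rough illustration of the manifest route, the sketch below writes a comma-separated manifest with `sample-id`, `absolute-filepath` and `direction` columns, one `forward` row per per-sample file. This layout is an assumption based on the single-end FASTQ manifest format (`SingleEndFastqManifestPhred33`) described in the QIIME 2 Importing Tutorial of this period; later releases changed the manifest format, so check the tutorial for the version you have installed. The `<sample-id>.fastq.gz` file naming is hypothetical.

```python
import csv
import os


def write_manifest(fastq_dir, manifest_path, suffix=".fastq.gz"):
    """Write a single-end FASTQ manifest with one row per '<sample-id><suffix>' file.

    Assumes (hypothetically) one file per sample, named after the sample ID.
    """
    rows = []
    for name in sorted(os.listdir(fastq_dir)):
        if name.endswith(suffix):
            sample_id = name[: -len(suffix)]
            # The manifest format expects absolute file paths
            path = os.path.abspath(os.path.join(fastq_dir, name))
            # Merged reads are single reads, listed in the forward direction
            rows.append((sample_id, path, "forward"))
    with open(manifest_path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(rows)
    return rows
```

The resulting file is then passed to `qiime tools import` together with the manifest format name given in the Importing Tutorial.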

What sequencing technology did you use? We've mostly focused on Illumina amplicon sequencing at this point, but we're looking to add more types of analysis to QIIME 2 in the nearish future.

Let me know if that helps!

Lu_Yang · November 22, 2017, 9:55pm

Hi, @ebolyen,
Thanks for your quick response.
My samples are 16S Illumina data. Based on your suggestion, could you tell me more about which format I should use? My data are demultiplexed, merged paired-end reads. Should I treat them as "Casava 1.8 single-end demultiplexed fastq"?
Would all of my downstream processing then be treated as single-end? Is that appropriate? I am worried that it is not.
I have tried importing the data as single-end and processing it as single-end in DADA2, but the results do not look right.
Is there a way around this?
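If the Casava 1.8 route is used, QIIME 2 (as we understand it) takes the sample ID from the text before the first underscore of each file name, and the files need to be gzip-compressed and follow the Illumina pattern `<sample-id>_<barcode>_L<lane>_R<read>_001.fastq.gz`. The sketch below is a minimal, hypothetical helper that copies one gzipped file per sample into a new directory under such names; the `S<n>` barcode field and `L001` lane are placeholders, and the exact naming rules should be confirmed against the Importing Tutorial.

```python
import os
import re
import shutil

# Assumed Casava 1.8 per-sample file name pattern:
# <sample-id>_<barcode>_L<lane>_R<read>_001.fastq.gz
CASAVA_RE = re.compile(r"^([^_]+)_S\d+_L\d{3}_R[12]_001\.fastq\.gz$")


def casava_name(sample_id, index, read="R1", lane=1):
    """Build a Casava-1.8-style file name for one sample (placeholder barcode/lane)."""
    if "_" in sample_id:
        # The sample ID is parsed from the text before the first underscore
        raise ValueError(f"sample ID may not contain '_': {sample_id!r}")
    return f"{sample_id}_S{index}_L{lane:03d}_{read}_001.fastq.gz"


def stage_casava_dir(sample_files, out_dir):
    """Copy {sample_id: gzipped_fastq_path} into out_dir under Casava-style names."""
    os.makedirs(out_dir, exist_ok=True)
    staged = []
    for index, (sample_id, src) in enumerate(sorted(sample_files.items()), start=1):
        name = casava_name(sample_id, index)
        shutil.copyfile(src, os.path.join(out_dir, name))
        staged.append(name)
    return staged
```

Because merged reads are a single read per fragment, every file is labelled `R1`; uncompressed `.fastq` files would need to be gzipped before copying.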

Thanks!

ebolyen · November 22, 2017, 10:05pm

Hey @Lu_Yang,

We actually have a bunch of features for joined data which will be in the next release. But basically, yes that will work soon, you'll just be specifying a slightly different semantic type (SampleData[JoinedSequencesWithQuality]).

DADA2 works best if it can denoise the forward and reverse reads independently before merging. It can kind of work with joined data, but it requires the overlapping quality scores to basically match the profile of the quality scores around it and that really depends on what read-joiner you used. (Then there's also the typical trimming and removal of non-biological sequence requirements.)
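One way to check whether joined reads fit that assumption is to look at the mean quality score at each position across all merged reads: read joiners may recompute the quality scores in the overlapping region, which can show up as a step or bump in the profile. A minimal sketch, assuming unwrapped four-line FASTQ records with Phred+33 quality encoding (as in the reads posted above); this is a diagnostic only, not something DADA2 or QIIME 2 runs for you.

```python
import gzip
from itertools import islice


def quality_strings(path):
    """Yield the quality line of each unwrapped 4-line FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        # Quality strings are lines 4, 8, 12, ... (0-based indices 3, 7, 11, ...)
        for qual in islice(fh, 3, None, 4):
            yield qual.rstrip("\n")


def mean_quality_by_position(quals, offset=33):
    """Return the mean Phred score at each position (Phred+33 by default).

    Reads of different lengths are handled: each position is averaged only
    over the reads that reach it.
    """
    totals, counts = [], []
    for qual in quals:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

Plotting `mean_quality_by_position(quality_strings("Sample1.fastq.gz"))` against position (the file name is hypothetical) should make an irregular overlap region easy to spot.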

I would recommend waiting for the 2017.11 release where there should be a few other analysis options that you can use for this data.

Lu_Yang · November 22, 2017, 10:08pm

Hi, @ebolyen,

WOW! Cool! Looking forward to that!
Thanks for your info. I will wait for the new release.
Have a great holiday!

jairideout · November 30, 2017, 7:06pm

The QIIME 2 2017.11 release has expanded support for analyzing paired end reads! See the paired end reads community tutorial for more details.