How to process paired-end demultiplexed sequences with QIIME 2

L_Zh · November 13, 2017, 3:08pm

I have paired-end FASTQ files from a MiSeq run (2x300 bp) covering the V1-V2 region. The samples have already been demultiplexed and the barcodes removed, i.e. each sample has R1 and R2 FASTQ files that still contain the V1 primer and the reverse V2 primer, but no barcode.

How can I import these FASTQ files into QIIME 2 without demultiplexing again?

In QIIME 1, I join the paired-end files for each sample, remove the V1 and V2 primers with cutadapt, and then use the joined reads for downstream analysis. But I don't know how to process these demultiplexed samples with QIIME 2 and dada2.

Any suggestions would be greatly appreciated.

Best

thermokarst · November 13, 2017, 4:29pm

Hi @L_Zh!

I am not 100% following what you have outlined here, so my apologies if I misunderstood! If your primers are on the 5' end of the input sequences, and are all the same length, you can use the --p-trim-left-f/--p-trim-left-r flags to strip the primers out. Otherwise, it sounds like pre-processing with cutadapt will be your best bet at the moment (or processing in Q1, then importing these data into Q2).
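A minimal Python sketch of the first case, checking whether a read starts with the forward primer so that a fixed trim length would remove it. The 27F sequence used here is the commonly published one and is an assumption, as is the helper name `starts_with_primer`; substitute the primer actually used in your run.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def starts_with_primer(read: str, primer: str) -> bool:
    """Return True if the read begins with the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer))


# Assumption: commonly published 27F sequence; replace with your own primer.
PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"

# First 40 bases of an example R1 read.
r1_start = "AGAGTTTGATCCTGGCTCAGTACCTAGAAAGATGGTTATT"

if starts_with_primer(r1_start, PRIMER_27F):
    # The primer length is then a candidate value for --p-trim-left-f.
    print(f"primer found at 5' end, length {len(PRIMER_27F)}")
```

If every read passes this check with the same primer length, a fixed left trim is enough; if not, the primer position varies and a tool such as cutadapt is the safer choice.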

Thanks!

L_Zh · November 14, 2017, 5:12pm

Thanks thermokarst:

In my demultiplexed MiSeq data, we use 27F as the V1 primer and part of 341F as the V2 primer. The run was designed as 2x300 bp, but the read lengths in the data we received vary because of a quality cutoff. For example,
in R1 (quality line removed; bold marks the reverse primer for V2):
@M00800:84:000000000-B94YL:1:1101:19472:1120 1:N:0:27
AGAGTTTGATCCTGGCTCAGTACCTAGAAAGATGGTTATTTTGTTTTGAGTGTTTTGCAGACTAAGCCATGTAGGTATGTACGTAGAACTCAGCACTGGTAGGTAGCTTGATGCAGCCCAGGGGCAAGGTAGATCCTACTTTGAAGACTCCTACGGGAGGCAGCAGTGTCGTGACTGGGAAAACCCTG

In R2 (quality line removed; bold marks the V1 primer):
@M00800:84:000000000-B94YL:1:1101:16866:1077 2:N:0:27
NAGGGTTTTCCCAGTCACGACATTGCTGCGTCCCGTAGGAGTCTCCAAGACGGCATAGATTCAGCACGCCTAAGAACAAACCCTGGAAACCGTGTCCCCGGCCCCATGAATGTCAATACATTCCCCCCTCACCCACCCCTGAGCCATGACCAAACCCTACATCGGAAGAGCGTCGCGTCGGGCAAGCGTGTAGACCTCCGTGGCCGCCGCATCACTAAAAAAAGCACCCCCTCATCGCGCACCCACCCACACCACTGCTAGCCCCTATCCCTCGCCTTCACACCCCCCCCTCCACACACCC

In QIIME 1, I can join these paired-end reads and then use cutadapt to remove the V1 and V2 primers from the joined reads for downstream analysis.

But for QIIME 2, do we need to remove these primers before importing, or should I use the joined, primer-removed reads from QIIME 1 for downstream analysis in QIIME 2? Is there a way to import these paired-end reads directly, as in the "Casava 1.8 paired-end demultiplexed fastq" example?

Thanks

Best

LZ

Nicholas_Bokulich · November 15, 2017, 12:22am

Hey @L_Zh,

So the sequencing core is performing some sort of quality filtering and trimming prior to handing you the data?

Based on that R1 sequence, it looks like the issue is that you have some reverse primers contained within the sequences. Is that correct? And due to the quality trimming these primers do not always appear at the same position.

If that's the case, you should probably use external tools to join reads and trim out primers prior to importing to QIIME2. We hope to support both read joining and cutadapt trimming soon, but these functions are not yet available in QIIME2.
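To illustrate why a fixed trim length does not work here, this is a rough Python sketch of trimming a read-through primer wherever it occurs. It uses exact matching only, whereas a real trimmer such as cutadapt tolerates mismatches. The reverse primer below is hypothetical: it was picked so that its reverse complement matches the tail of the R1 read posted above, so replace it with the primer actually used.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def trim_read_through(read: str, reverse_primer: str) -> str:
    """Cut the read where the reverse complement of the reverse primer starts.

    Exact matching only; the read is returned unchanged if no match is found.
    """
    target = reverse_complement(reverse_primer)
    pos = read.find(target)
    return read if pos == -1 else read[:pos]


# Hypothetical reverse primer (assumption, see the note above).
REVERSE_PRIMER = "CTGCTGCCTCCCGTAGGAGT"

# Tail of the R1 read posted above.
r1_tail = "GGTAGATCCTACTTTGAAGACTCCTACGGGAGGCAGCAGTGTCGTGACTGGGAAAACCCTG"

print(trim_read_through(r1_tail, REVERSE_PRIMER))
```

Because the match position is searched for in each read, it does not matter that quality trimming left the reads at different lengths.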

Once you have joined/trimmed reads, you should be able to import these reads into QIIME2. Since you already demultiplexed these reads, you will probably want to use a fastq manifest file to import these files. Instructions in that tutorial will guide you through those steps.
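As a sketch of what such a manifest could look like for paired-end data, the Python below writes one forward and one reverse row per sample. It assumes the comma-separated layout with `sample-id`, `absolute-filepath`, and `direction` columns; check the importing tutorial for the format your QIIME 2 version expects. The `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming and the `build_manifest` helper are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def build_manifest(sample_ids, data_dir):
    """Return the text of a paired-end fastq manifest (CSV layout assumed)."""
    data_dir = PurePosixPath(data_dir)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for sid in sample_ids:
        # Hypothetical file naming; adjust to match your own files.
        writer.writerow([sid, str(data_dir / f"{sid}_R1.fastq.gz"), "forward"])
        writer.writerow([sid, str(data_dir / f"{sid}_R2.fastq.gz"), "reverse"])
    return buf.getvalue()


print(build_manifest(["sample1", "sample2"], "/data/trimmed"))
```

The resulting text can be saved to a file and passed to the import command as the input path.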

I hope that helps!

L_Zh · November 20, 2017, 3:49pm

Thanks Nicholas

Since the reads I received vary in length due to the quality cut, I have asked for the raw BCL files from the MiSeq. Do you know how I can generate QIIME 2-compatible FASTQ files from the raw BCL data?

Nicholas_Bokulich · November 21, 2017, 12:12am

Hi @L_Zh,
Your sequencing center would probably have a better idea about processing bcl files, but I don't think that will even be necessary. I think I have a better idea of your original question now: you can still use cutadapt to trim primers from your reads, then import as described here. No joining is necessary, since joining will be performed by dada2.
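A small Python sketch of how a paired-end cutadapt call could be assembled per sample, without joining the reads. It only builds and prints the command. The `-g`, `-G`, `-o`, and `-p` options are cutadapt's paired-end options for 5' adapters and output files; the primer sequences and file names are placeholders, not values confirmed for this data set.

```python
import shlex


def cutadapt_command(fwd_primer, rev_primer, r1_in, r2_in, r1_out, r2_out):
    """Build the argument list for a paired-end cutadapt primer-trimming run."""
    return [
        "cutadapt",
        "-g", fwd_primer,  # 5' primer expected at the start of R1
        "-G", rev_primer,  # 5' primer expected at the start of R2
        "-o", r1_out,      # trimmed R1 output
        "-p", r2_out,      # trimmed R2 output
        r1_in, r2_in,
    ]


# Placeholder primers and file names (assumptions, replace with your own).
cmd = cutadapt_command(
    "AGAGTTTGATCMTGGCTCAG", "CTGCTGCCTCCCGTAGGAGT",
    "sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
    "trimmed/sample1_R1.fastq.gz", "trimmed/sample1_R2.fastq.gz",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once cutadapt is installed, and the trimmed R1/R2 files are what would then be imported.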

Otherwise, googling tells me that Illumina provides a bcl2fastq script to perform bcl conversion to fastq.

I hope that answers your question!