How to import demultiplexed fastq files without barcode?

puritan · September 11, 2018, 2:56am

Hi all, I got some fastq files from an Illumina MiSeq run, and they are already demultiplexed (one fastq per sample). The barcodes and primers were already stripped from the sequences when I received them, so extract_barcodes.py won’t help. I have renamed the headers of each fastq and run cat *fastq > all.fastq. So how can I import these demultiplexed fastq files into QIIME 2?

PS. In fact, these fastq files all share the same barcode, because each fastq was sequenced as its own library. This is not the conventional setup, where amplicons from many samples are pooled into one big library and each sample has a unique barcode to identify it.

ebolyen · September 11, 2018, 5:58pm

Hey @puritan!

Undo the concatenation step; QIIME 2 works really well with demultiplexed data (unlike QIIME 1).

Since it was provided to you already demultiplexed, you can probably use the CasavaOneEightSingleLanePerSampleDirFmt to import.

Otherwise, there's always the FASTQ Manifest formats.
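For the manifest route, a minimal sketch of building the manifest from a directory of per-sample fastq files might look like the following. It assumes the comma-separated layout with the header sample-id,absolute-filepath,direction that the importing tutorial described around the time of this thread (later releases use a different layout, so check the docs for your version). The function name write_manifest and the rule that the sample ID is the filename without its .fastq or .fastq.gz suffix are illustrative assumptions, not part of QIIME 2.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, manifest_path):
    """Write a single-end manifest with one row per fastq file.

    Assumption: each file holds exactly one sample, and the sample ID
    is the filename with its .fastq / .fastq.gz suffix removed.
    """
    fastq_dir = Path(fastq_dir).resolve()
    rows = []
    for path in sorted(fastq_dir.iterdir()):
        name = path.name
        for suffix in (".fastq.gz", ".fastq"):
            if name.endswith(suffix):
                sample_id = name[: -len(suffix)]
                # Single-end reads are listed with direction "forward".
                rows.append((sample_id, str(path), "forward"))
                break

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(rows)
    return rows
```

Calling write_manifest("reads/", "manifest.csv") on the original, unconcatenated files would give a file that can be passed to qiime tools import together with the matching manifest format for your quality-score encoding.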

Are you saying that each sample got its own lane in the sequencer?

puritan · September 12, 2018, 5:02pm

Hi @ebolyen, one more question please.
How can I export the denoised reads in fasta format? I am not referring to the representative seqs, but to all the reads that passed DADA2.

Actually, I am wondering whether such a fasta file exists at all, since the DADA2 denoising step has 3 output files: the feature table, the rep_seq, and the dada2-stat, and none of them contains all the denoised seqs I want.

I would like to cluster these denoised seqs against Greengenes or SILVA for function prediction, e.g. with PICRUSt. Can I do that? Thanks.

ebolyen · September 12, 2018, 11:44pm

Hey @puritan,

There is and there isn't. That information is available if you combine the feature-table (which has the number of times a feature was found for each sample) with the rep-seqs (which are just the denoised reads with their corresponding feature ID).
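To make the "combine the two" idea concrete, here is a small sketch that repeats each representative sequence as many times as the feature table says it was observed in each sample. It assumes the two artifacts have already been exported and parsed into plain Python dicts; the function names and the sample_index header style are hypothetical choices for illustration, not something QIIME 2 produces.

```python
def expand_denoised_reads(rep_seqs, table):
    """Yield (header, sequence) once per denoised read.

    rep_seqs: dict mapping feature ID -> sequence
    table:    dict mapping sample ID -> {feature ID: count}
    """
    for sample_id in sorted(table):
        read_index = 0
        for feature_id in sorted(table[sample_id]):
            count = int(table[sample_id][feature_id])
            # Emit the representative sequence once per observation.
            for _ in range(count):
                header = f"{sample_id}_{read_index} {feature_id}"
                yield header, rep_seqs[feature_id]
                read_index += 1


def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Toy data standing in for exported rep-seqs and feature-table contents.
rep_seqs = {"asv1": "ACGT", "asv2": "GGCC"}
table = {"S1": {"asv1": 2, "asv2": 1}, "S2": {"asv2": 1}}
fasta_text = to_fasta(expand_denoised_reads(rep_seqs, table))
```

Because every read assigned to a feature carries that feature's sequence after denoising, the expanded file holds no information beyond the table and rep-seqs themselves.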

You can use qiime vsearch cluster-features-closed-reference for this. It takes your feature-table and rep-seqs and generates new features based on a reference database's OTUs instead of ASVs. Even better, you'll also get the unmatched ASVs, which you can continue to use as plain ASVs.

This tutorial should get you started!
