Import multiplexed fastq file

Sequencing type: Illumina MiSeq

From our sequencing facility we have an .fna and a .qual file, described as follows:

“The .fna file contains your sequence data after it has undergone quality trimming, denoising and quality checking. Each sequence in this file has had an 8 nucleotide barcode prepended to the front of it. This barcode is not the barcode that was originally used during sequencing, instead we use a faux barcode for each sample to ensure that they each have a unique barcode.”

“The .qual file contains the quality scores for your sequence data after it has undergone quality trimming, denoising and quality checking. Each quality score set in this file has had 8 fake 40 scores prepended to the front of it to account for the faux barcode added to the sequence file.”

I have converted these two files into fastq format using QIIME 1. While I can do my analysis in QIIME 1, my hope was to be able to demultiplex and quality filter this fastq file in QIIME 2. But I cannot get it imported.
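For reference, the .fna/.qual to fastq conversion amounts to roughly the following. This is a minimal Python sketch, not the QIIME 1 script itself: the function names `parse_records` and `fna_qual_to_fastq` are made up for illustration, and it assumes Phred+33 encoding, matching record IDs in both files, and that only the first token of each header is kept as the read ID.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (read_id, body_lines) from FASTA-style text (.fna or .qual)."""
    read_id, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, body
            # Assumption: the first whitespace-delimited token is the read ID.
            read_id, body = line[1:].split()[0], []
        else:
            body.append(line)
    if read_id is not None:
        yield read_id, body


def fna_qual_to_fastq(fna_lines: Iterable[str], qual_lines: Iterable[str]) -> str:
    """Merge sequences and space-separated integer scores into FASTQ text."""
    quals = {rid: " ".join(body) for rid, body in parse_records(qual_lines)}
    out = []
    for rid, body in parse_records(fna_lines):
        seq = "".join(body)
        scores = [int(s) for s in quals[rid].split()]
        if len(scores) != len(seq):
            raise ValueError(f"length mismatch for {rid}")
        # Assumption: Phred+33 encoding (score 40 becomes 'I').
        qual_str = "".join(chr(s + 33) for s in scores)
        out.append(f"@{rid}\n{seq}\n+\n{qual_str}\n")
    return "".join(out)
```

With the facility's files, the first 8 scores of every record would be the fake 40s, so each quality string would start with `IIIIIIII`.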

tl;dr I need to import a multiplexed, joined paired-end Illumina MiSeq fastq file into QIIME 2.

Any help is much appreciated,
Kristopher

Hi @kparke10!

We'll need to extract the barcodes into a separate FASTQ file, and then demultiplex the two FASTQ files (sequences & barcodes) using demux emp-single (or demux emp-paired; see below). We don't have a direct way to extract barcodes in the current QIIME 2 release, but the next release (2017.12) will be live in a few days, with support for demultiplexing your data directly with cutadapt. We'll follow up here when that feature is available in the release!

In the meantime, you can use QIIME 1's extract_barcodes.py script to extract the barcodes into a separate fastq file, and then import the fastq files as EMPSingleEndSequences or EMPPairedEndSequences. The imported data can then be demultiplexed with demux emp-single or demux emp-paired. Alternatively, you could use cutadapt directly to demultiplex the data, and then import the results.
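To make the barcode-extraction step concrete, here is a rough Python sketch of what it does conceptually. It is not the extract_barcodes.py implementation: `split_barcodes` is a hypothetical helper, and it assumes plain four-line FASTQ records with a fixed-length barcode (8 nt here) at the start of every read.

```python
from typing import Tuple


def split_barcodes(fastq_text: str, barcode_len: int = 8) -> Tuple[str, str]:
    """Split each read into (trimmed reads FASTQ, barcodes FASTQ).

    Assumes four lines per record and a barcode of `barcode_len`
    bases at the 5' end of every sequence.
    """
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    reads, barcodes = [], []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # The barcode record keeps the same header so the files stay in sync.
        barcodes += [header, seq[:barcode_len], "+", qual[:barcode_len]]
        reads += [header, seq[barcode_len:], "+", qual[barcode_len:]]
    return "\n".join(reads) + "\n", "\n".join(barcodes) + "\n"
```

The two outputs correspond to the sequences file and the barcodes file that the EMP-style import expects; the exact file names and directory layout required for import should be checked against the QIIME 2 importing documentation.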

Have the reads in the .fna file already been joined (e.g. by the sequencing center)? If you plan to use DADA2 to denoise your data, it'd be better to obtain the unjoined paired-end sequences if possible, and import those to use with dada2 denoise-paired. If you're using Deblur or one of the vsearch OTU picking methods, importing the joined sequences is the way to go.

I'd also recommend checking whether the primers and other sequencing artifacts (e.g. adapters) have been removed from your sequences; that's an important preprocessing step before denoising your sequences.


The reads in the .fna file are pre-joined by the facility. They do not contain the primers or adapters, and they have faux barcodes prepended to them that are present in my mapping file for demultiplexing. They are also chimera checked, denoised, and quality checked (but not filtered).

I have the raw, unjoined paired-end fastq reads for each sample. I just haven’t played around with them yet. That’s on my to-do list.

I was able to get the data into QIIME 2. I demultiplexed (but did not quality filter) using QIIME 1’s demultiplex_fasta.py script. I then converted each sample to fastq and filtered it individually. After creating a manifest, I got everything imported into QIIME 2. The type was SampleData[JoinedSequencesWithQuality] and the format was SingleEndFastqManifestPhred33. Everything imported fine and the analysis is proceeding without any hurdles. I used Deblur, following similar approaches to the forum post here and the Moving Pictures Tutorial.
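In case it helps anyone else, the manifest step can be sketched in Python like this. Treat it as an assumption-laden example rather than the official procedure: `build_manifest` is a hypothetical helper, the sample IDs and paths are placeholders, and the `sample-id,absolute-filepath,direction` CSV header is the layout I understand the fastq manifest formats to use in this release, so check it against the importing documentation for your version.

```python
import csv
import io
import os
from typing import Dict


def build_manifest(sample_files: Dict[str, str], direction: str = "forward") -> str:
    """Return manifest CSV text with one row per per-sample fastq file.

    Assumption: the manifest is a CSV with the header
    sample-id,absolute-filepath,direction.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for sample_id in sorted(sample_files):
        # The manifest needs absolute paths, so resolve relative ones.
        path = os.path.abspath(sample_files[sample_id])
        writer.writerow([sample_id, path, direction])
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder sample IDs and file names, for illustration only.
    print(build_manifest({"sampleA": "sampleA.fastq", "sampleB": "sampleB.fastq"}))
```

Joined reads are listed as single-end files here, which is why every row uses the same direction value.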

In the meantime, I will take your advice so I can enter into QIIME 2 from the beginning.

Thank you!

*First edit was to finish the post and the second edit was to add this note.


QIIME 2 2017.12 is now out, and it includes a new plugin wrapping cutadapt. You can use this plugin to demultiplex sequences whose barcodes are still in the sequence data. Give it a try and let us know how it goes!

