Demultiplexing and Trimming Adapters from Reads with q2-cutadapt

NOTE

This tutorial is a work in progress and is currently incomplete. It demonstrates, at a high level, some of the methods in the q2-cutadapt plugin included in QIIME 2 2018.2. Please check back here for updates as this tutorial is expanded in the coming weeks.


Multiplexed reads that carry their barcodes in the sequence reads can be demultiplexed in QIIME 2 using the q2-cutadapt plugin, which wraps the cutadapt tool. (Multiplexed sequences prepared with the EMP protocol, where the barcode reads are in a separate file, can still be demultiplexed with the q2-demux plugin.) This tutorial uses a toy dataset to illustrate some of the methods in q2-cutadapt.

Download data used in this tutorial

forward.fastq.gz (770 Bytes)
metadata.tsv (53 Bytes)

The data consist of six single-end reads from two samples, which carry the following barcodes on the 5' end:

Sample    Barcode
------------------
Sample_A  AACCGGTT
Sample_B  CCAAGGTT
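
Before importing, it can be reassuring to check that the barcodes really do sit at the start of the reads. The following is a minimal sketch of such a check using only gzip and awk; the function name count_barcode is made up for this example and is not part of QIIME 2. It assumes a standard four-line-per-record FASTQ file and only counts exact matches at the very start of each read.

```shell
# Hypothetical helper (not part of QIIME 2): count the reads in a gzipped
# FASTQ file whose sequence starts with the given barcode (exact match only).
# Assumes four lines per record, so sequences are lines 2, 6, 10, ...
# Usage: count_barcode <reads.fastq.gz> <barcode>
count_barcode() {
  gzip -dc "$1" |
    awk -v bc="$2" 'NR % 4 == 2 && index($0, bc) == 1 { n++ } END { print n + 0 }'
}

# Example (not run here): count_barcode forward.fastq.gz AACCGGTT
```

If the counts for the two barcodes do not add up to the total number of reads, the remaining reads would be expected to end up among the untrimmed sequences when demultiplexing with an error rate of 0.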

Import the multiplexed sequences

$ qiime tools import \
  --type MultiplexedSingleEndBarcodeInSequence \
  --input-path forward.fastq.gz \
  --output-path multiplexed-seqs.qza

Demultiplex the reads

$ qiime cutadapt demux-single \
  --i-seqs multiplexed-seqs.qza \
  --m-barcodes-file metadata.tsv \
  --m-barcodes-column Barcode \
  --p-error-rate 0 \
  --o-per-sample-sequences demultiplexed-seqs.qza \
  --o-untrimmed-sequences untrimmed.qza \
  --verbose
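
To make the idea behind this step concrete, here is a rough sketch of exact-match demultiplexing written in plain awk. It is only an illustration and is not how q2-cutadapt or cutadapt is implemented: the function name demux_sketch is hypothetical, it assumes the barcode sits at the very start of the read, and it assumes a tab-separated barcode table with the sample ID in the first column and the barcode in the second. Real cutadapt matching is more flexible (for example, it can tolerate mismatches when the error rate is above 0).

```shell
# Hypothetical helper (not part of QIIME 2): assign each read to a sample by
# an exact barcode match at the start of the read, then strip the barcode.
# Usage: demux_sketch <barcodes.tsv> < reads.fastq
# Prints one line per read: <sample ID or "untrimmed"> TAB <remaining sequence>
demux_sketch() {
  awk -F '\t' '
    # First input: the barcode table. Skip header/comment lines starting with "#".
    NR == FNR { if ($0 !~ /^#/ && NF >= 2) bc[$2] = $1; next }
    # Second input (stdin): FASTQ. The sequence is line 2 of each 4-line record.
    FNR % 4 == 2 {
      sample = "untrimmed"; rest = $0
      for (b in bc)
        if (index($0, b) == 1) { sample = bc[b]; rest = substr($0, length(b) + 1) }
      print sample "\t" rest
    }
  ' "$1" -
}

# Example (not run here): gzip -dc forward.fastq.gz | demux_sketch metadata.tsv
```

Reads that match no barcode are labelled "untrimmed" here, mirroring the role of the untrimmed.qza output above.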

Trim adapters from demultiplexed reads

If the reads contain sequencing adapters or PCR primers that you'd like to remove, you can trim them next as follows.

$ qiime cutadapt trim-single \
  --i-demultiplexed-sequences demultiplexed-seqs.qza \
  --p-front GCTACGGGGGG \
  --p-error-rate 0 \
  --o-trimmed-sequences trimmed-seqs.qza \
  --verbose
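
As a rough picture of what trimming a 5' adapter means, the sketch below removes the first exact, full-length occurrence of an adapter together with any bases in front of it, and shortens the quality string to match. This is a simplified illustration, not cutadapt's algorithm: the function name trim_front_sketch is hypothetical, and the sketch ignores partial adapter matches, mismatches, and the other matching options cutadapt offers.

```shell
# Hypothetical helper (not part of QIIME 2 or cutadapt): remove the first
# exact, full-length occurrence of a 5-prime adapter plus everything before
# it, and trim the quality string by the same amount. Reads without the
# adapter pass through unchanged.
# Usage: trim_front_sketch <adapter> < in.fastq > out.fastq
trim_front_sketch() {
  awk -v ad="$1" '
    FNR % 4 == 2 {                                  # sequence line
      pos = index($0, ad)
      n = (pos > 0) ? pos + length(ad) - 1 : 0      # bases to drop from the read start
      print substr($0, n + 1); next
    }
    FNR % 4 == 0 { print substr($0, n + 1); next }  # quality line, same trim
    { print }                                       # header and "+" lines
  '
}

# Example (not run here): trim_front_sketch GCTACGGGGGG < reads.fastq
```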

Summarize demultiplexed and trimmed reads

$ qiime demux summarize \
  --i-data trimmed-seqs.qza \
  --o-visualization trimmed-seqs.qzv
$ qiime tools view trimmed-seqs.qzv

Regarding paired-end reads

  • The import format for paired-end reads with the barcodes still in the sequence is MultiplexedPairedEndBarcodeInSequence. This format expects a directory containing two files (forward.fastq.gz and reverse.fastq.gz).
  • Demultiplexing currently works only if the barcodes are in the forward reads; we plan to support dual-indexing strategies in a future release of QIIME 2.
  • Demultiplexing is accomplished with the demux-paired command.
  • Filtering/trimming is accomplished with the trim-paired command.
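
Since the paired-end import format expects a directory with specific file names, a small helper like the one below can stage the files before importing. This is only a sketch: the function name stage_paired_dir and the directory name paired-end-seqs are made up for this example, and the import command in the comment simply reuses the import options shown earlier with the paired-end type.

```shell
# Hypothetical helper (not part of QIIME 2): copy a forward/reverse FASTQ pair
# into a directory under the file names the paired-end import format expects.
# Usage: stage_paired_dir <forward reads> <reverse reads> <output directory>
stage_paired_dir() {
  mkdir -p "$3" &&
    cp "$1" "$3/forward.fastq.gz" &&
    cp "$2" "$3/reverse.fastq.gz"
}

# The staged directory can then be imported (requires QIIME 2; not run here):
#   qiime tools import \
#     --type MultiplexedPairedEndBarcodeInSequence \
#     --input-path paired-end-seqs \
#     --output-path multiplexed-seqs.qza
```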