Analyzing 454 data in QIIME 2

gregcaporaso · December 19, 2022, 9:16pm

The 2022.11 release of QIIME 2 adds support for the fasta/qual formats used in 454 data. 454 sequencing technologies are outdated, but a lot of useful data still exists out there in the wild. This tutorial illustrates how 454 sequencing data can be analyzed in QIIME 2 versions 2022.11 and later.

This example uses the QIIME 1 454 tutorial data.

Using the QIIME 1 454 Overview Tutorial data

Download the data from ftp://ftp.microbio.me/pub/qiime-files/qiime_overview_tutorial.zip. Move the .fna (fasta) and .qual files from that zip file into a new q1-fastaqual directory, naming them reads.fasta and reads.qual, respectively.

$ ls q1-fastaqual/
reads.fasta reads.qual
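The staging step above can be sketched as a small shell function. This is only an illustration: stage_fastaqual is a hypothetical helper name, and it assumes the unzipped tutorial directory contains exactly one .fna file and one .qual file, so it refuses to guess if it finds more or fewer.

```shell
# Hypothetical helper (not part of QIIME 2): copy the single .fna/.qual pair
# found under src_dir into dest_dir using the names the import step expects.
stage_fastaqual() {
  src_dir="$1"
  dest_dir="$2"
  # Count candidates first so that an ambiguous layout fails loudly.
  n_fna=$(find "$src_dir" -type f -name '*.fna' | wc -l)
  n_qual=$(find "$src_dir" -type f -name '*.qual' | wc -l)
  if [ "$n_fna" -ne 1 ] || [ "$n_qual" -ne 1 ]; then
    echo "expected exactly one .fna and one .qual under $src_dir" >&2
    return 1
  fi
  mkdir -p "$dest_dir"
  cp "$(find "$src_dir" -type f -name '*.fna')" "$dest_dir/reads.fasta"
  cp "$(find "$src_dir" -type f -name '*.qual')" "$dest_dir/reads.qual"
}
```

After unzipping the download, a call such as `stage_fastaqual qiime_overview_tutorial q1-fastaqual` would produce the layout shown above. The source directory name here is an assumption about what the zip file unpacks to.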

Import the multiplexed fasta/qual data:

qiime tools import \
  --type MultiplexedSingleEndBarcodeInSequence \
  --input-format MultiplexedFastaQualDirFmt \
  --input-path q1-fastaqual \
  --output-path seqs.qza

Demultiplex the sequences using q2-cutadapt:

qiime cutadapt demux-single \
  --i-seqs seqs.qza \
  --m-barcodes-file Fasting_Map.txt \
  --m-barcodes-column BarcodeSequence \
  --o-per-sample-sequences demux.qza \
  --o-untrimmed-sequences unassigned.qza
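Because demultiplexing depends entirely on the barcode column of the mapping file, it can be worth checking that column for duplicates before running the command above. The following is a minimal sketch, assuming a tab-separated mapping file whose first line is the header. duplicate_barcodes is a hypothetical helper, not a QIIME 2 command.

```shell
# Hypothetical helper: print every barcode that appears more than once in the
# named column of a tab-separated mapping file. No output means all barcodes
# are unique.
duplicate_barcodes() {
  map_file="$1"
  column="${2:-BarcodeSequence}"
  awk -F '\t' -v col="$column" '
    NR == 1 {
      # Locate the requested column in the header row.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 2 }
      next
    }
    /^#/ || NF == 0 { next }   # skip comment lines and blank lines
    { seen[$idx]++ }
    END { for (b in seen) if (seen[b] > 1) print b }
  ' "$map_file"
}
```

For example, `duplicate_barcodes Fasting_Map.txt BarcodeSequence` should print nothing if every sample has its own barcode.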

After demultiplexing, we need to trim sequences to the region framed by the forward and reverse primers. This is done using q2-cutadapt's trim-single method, with the trimming defined by the --p-adapter parameter. Note that the beginning of the sequence provided for this parameter must be at the beginning of a sequence read for a match to occur. The end of the adapter must be in the orientation in which it would have been read, so it may need to be the reverse complement of the reverse primer sequence that you used (this will vary based on how this information is recorded). The start of the adapter sequence is often the primer that immediately follows the barcode sequence, so depending on your barcoding protocol, this may be the forward or the reverse primer.

Important: This trimming is critical in 454 workflows, where it is not uncommon to sequence through the target region and into (non-biological) adapter sequence.

Important: Be sure to include the --p-discard-untrimmed parameter. This is important for quality control, and it makes it very obvious if your adapter sequence is wrong and you haven't trimmed anything.

Important: For 454 workflows, I recommend always using the linked primer syntax for --p-adapter, illustrated here, which requires a match at the start of a sequence to the primer, and optionally trims at the start of the reverse primer (after the ...) if the reverse primer is encountered.

Getting your sequences oriented correctly for the --p-adapter parameter can take some experimentation. If you don't include --p-discard-untrimmed you won't know if something went wrong.

qiime cutadapt trim-single \
  --i-demultiplexed-sequences demux.qza \
  --o-trimmed-sequences demux-trimmed.qza \
  --p-match-adapter-wildcards \
  --p-adapter ^YATGCTGCCTCCCGTAGGAGT...TACTCACCCGTGCGC \
  --p-discard-untrimmed
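When experimenting with primer orientation, it can help to build the linked adapter string programmatically rather than by hand. The sketch below is one way to do that in plain shell. revcomp and linked_adapter are hypothetical helper names, and whether the second primer actually needs to be reverse complemented depends on how your primers were recorded, as discussed above.

```shell
# Reverse complement an uppercase DNA sequence, including IUPAC ambiguity
# codes (for example, R <-> Y and K <-> M).
revcomp() {
  printf '%s\n' "$1" |
    awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
    tr 'ACGTRYSWKMBDHVN' 'TGCAYRSWMKVHDBN'
}

# Hypothetical helper: build a linked adapter string of the form ^A...B,
# where A is the primer expected at the start of each read and B is the
# reverse complement of the primer at the other end of the amplicon.
linked_adapter() {
  printf '^%s...%s\n' "$1" "$(revcomp "$2")"
}
```

The output of `linked_adapter` can then be passed as the value of --p-adapter. If nearly all reads are discarded, try swapping the two primers or passing the second one without reverse complementing it.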

Generate a visual summary of the demultiplexed data.

qiime demux summarize \
  --i-data demux-trimmed.qza \
  --o-visualization demux-trimmed.qzv

Perform sequence quality control with DADA2's denoise-pyro functionality (pyro is short for pyrosequencing here). You should choose your setting for --p-trunc-len in the same way that you would for Illumina sequencing data (refer to the QIIME 2 Moving Pictures tutorial for a discussion of this).

qiime dada2 denoise-pyro \
  --i-demultiplexed-seqs demux-trimmed.qza \
  --p-trunc-len 212 \
  --output-dir dada2-out
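Reads shorter than --p-trunc-len are discarded during denoising, so it can be useful to see how many reads would survive a candidate value. The following is a rough sketch that counts reads of at least a given length in FASTQ data read from standard input. reads_at_least is a hypothetical helper, and it assumes standard four-line FASTQ records.

```shell
# Hypothetical helper: read four-line FASTQ records on stdin and report how
# many reads are at least min_len bases long.
reads_at_least() {
  min_len="$1"
  awk -v min="$min_len" '
    # In four-line FASTQ, the sequence is the second line of each record.
    NR % 4 == 2 { total++; if (length($0) >= min) kept++ }
    END { printf "%d of %d reads are >= %d nt\n", kept, total, min }
  '
}
```

One way to use it is to export the trimmed reads with `qiime tools export --input-path demux-trimmed.qza --output-path demux-trimmed-export` and then run `gzip -dc demux-trimmed-export/*.fastq.gz | reads_at_least 212`. The export directory name is arbitrary.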

It was common to define OTUs based on some percent sequence identity with 454 data, but this is not essential. If you want to define OTUs on these data, refer to QIIME 2's OTU Clustering tutorial.

At this stage, your data can mostly be analyzed as usual. Some examples of things you might do follow.

Summarize the feature table.

qiime feature-table summarize \
  --i-table dada2-out/table.qza \
  --o-visualization dada2-out/table.qzv \
  --m-sample-metadata-file Fasting_Map.txt

Build a phylogenetic tree representing your sequences:

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences dada2-out/representative_sequences.qza \
  --output-dir dada2-out/tree

Perform microbiome diversity analyses:

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny dada2-out/tree/rooted_tree.qza \
  --i-table dada2-out/table.qza \
  --p-sampling-depth 87 \
  --m-metadata-file Fasting_Map.txt \
  --output-dir dada2-out/cmp87
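Samples with fewer reads than --p-sampling-depth are dropped from the core metrics results, so the choice of depth is a trade-off between reads per sample and samples retained. As a rough illustration, the sketch below counts how many samples meet a candidate depth. samples_retained is a hypothetical helper, and it assumes a two-column CSV of sample ID and read count on standard input, such as a per-sample frequency table saved from the feature table summary.

```shell
# Hypothetical helper: read "sample_id,count" CSV lines on stdin and report
# how many samples have at least `depth` reads. Lines whose second field is
# not numeric (such as a header row) are ignored.
samples_retained() {
  depth="$1"
  awk -F ',' -v depth="$depth" '
    $2 ~ /^[0-9.]+$/ { total++; if ($2 + 0 >= depth) kept++ }
    END { printf "%d of %d samples have >= %d reads\n", kept, total, depth }
  '
}
```

For example, `samples_retained 87 < sample-frequencies.csv` reports how many samples would be kept at the depth used above. The file name is a placeholder.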

Assign and visualize taxonomy:

qiime feature-classifier classify-sklearn \
  --i-classifier ../gg-13-8-99-nb-weighted-classifier.qza \
  --i-reads dada2-out/representative_sequences.qza \
  --o-classification dada2-out/taxonomy.qza

qiime taxa barplot \
  --i-table dada2-out/table.qza \
  --i-taxonomy dada2-out/taxonomy.qza \
  --m-metadata-file Fasting_Map.txt \
  --o-visualization dada2-out/taxa-barplot.qzv

A quick comparison of a few of the results to the QIIME 1 454 Tutorial suggests that the workflow above is working as expected. For example:
[Image: Screen Shot 2022-10-21 at 1.22.31 PM]

One difference I noticed with this data set is that read counts post-denoising are lower in QIIME 2 than in QIIME 1. This is probably a result of the improved quality control in QIIME 2, which may be discarding more problematic sequences.