The 2022.11 release of QIIME 2 adds support for the fasta/qual formats used in 454 data. 454 sequencing technologies are outdated, but a lot of useful data still exists out there in the wild. This tutorial illustrates how 454 sequencing data can be analyzed in QIIME 2 versions 2022.11 and later.
This example uses the QIIME 1 454 tutorial data.
Using the QIIME 1 454 Overview Tutorial data
Access data from ftp://ftp.microbio.me/pub/qiime-files/qiime_overview_tutorial.zip. Move the .fna (fasta) and .qual files from that zip file to a new q1-fastaqual directory and name them reads.fasta and reads.qual, respectively.
$ ls q1-fastaqual/
reads.fasta reads.qual
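Before importing, it can be worth confirming that the two files really describe the same reads. The following is a minimal sketch, not part of QIIME 2 (which runs its own checks on import): the function names are hypothetical, and it assumes that records appear in the same order in both files and that quality scores are whitespace-separated integers.

```python
from itertools import zip_longest


def read_fasta_records(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            fields = line[1:].split()
            record_id, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def read_qual_records(lines):
    """Yield (record_id, scores) pairs from an iterable of .qual lines."""
    record_id, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, scores
            fields = line[1:].split()
            record_id, scores = (fields[0] if fields else ""), []
        else:
            scores.extend(int(value) for value in line.split())
    if record_id is not None:
        yield record_id, scores


def find_mismatches(fasta_lines, qual_lines):
    """Return human-readable problems found when pairing the two files."""
    problems = []
    pairs = zip_longest(read_fasta_records(fasta_lines),
                        read_qual_records(qual_lines))
    for index, (seq_rec, qual_rec) in enumerate(pairs, start=1):
        if seq_rec is None or qual_rec is None:
            problems.append(f"record {index}: present in only one file")
            continue
        (seq_id, seq), (qual_id, scores) = seq_rec, qual_rec
        if seq_id != qual_id:
            problems.append(f"record {index}: ID {seq_id!r} != {qual_id!r}")
        elif len(seq) != len(scores):
            problems.append(
                f"record {index} ({seq_id}): "
                f"{len(seq)} bases but {len(scores)} scores")
    return problems


def check_files(fasta_path, qual_path):
    """Convenience wrapper, e.g. check_files('q1-fastaqual/reads.fasta',
    'q1-fastaqual/reads.qual')."""
    with open(fasta_path) as fasta, open(qual_path) as qual:
        return find_mismatches(fasta, qual)
```

An empty list from check_files means every read has one quality score per base; anything else is worth fixing before running the import.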
Import the multiplexed fasta/qual data:
qiime tools import \
  --type MultiplexedSingleEndBarcodeInSequence \
  --input-format MultiplexedFastaQualDirFmt \
  --input-path q1-fastaqual \
  --output-path seqs.qza
Demultiplex the sequences using q2-cutadapt:
qiime cutadapt demux-single \
  --i-seqs seqs.qza \
  --m-barcodes-file Fasting_Map.txt \
  --m-barcodes-column BarcodeSequence \
  --o-per-sample-sequences demux.qza \
  --o-untrimmed-sequences unassigned.qza
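If demultiplexing assigns far fewer reads than expected, the barcode column is a good first thing to check. Here is a rough sketch of such a check. It is not a QIIME 2 feature: the function names are hypothetical, and it assumes a tab-separated mapping file whose first line is the header (with a leading # as in QIIME 1 mapping files) and whose other lines starting with # are comments.

```python
from collections import Counter


def load_barcodes(lines, column="BarcodeSequence"):
    """Return {sample_id: barcode} from tab-separated mapping-file lines."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    header = rows[0].lstrip("#").split("\t")
    if column not in header:
        raise ValueError(f"column {column!r} not found in header {header}")
    col = header.index(column)
    barcodes = {}
    for row in rows[1:]:
        if row.startswith("#"):
            continue  # comment line, not a sample
        fields = row.split("\t")
        barcodes[fields[0]] = fields[col].strip().upper()
    return barcodes


def barcode_problems(barcodes):
    """List issues worth reviewing: shared barcodes and mixed lengths."""
    problems = []
    for barcode, count in Counter(barcodes.values()).items():
        if count > 1:
            problems.append(f"barcode {barcode} is shared by {count} samples")
    lengths = {len(barcode) for barcode in barcodes.values()}
    if len(lengths) > 1:
        problems.append(f"barcodes have mixed lengths: {sorted(lengths)}")
    return problems
```

For example, barcode_problems(load_barcodes(open("Fasting_Map.txt"))) should return an empty list for a well-formed mapping file. Mixed barcode lengths are not necessarily an error, but they are unusual enough to be worth a second look.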
After demultiplexing, we need to trim sequences to the region framed by the forward and reverse primers. This is done using q2-cutadapt's trim-single method, with the trimming defined by the --p-adapter parameter. Note that the beginning of the sequence provided for this parameter must match the beginning of a sequence read for a match to occur. The end of the adapter must be in the orientation in which it would have been read, so it may be the reverse complement of the reverse primer sequence that you used (but this will vary based on how this information is recorded). The start of the adapter sequence here is often the primer just following the barcode sequence, so depending on your barcoding protocol, this may be the forward or the reverse primer.
Important: This trimming is critical in 454 workflows, where it is not uncommon to sequence through the target region and into (non-biological) adapter sequence.
Important: Be sure to include the --p-discard-untrimmed parameter. This is important for quality control, and for making it very obvious if your adapter sequence is wrong and you haven't trimmed anything.
Important: For 454 workflows, I recommend always using the linked primer syntax for --p-adapter, illustrated here, which requires a match to the primer at the start of a sequence, and optionally trims at the start of the reverse primer (after the ...) if the reverse primer is encountered.
Getting your sequences oriented correctly for the --p-adapter parameter can take some experimentation. If you don't include --p-discard-untrimmed you won't know if something went wrong.
qiime cutadapt trim-single \
  --i-demultiplexed-sequences demux.qza \
  --o-trimmed-sequences demux-trimmed.qza \
  --p-match-adapter-wildcards \
  --p-adapter ^YATGCTGCCTCCCGTAGGAGT...TACTCACCCGTGCGC \
  --p-discard-untrimmed
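When experimenting with orientation for --p-adapter, a small helper for reverse-complementing primers (including IUPAC ambiguity codes such as Y) can save some hand editing. This is only a sketch: the function names are hypothetical, and whether your second primer needs to be reverse-complemented depends on how it was recorded, so that is exposed as an option rather than assumed.

```python
# IUPAC nucleotide codes and their complements (U is treated like T).
COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVN",
    "TGCAAYRSWMKVHDBN",
)


def reverse_complement(seq):
    """Reverse-complement a primer, keeping IUPAC ambiguity codes valid."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def linked_adapter(first_primer, second_primer, second_needs_revcomp=True):
    """Build a linked adapter string of the form ^FIRST...SECOND.

    first_primer is the primer expected at the very start of each read;
    second_primer is the primer that may be reached at the far end.
    """
    if second_needs_revcomp:
        tail = reverse_complement(second_primer)
    else:
        tail = second_primer.upper()
    return f"^{first_primer.upper()}...{tail}"
```

If trim-single discards nearly everything with --p-discard-untrimmed, trying the other value of second_needs_revcomp, or swapping which primer comes first, is a reasonable next experiment.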
Generate a visual summary of the demultiplexed data.
qiime demux summarize \
  --i-data demux-trimmed.qza \
  --o-visualization demux-trimmed.qzv
Perform sequence quality control with DADA2's denoise-pyro functionality (pyro is short for pyrosequencing here). You should choose your setting for --p-trunc-len in the same way that you would for Illumina sequencing data (refer to the QIIME 2 Moving Pictures tutorial for a discussion of this).
qiime dada2 denoise-pyro \
  --i-demultiplexed-seqs demux-trimmed.qza \
  --p-trunc-len 212 \
  --output-dir dada2-out
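One thing to keep in mind when picking --p-trunc-len is that reads shorter than the truncation length are discarded, and 454 read lengths vary much more than Illumina read lengths do. The sketch below estimates how many reads would survive the length filter alone for a few candidate values. The function names and the list of read lengths are hypothetical inputs that you would compute from your own trimmed reads, and the estimate ignores DADA2's quality and chimera filtering.

```python
def fraction_retained(read_lengths, trunc_len):
    """Fraction of reads at least trunc_len long (length filter only)."""
    if not read_lengths:
        return 0.0
    if trunc_len == 0:
        return 1.0  # a value of 0 disables truncation and length filtering
    kept = sum(1 for length in read_lengths if length >= trunc_len)
    return kept / len(read_lengths)


def retention_table(read_lengths, candidates):
    """Map each candidate truncation length to its retained fraction."""
    return {trunc_len: fraction_retained(read_lengths, trunc_len)
            for trunc_len in candidates}
```

For example, retention_table(lengths, [150, 200, 212, 250]) gives a quick view of the trade-off between keeping longer reads and keeping more reads.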
It was common to define OTUs based on some percent sequence identity with 454 data, but this is not essential. If you want to define OTUs on these data, refer to QIIME 2's OTU Clustering tutorial.
At this stage, your data can mostly be analyzed as usual. Some examples of things you might do follow.
Summarize the feature table.
qiime feature-table summarize \
  --i-table dada2-out/table.qza \
  --o-visualization dada2-out/table.qzv \
  --m-sample-metadata-file Fasting_Map.txt
Build a phylogenetic tree representing your sequences:
qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences dada2-out/representative_sequences.qza \
  --output-dir dada2-out/tree
Perform microbiome diversity analyses:
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny dada2-out/tree/rooted_tree.qza \
  --i-table dada2-out/table.qza \
  --p-sampling-depth 87 \
  --m-metadata-file Fasting_Map.txt \
  --output-dir dada2-out/cmp87
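The sampling depth used above comes from looking at the per-sample totals in the feature table summary: samples with fewer sequences than the sampling depth are dropped from the diversity analyses. As a rough sketch of that trade-off, with hypothetical function names and per-sample totals that you would read off your own table.qzv:

```python
def samples_retained(sample_totals, depth):
    """Sample IDs whose total feature count is at least the sampling depth."""
    return sorted(sample for sample, total in sample_totals.items()
                  if total >= depth)


def depth_summary(sample_totals, depth):
    """Summarize what rarefying to the given depth would keep."""
    kept = samples_retained(sample_totals, depth)
    return {
        "samples_kept": len(kept),
        "samples_dropped": len(sample_totals) - len(kept),
        "sequences_kept": depth * len(kept),
    }
```

Running depth_summary over a few candidate depths makes it easier to see where raising the depth starts to cost you samples.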
Assign and visualize taxonomy:
qiime feature-classifier classify-sklearn \
  --i-classifier ../gg-13-8-99-nb-weighted-classifier.qza \
  --i-reads dada2-out/representative_sequences.qza \
  --o-classification dada2-out/taxonomy.qza
qiime taxa barplot \
  --i-table dada2-out/table.qza \
  --i-taxonomy dada2-out/taxonomy.qza \
  --m-metadata-file Fasting_Map.txt \
  --o-visualization dada2-out/taxa-barplot.qzv
A quick comparison of a few of the results to the QIIME 1 454 Tutorial suggests that the workflow above is working as expected. For example:
One difference I noticed with this data set is that read counts post-denoising are lower in QIIME 2 than in QIIME 1. This is probably a result of the improved quality control in QIIME 2, which may be discarding more problematic sequences.