PRONAME: Enhancing Taxonomic Accuracy with Nanopore Long-Read Metabarcoding and QIIME2

Hi everyone!

Here is a post for those of you interested in combining the strengths of both QIIME2 and Nanopore long-read metabarcoding :raised_hands:.

As you may know, the sequencing technology developed by Oxford Nanopore Technologies (ONT) generates (very) long reads, which can significantly enhance the taxonomic resolution of metabarcoding approaches :mag_right:. For example, the entire 16S-ITS-23S operon region can be sequenced in one go. However, Nanopore sequencing has historically been held back by its higher error rate compared to Illumina sequencing. The good news is that successive improvements from ONT have increased sequencing accuracy :star_struck:.

Because Nanopore sequencing has its own error profile, error-correction tools designed for Illumina data, such as DADA2 and Deblur, cannot be directly applied to Nanopore metabarcoding data.

Convinced of the huge potential of Nanopore long-read metabarcoding, we developed PRONAME (PROcessing NAnopore MEtabarcoding data), a user-friendly pipeline fully compatible with QIIME2 :qiime2:. Users can generate analysis results as QIIME2 artifacts (qza/qzv), a phyloseq object, or simple TSV/FASTA files. For instance, the rep_seqs.qza and rep_table.qza files generated by PRONAME are FeatureData[Sequence] and FeatureTable[Frequency] QIIME2 artifacts, respectively. They can therefore be used directly in QIIME2 for downstream analyses such as alpha-/beta-diversity analysis and differential abundance testing, among others.
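
If you want to double-check what type an artifact carries before feeding it to a downstream QIIME2 command, here is a minimal Python sketch. It relies only on the fact that .qza/.qzv files are zip archives with a `metadata.yaml` under a single top-level UUID directory. The helper name `artifact_type` is made up for this example and is not part of PRONAME or QIIME2.

```python
import zipfile


def artifact_type(path):
    """Return the semantic type recorded in a QIIME 2 artifact (.qza/.qzv)."""
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only consider <uuid>/metadata.yaml, not the copies kept
            # deeper in the provenance directory.
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                text = archive.read(name).decode("utf-8")
                for line in text.splitlines():
                    if line.startswith("type:"):
                        # Strip whitespace and any YAML quoting.
                        return line.split(":", 1)[1].strip().strip("'\"")
    raise ValueError(f"no top-level metadata.yaml found in {path}")
```

Calling `artifact_type("rep_seqs.qza")` on a file produced as described above should report `FeatureData[Sequence]`; the path is just a placeholder for wherever your own output lives.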

Before further explanations, here are the links to the PRONAME GitHub and publication pages if you are interested.

The PRONAME pipeline consists of four scripts for step-by-step processing of raw sequencing data:

proname_import: Import and visualization of raw sequencing data (+ primer and adapter trimming),
proname_filter: Quality filtering of data and retention of only duplex and/or simplex reads,
proname_refine: Significant improvement of read accuracy by generating error-corrected consensus sequences and removing chimeras,
proname_taxonomy: Taxonomic analysis.

We won't go into too much detail here: our GitHub repo contains detailed explanations of each script and its available arguments, plus an extensive tutorial showing how to process real-life sequencing data with PRONAME and how to continue with downstream analyses in QIIME2.

One of PRONAME's major strengths: Enhanced sequence accuracy

The pipeline significantly improves sequence accuracy, as shown below:


Figure from Dubois et al. (2024): Read mean accuracy reached at different steps of the PRONAME workflow. Accuracy was computed for reads before their import into PRONAME (Raw reads), reads generated by proname_import (Trimmed reads) and proname_filter (HQ reads) and error-corrected sequences generated by proname_refine (Consensus sequences). The libraries were sequenced either with new V14 chemistry (R10.4.1 flowcells, producing both simplex and duplex reads) or with the older chemistry (R9.4.1 flowcells, generating only simplex reads).

Sequence accuracy increases throughout the workflow, reaching 99.5% with standard settings and 99.7% with optimized settings :rocket:. These numbers are expected to have improved even more by now, thanks to the recent introduction of a new basecalling model (v5.0.0) and the new version of the companion module for library prep :sparkles:.
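
To put these percentages in perspective, here is a rough back-of-the-envelope conversion in Python. It is only a sketch under simplifying assumptions: accuracy is treated as a per-base identity, errors are assumed to be spread evenly, and 1,500 bp is used as an approximate full-length 16S size. This is not how accuracy was computed in the paper.

```python
import math


def phred_from_accuracy(accuracy):
    """Convert a per-base accuracy (between 0 and 1) to a Phred Q-score."""
    return -10 * math.log10(1 - accuracy)


def expected_errors(accuracy, length_bp):
    """Expected number of erroneous bases in a sequence of the given length."""
    return (1 - accuracy) * length_bp


# 1500 bp is an assumed, approximate full-length 16S amplicon size.
for acc in (0.995, 0.997):
    q = phred_from_accuracy(acc)
    errs = expected_errors(acc, 1500)
    print(f"{acc:.1%} accuracy: about Q{q:.1f}, ~{errs:.1f} expected errors per 1500 bp")
```

Under these assumptions, 99.5% corresponds to roughly Q23 and about 7.5 errors per 1,500 bp, while 99.7% corresponds to roughly Q25 and about 4.5 errors, so the seemingly small gain from optimized settings removes a sizeable share of the remaining errors.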

Pipeline distribution and databases included in PRONAME

The PRONAME pipeline is provided as a Docker image that simply needs to be pulled from Docker Hub to be directly usable, with no installation step and with all dependencies and databases included. Among the databases shipped with PRONAME, you can find SILVA 138 and Greengenes2 for full-length 16S metabarcoding analyses. As long-read sequencing allows sequencing the entire 16S-ITS-23S region, we developed the rEGEN-B (rrn operons Extracted from GENomes of Bacteria) database, which is also included in the Docker image. Alternatively, it is directly available on FigShare if needed.
Any other reference database can be provided by the user, therefore allowing the processing of data coming from :microbe:, :mushroom:, :ear_of_rice:, :honeybee:, and :fish: among others (actually any domain of life).


So, those are the key characteristics of PRONAME. I hope the pipeline proves useful to some of you :slightly_smiling_face:. One final note: I don't see long-read metabarcoding as a replacement for Illumina sequencing. Instead, both approaches have unique strengths and should be chosen based on research goals and context :compass:.
