Incorporation of long-read 16s rRNA sequences into QIIME2 and ASV generation

Hi all,

I am trying to incorporate some complete/near complete 16s rRNA sequences that I have extracted from annotated genomes following some PacBio HiFi metagenomic sequencing I have done. The aim here is to compare these sequences against a V1-V2 16s rRNA sequencing dataset, generating a phylogenetic tree to see the evolutionary relatedness of the V1-V2 ASVs to the PacBio 16s sequences.

I have imported the PacBio 16s rRNA sequences using the following command, with each sample being represented by a single FASTQ file containing a single 16s sequence:

qiime tools import
--type 'SampleData[SequencesWithQuality]'
--input-path SingleEndManifestPacBio.csv
--output-path single-end-demux.qza
--input-format SingleEndFastqManifestPhred33

I now want to generate ASVs from these sequences (to compare with the V1-V2 dataset), however, as each sample (sequence) only consists of one read, I receive the following error:

qiime dada2 denoise-single
--i-demultiplexed-seqs ./single-end-demux.qza
--p-trunc-len 0
--o-representative-sequences ./pacbio-rep-seqs-dada2.qza
--o-table ./pacbio-table-dada2.qza
--o-denoising-stats ./pacbio-dada2-stats.qza
--verbose

R version 4.2.3 (2023-03-15)
Loading required package: Rcpp
DADA2: 1.26.0 / Rcpp: 1.0.10 / RcppParallel: 5.1.6
2) Filtering .......................................................................................................................................................................................................................................................................................................................................................................................................................................
3) Learning Error Rates
646684 total bases in 423 reads from 423 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
6: stop("Error matrix is NULL.")
5: getErrors(err, enforce = TRUE)
4: dada(drps, err = NULL, errorEstimationFunction = errorEstimationFunction,
selfConsist = TRUE, multithread = multithread, verbose = verbose,
MAX_CONSIST = MAX_CONSIST, OMEGA_C = OMEGA_C, ...)
3: learnErrors(filts, nreads = nreads.learn, multithread = multithread,
HOMOPOLYMER_GAP_PENALTY = HOMOPOLYMER_GAP_PENALTY, BAND_SIZE = BAND_SIZE)
2: withCallingHandlers(expr, warning = function(w) if (inherits(w,
classes)) tryInvokeRestart("muffleWarning"))
1: suppressWarnings(learnErrors(filts, nreads = nreads.learn, multithread = multithread,
HOMOPOLYMER_GAP_PENALTY = HOMOPOLYMER_GAP_PENALTY, BAND_SIZE = BAND_SIZE))
Traceback (most recent call last):
File "/pub59/rsoftley/anaconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 220, in _denoise_single
run_commands([cmd])
File "/pub59/rsoftley/anaconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/pub59/rsoftley/anaconda3/envs/qiime2-2023.5/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/qiime2/rsoftley/data/fe0fae4d-d4d5-41ed-a486-b8c2b7f1ea19/data', '--output_path', '/tmp/tmpfj8tqof8/output.tsv.biom', '--output_track', '/tmp/tmpfj8tqof8/track.tsv', '--filtered_directory', '/tmp/tmpfj8tqof8', '--truncation_length', '0', '--trim_left', '0', '--max_expected_errors', '2.0', '--truncation_quality_score', '2', '--max_length', 'Inf', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '1', '--learn_min_reads', '1000000', '--homopolymer_gap_penalty', 'NULL', '--band_size', '16']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/pub59/rsoftley/anaconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2cli/commands.py", line 468, in call
results = action(**arguments)
File "", line 2, in denoise_single
File "/pub59/rsoftley/anaconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 274, in bound_callable
outputs = self.callable_executor(
File "/pub59/rsoftley/anaconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 509, in callable_executor
output_views = self._callable(**view_args)
File "/pub59/rsoftley/anaconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 244, in denoise_single
return _denoise_single(
File "/pub59/rsoftley/anaconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 229, in _denoise_single
raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.

Can anyone suggest a work around or alternative method for this type of analysis?

Good morning and welcome to the forums! :qiime2:

This sounds like a neat project! Multi-region sequencing is always a fun challenge.

dada2 denoise-single is built for Illumina data, so try out dada2 denoise-ccs instead:
https://docs.qiime2.org/2023.7/plugins/available/dada2/denoise-ccs/

Hopefully that will 'Just Work :tm:' but let us know either way.

Hi colinbrislawn,

Thanks for the quick response - I was unaware of this functionality!

I'll give it a go and report back!

1 Like

So here's what I've found...

dada2 denoise-ccs requires the following options:

--i-demultiplexed-seqs # input sequences
--p-front TEXT # the 5' adapter
--p-adapter TEXT # the 3' adapter

My input sequences consist of 16s rRNA sequences that I have extracted from metagenomic assemblies after binning and annotation. So, they do not contain any primers or adapters, as I have identified the 16s rRNA genes with Prokka, extracted the sequences and saved them as individual .fastq files. For that reason, I'm not sure this approach will work (in its current form at least).

I also fear that, because each .fastq file contains only one sequence, DADA2 will struggle to estimate error rates as it interprets this as only one read per sample.

I hope that makes sense, I understand that this is an unorthodox use case for QIIME2.

1 Like

Yeah, I think that's the main issue. DADA2 is designed to run on raw (or mostly raw) reads, not highly processed ones...


Perhaps, lol!

As I think about this post, I wonder, what are you calling these features?

after [assembling], binning, annotating, and extracting from metagenomes

Assembly takes care of dereplication and noise reduction to make Sequence Variants.
The annotation and extraction selects only Amplicons.

Are these already ASVs?

:thinking: :joy_cat:

1 Like

I think you may be correct - I’ve had ASVs this whole time!

Perhaps it’s better to generate a .biom file with the sequence and taxa information and import that way? I think I’ve seen a thread on here just for that.

Thanks for your help on this :joy:

1 Like

For anyone interested, the solution was to treat the extracted 16s rRNA sequences as ASVs (as they have already been QC'd and de-replicated during assembly, binning and annotation).

I obtained individual 16s rRNA sequences after extracting them from the genbank files generated during annotation, saved each one as an individual .fasta, merged them into a single .fasta to import as follows:

Concatenate individual 16s sequences into a single fasta

cat *.fasta > allpacbio_16s.fasta

import the fasta as ASV sequence data

qiime tools import
--type 'FeatureData[Sequence]'
--input-path allpacbio_16s.fasta
--output-path pacbio_sequences.qza

1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.