Analysing trimmed data

Hello.
I started to learn microbiome analysis.
For practice, I'd like to analyse 16S sequence data uploaded to a public database.
This sequence data was obtained by pyrosequencing on a 454 GS FLX Titanium (V1-V2 region). The reads have already been quality-trimmed and primer-trimmed, and 3000 reads were randomly selected from the filter-passed reads. There is one FASTQ file per sample.

In this case, how should I import and analyse the data? After importing, should I classify the reads into OTUs without further trimming, and then compare the representative sequence of each OTU against a reference database?

Best regards,

Hello @kopelol,

This is a very broad question. What is your end goal when analyzing this data? If you are just trying to learn how analysis in QIIME 2 works in general, then I would suggest you do some of our tutorials starting with moving pictures if you haven't already.

Should I download R 4.2.0 or newer?

Hello @Oddant1
Thank you for your reply.

I tried to perform the analysis using the following commands.

import

qiime tools import \
  --type SampleData[SequencesWithQuality] \
  --input-path manufest.txt \
  --output-path saliva.qza \
  --input-format SingleEndFastqManifestPhred33

This seemed to complete successfully.

The FASTQ files appeared to have already undergone preprocessing steps such as demultiplexing and trimming, so I decided to skip these steps and proceed with denoising into ASVs using DADA2.

qiime dada2 denoise-single \
  --i-demultiplexed-seqs saliva.qza \
  --p-trunc-len 0 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza

However, an error occurred.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 127), please inspect stdout and stderr to learn more.

Debug info has been saved to /tmp/qiime2-q2cli-err-0w0v9pki.log

run_dada_paired.R /tmp/qiime2-q2cli-err-0w0v9pki.log

=====================================
R version 4.1.2 (2021-11-01)

Error in args[[2]] : subscript out of bounds

Execution halted

Can you please copy paste the contents of that log file here? Thank you.

In addition, what version of QIIME 2 are you using, and what compute environment are you using?
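
One easy way to gather that, assuming the QIIME 2 conda environment is active, is the built-in info command, which prints the QIIME 2 release, installed plugin versions, and some system details:

qiime info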

I'm using q2cli version 2024.5.0 on a Linux server.

Here is the log.

Traceback (most recent call last):
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2cli/commands.py", line 520, in call
results = self._execute_action(
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
results = action(**arguments)
File "", line 2, in denoise_single
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
outputs = self.callable_executor(
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in callable_executor
output_views = self._callable(**view_args)
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 266, in denoise_single
return _denoise_single(
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 249, in _denoise_single
raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 127), please inspect stdout and stderr to learn more.


Best regards,

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/qiime2/user/data/48471217-f677-489b-ad76-f1ec754da30c/data --output_path /tmp/tmp_9uxokby/output.tsv.biom --output_track /tmp/tmp_9uxokby/track.tsv --filtered_directory /tmp/tmp_9uxokby --truncation_length 0 --trim_left 0 --max_expected_errors 2.0 --truncation_quality_score 2 --max_length Inf --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 1 --learn_min_reads 1000000 --homopolymer_gap_penalty NULL --band_size 16

/home/user/local/lib64/R/bin/exec/R: error while loading shared libraries: libgfortran.so.3: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 240, in _denoise_single
run_commands([cmd])
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 37, in run_commands
subprocess.run(cmd, check=True)
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/qiime2/user/data/48471217-f677-489b-ad76-f1ec754da30c/data', '--output_path', '/tmp/tmp_9uxokby/output.tsv.biom', '--output_track', '/tmp/tmp_9uxokby/track.tsv', '--filtered_directory', '/tmp/tmp_9uxokby', '--truncation_length', '0', '--trim_left', '0', '--max_expected_errors', '2.0', '--truncation_quality_score', '2', '--max_length', 'Inf', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '1', '--learn_min_reads', '1000000', '--homopolymer_gap_penalty', 'NULL', '--band_size', '16']' returned non-zero exit status 127.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2cli/commands.py", line 520, in call
results = self._execute_action(
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
results = action(**arguments)
File "", line 2, in denoise_single
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
outputs = self.callable_executor(
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in callable_executor
output_views = self._callable(**view_args)
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 266, in denoise_single
return _denoise_single(
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5-2/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 249, in _denoise_single
raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 127), please inspect stdout and stderr to learn more.

@kopelol This appears to be an issue we have seen before where there is an R installation outside of your conda environment that is being used inside of your conda environment in place of the one that ships with QIIME 2.

With your conda environment active, can you please run the command which R and post the results here. Can you please also copy paste the contents of your .bashrc file here.
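
For reference, a minimal check might look like this (the environment name below is taken from your traceback; yours may differ):

conda activate qiime2-amplicon-2024.5-2
which R        # should point inside the conda environment, e.g. .../envs/qiime2-amplicon-2024.5-2/bin/R
R --version    # should report the R version that ships with this QIIME 2 release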

QIIME 2 2024.5 ships with R 4.3.3. It is possible that installing this as your global R installation will resolve this issue for this specific version of QIIME 2, but it will not fix the underlying problem.

Thank you.

Thank you.
It might be due to a problem with my server.
The OS was recently updated, so I couldn't run R.
I tried to fix this problem.

Best regards,

Sorry.
I think the problem with R has been solved, but another error occurred.

This is the log.

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/qiime2/user/data/48471217-f677-489b-ad76-f1ec754da30c/data --output_path /tmp/tmpk4tbigwd/output.tsv.biom --output_track /tmp/tmpk4tbigwd/track.tsv --filtered_directory /tmp/tmpk4tbigwd --truncation_length 0 --trim_left 0 --max_expected_errors 2.0 --truncation_quality_score 2 --max_length Inf --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 1 --learn_min_reads 1000000 --homopolymer_gap_penalty NULL --band_size 16

R version 4.3.3 (2024-02-29)
Loading required package: Rcpp
DADA2: 1.30.0 / Rcpp: 1.0.13 / RcppParallel: 5.1.9
2) Filtering .
3) Learning Error Rates
77896 total bases in 195 reads from 1 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
6: stop("Error matrix is NULL.")
5: getErrors(err, enforce = TRUE)
4: dada(drps, err = NULL, errorEstimationFunction = errorEstimationFunction,
selfConsist = TRUE, multithread = multithread, verbose = verbose,
MAX_CONSIST = MAX_CONSIST, OMEGA_C = OMEGA_C, ...)
3: learnErrors(filts, nreads = nreads.learn, multithread = multithread,
HOMOPOLYMER_GAP_PENALTY = HOMOPOLYMER_GAP_PENALTY, BAND_SIZE = BAND_SIZE)
2: withCallingHandlers(expr, warning = function(w) if (inherits(w,
classes)) tryInvokeRestart("muffleWarning"))
1: suppressWarnings(learnErrors(filts, nreads = nreads.learn, multithread = multithread,
HOMOPOLYMER_GAP_PENALTY = HOMOPOLYMER_GAP_PENALTY, BAND_SIZE = BAND_SIZE))
Traceback (most recent call last):
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 240, in _denoise_single
run_commands([cmd])
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 37, in run_commands
subprocess.run(cmd, check=True)
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/qiime2/user/data/48471217-f677-489b-ad76-f1ec754da30c/data', '--output_path', '/tmp/tmpk4tbigwd/output.tsv.biom', '--output_track', '/tmp/tmpk4tbigwd/track.tsv', '--filtered_directory', '/tmp/tmpk4tbigwd', '--truncation_length', '0', '--trim_left', '0', '--max_expected_errors', '2.0', '--truncation_quality_score', '2', '--max_length', 'Inf', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '1', '--learn_min_reads', '1000000', '--homopolymer_gap_penalty', 'NULL', '--band_size', '16']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 520, in call
results = self._execute_action(
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
results = action(**arguments)
File "", line 2, in denoise_single
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
outputs = self.callable_executor(
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in callable_executor
output_views = self._callable(**view_args)
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 266, in denoise_single
return _denoise_single(
File "/home/user/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 249, in _denoise_single
raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Hello.

I'd like to analyse 454 sequence data downloaded from SRA.

My questions are:

  1. Are the quality scores of 454 reads interpreted differently from those of other sequencing platforms?
    According to the paper, the data has been filtered based on quality values, but the FASTQ files still contain low-quality reads.

  2. How should the reads be classified into ASVs or OTUs, and which method would be the most suitable?

  3. Is there a way to perform clustering while ignoring the quality score, assuming the data has been filtered?

Here are the details.

Data

Information about the data:
- Sequencer: 454 GS FLX Titanium
- Region: 16S V1-V2
- Filtering & trimming: reads with quality value < 25 were removed; primer sequences were removed; possible chimeric sequences were removed
- Analysis: 3000 reads were randomly selected from the filter-passed reads
- 16S workflow in the original paper: OTU clustering using UCLUST; representative sequences searched against their own database using GLSEARCH

The FASTQ data obtained from SRA were already trimmed and contain 3000 reads per sample.

I'd like to analyse this data using QIIME 2, but the read quality scores looked strange.

import

qiime tools import \
  --type SampleData[SequencesWithQuality] \
  --input-path manufest.txt \
  --output-path S1.qza \
  --input-format SingleEndFastqManifestPhred33V2

qiime demux summarize \
--i-data S1.qza \
--o-visualization S1.qzv

qiime dada2 denoise-single \
 --i-demultiplexed-seqs S1.qza \
 --p-trim-left 0 \
 --p-trunc-len 0 \
 --o-representative-sequences rep-seqs-dada2.qza \
 --o-table table-dada2.qza \
 --o-denoising-stats stats-dada2.qza

DADA2 error

  3) Learning Error Rates
    77896 total bases in 195 reads from 1 samples will be used for learning the error rates.
    Error rates could not be estimated (this is usually because of very few reads).

Best regards,


Here's the core of the error.

Looks like you need more reads!

Are the quality scores of 454 reads interpreted differently from those of other sequencing platforms?

Yes, the q-scores model indels. For Illumina they model SNPs.
And no, because q=20 still means 99% accurate.

dada2 denoise-pyro was built for 454 and IonTorrent data. Try that!

https://docs.qiime2.org/2024.5/plugins/available/dada2/denoise-pyro/
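
A minimal sketch of that call, assuming the single-end import above (S1.qza); the trim/truncation values are placeholders to adjust after inspecting the quality plot in S1.qzv:

qiime dada2 denoise-pyro \
  --i-demultiplexed-seqs S1.qza \
  --p-trim-left 0 \
  --p-trunc-len 0 \
  --o-representative-sequences rep-seqs-pyro.qza \
  --o-table table-pyro.qza \
  --o-denoising-stats stats-pyro.qza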

Is there a way to perform clustering while ignoring the quality score, assuming the data has been filtered?

Yes... but it's probably better to use the quality scores, and denoise-pyro does just that!
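
If you do want to go that route, one option that ignores quality scores is de novo OTU clustering with q2-vsearch. A rough sketch, assuming the S1.qza import above and an illustrative 97% identity threshold:

qiime vsearch dereplicate-sequences \
  --i-sequences S1.qza \
  --o-dereplicated-table derep-table.qza \
  --o-dereplicated-sequences derep-seqs.qza

qiime vsearch cluster-features-de-novo \
  --i-table derep-table.qza \
  --i-sequences derep-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-dn-97.qza \
  --o-clustered-sequences rep-seqs-dn-97.qza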

Can you import the raw data into Qiime2 and do the filtering through qiime plugins?
(I know what it's like to not have the raw data, so sometimes this is not possible!)
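
For example, primer removal and basic quality filtering can be done with the q2-cutadapt and q2-quality-filter plugins. A hedged sketch, assuming the raw single-end reads were imported as raw.qza and using a placeholder primer sequence (substitute the actual V1-V2 primer from the study):

qiime cutadapt trim-single \
  --i-demultiplexed-sequences raw.qza \
  --p-front AGAGTTTGATCMTGGCTCAG \
  --o-trimmed-sequences trimmed.qza

qiime quality-filter q-score \
  --i-demux trimmed.qza \
  --o-filtered-sequences filtered.qza \
  --o-filter-stats filter-stats.qza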

Thank you for your response. It was very helpful.

Is my understanding correct that denoise-pyro can be used if raw data is available, but not with trimmed reads only? As you mentioned, only the trimmed data was available in the database. I tried it, but an error occurred indicating that the number of reads was too low.

In this case, would it be impossible to perform the analysis using Qiime2? I am also considering trying tools like SILVAngs…

Thank you so much for your reply.

It's always a good idea to try multiple methods and tools!

If you only have trimmed data and that only includes <200 reads... then getting all the raw data becomes really important!

I've merged these threads as I think this is the same data set. (Please correct me if I'm wrong!)


An off-topic reply has been split into a new topic: 454 data - Error rates could not be estimated

Please keep replies on-topic in the future.

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.