Getting low feature counts while using dada2 plugin

Rishikesh12 · May 2, 2023, 9:28am

Q1 - I am experiencing significant loss of reads during the quality filtering step when using the dada2 plugin. Could someone please provide guidance on how to address this issue?

Q2 - Is it possible to use the Cutadapt plugin to trim any forward and reverse reads with a quality score below q20?

Thank You
Rishi

colinvwood · May 2, 2023, 4:18pm

Hello @Rishikesh12,

Q1: To be able to help you here it would be best if you could provide the commands you ran and, if possible, your input and output artifacts.

Q2: Yes, this is possible with the --p-quality-cutoff-5end and --p-quality-cutoff-3end flags to the qiime cutadapt command. However, dada2 also supports quality-based trimming, so check the options available to that command if you're interested; that way you can do it in the same step.
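For reference, a minimal sketch of the Cutadapt route. It assumes a paired-end artifact named demux.qza and the trim-paired action; the cutoff of 20 and the output filename are placeholders, not values taken from this thread. Note that Cutadapt trims low-quality bases from the ends of reads rather than discarding whole reads. The guard only skips the call when qiime or the input file is not available.

```shell
# Assumed names: demux.qza (input) and demux-q20-trimmed.qza (output).
if command -v qiime >/dev/null 2>&1 && [ -f demux.qza ]; then
  qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux.qza \
    --p-quality-cutoff-5end 20 \
    --p-quality-cutoff-3end 20 \
    --o-trimmed-sequences demux-q20-trimmed.qza
  ran=yes
else
  echo "qiime or demux.qza not found; nothing was run"
  ran=no
fi
```

Running qiime cutadapt trim-paired --help lists the exact parameter names and defaults for the installed version.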

Rishikesh12 · May 3, 2023, 6:03am

Q1: I am attaching my denoising stats file, my quality plot files, and the command I used to get the feature table with dada2.

Command I used:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 277 \
  --p-trunc-len-r 198 \
  --p-trunc-q 20 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Q2: As you can see in the above command, we have to specify both --p-trunc-len-f and --p-trunc-len-r along with --p-trunc-q. But I want to trim all low-quality reads below a quality threshold of 20. How can I do that?

Thank You
Rishi

colinvwood · May 3, 2023, 6:12pm

Hi @Rishikesh12,

Those parameters look reasonable to me, so it's strange that so many reads are getting filtered.

By passing --p-trunc-q 20 you are truncating each read at the first base with a quality score of 20 or lower, so don't worry about that. In fact, passing truncation positions and a quality filter is somewhat redundant, so you could try setting the truncation positions to 0 (disabling them) and only passing the quality filter.
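To make the interaction between the two settings concrete, here is a toy model in plain shell. It is a sketch based on the q2-dada2 help text, which says reads are truncated at the first quality score less than or equal to --p-trunc-q and then discarded if they end up shorter than the truncation length. The function name filter_read and the 8-base quality values are made up for illustration; this is not DADA2's actual code.

```shell
# Toy model of one read: truncate at the first base with quality <= trunc_q,
# then discard the read if it is shorter than trunc_len (0 disables the length check).
filter_read() {
  trunc_q=$1
  trunc_len=$2
  shift 2
  kept=0
  for q in "$@"; do
    if [ "$q" -le "$trunc_q" ]; then
      break
    fi
    kept=$((kept + 1))
  done
  if [ "$trunc_len" -gt 0 ]; then
    if [ "$kept" -lt "$trunc_len" ]; then
      echo "discarded"
    else
      echo "kept $trunc_len bases"
    fi
  else
    echo "kept $kept bases"
  fi
}

# Hypothetical 8-base read whose quality dips to 18 at position 6.
filter_read 20 8 38 37 37 36 35 18 34 33   # prints: discarded
filter_read 20 0 38 37 37 36 35 18 34 33   # prints: kept 5 bases
filter_read 15 8 38 37 37 36 35 18 34 33   # prints: kept 8 bases
```

In this model a single dip below the threshold before the truncation position is enough to lose the whole read, which is one way a strict quality threshold combined with long truncation lengths could filter out many reads.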

As a next step try playing with these parameters a little bit to see what the bottleneck is. For example, try lowering the quality filter to say 15 and see if that retains more of your reads.

It also looks like you may have a merging problem, but we can address that once we figure out the filtering issue.

Thanks

Rishikesh12 · May 8, 2023, 10:15am

Thanks for your guidance. I will work on it and come back to you.

Regards
Rishikesh

Rishikesh12 · May 10, 2023, 6:42am

When I set --p-trunc-q 20 in dada2 as you suggested, with truncation disabled for --p-trunc-len-f and --p-trunc-len-r, I got an error. Could you look into this and help me solve the problem?
Command I used:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trunc-q 20 \
  --o-table table-q-20.qza \
  --o-representative-sequences rep-seqs-q-20.qza \
  --o-denoising-stats denoising-stats-q-20.qza

An error was encountered while running DADA2 in R (return code -9), please inspect stdout and stderr to learn more.

Debug info has been saved to /tmp/qiime2-q2cli-err-225yvdyn.log

Here is the log file:

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/tmp7p7e10kw/forward --input_directory_reverse /tmp/tmp7p7e10kw/reverse --output_path /tmp/tmp7p7e10kw/output.tsv.biom --output_track /tmp/tmp7p7e10kw/track.tsv --filtered_directory /tmp/tmp7p7e10kw/filt_f --filtered_directory_reverse /tmp/tmp7p7e10kw/filt_r --truncation_length 0 --truncation_length_reverse 0 --trim_left 0 --trim_left_reverse 0 --max_expected_errors 2.0 --max_expected_errors_reverse 2.0 --truncation_quality_score 20 --min_overlap 12 --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 1 --learn_min_reads 1000000

R version 4.2.2 (2022-10-31)
Loading required package: Rcpp
DADA2: 1.26.0 / Rcpp: 1.0.10 / RcppParallel: 5.1.6
2) Filtering ........................
3) Learning Error Rates
155666098 total bases in 1164673 reads from 6 samples will be used for learning the error rates.
Traceback (most recent call last):
File "/home/rishikesh_dash/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 326, in denoise_paired
run_commands([cmd])
File "/home/rishikesh_dash/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/home/rishikesh_dash/miniconda3/envs/qiime2-2023.2/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/tmp7p7e10kw/forward', '--input_directory_reverse', '/tmp/tmp7p7e10kw/reverse', '--output_path', '/tmp/tmp7p7e10kw/output.tsv.biom', '--output_track', '/tmp/tmp7p7e10kw/track.tsv', '--filtered_directory', '/tmp/tmp7p7e10kw/filt_f', '--filtered_directory_reverse', '/tmp/tmp7p7e10kw/filt_r', '--truncation_length', '0', '--truncation_length_reverse', '0', '--trim_left', '0', '--trim_left_reverse', '0', '--max_expected_errors', '2.0', '--max_expected_errors_reverse', '2.0', '--truncation_quality_score', '20', '--min_overlap', '12', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '1', '--learn_min_reads', '1000000']' died with <Signals.SIGKILL: 9>.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/rishikesh_dash/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2cli/commands.py", line 352, in call
results = action(**arguments)
File "", line 2, in denoise_paired
File "/home/rishikesh_dash/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/rishikesh_dash/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/home/rishikesh_dash/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 339, in denoise_paired
raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code -9), please inspect stdout and stderr to learn more.

colinvwood · May 10, 2023, 4:31pm

Hello @Rishikesh12,

This means that your operating system killed the process. It could be for lack of memory, it could be for other reasons. Are you running this on your own computer, on a cluster, or some other way?
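The "return code -9" in the message is how Python reports a child process that was ended by signal 9 (SIGKILL), which matches the "died with <Signals.SIGKILL: 9>" line in the log. The sketch below reproduces that situation with a throwaway sleep process; a shell reports the same event as exit status 128 + 9. The dmesg check at the end is an assumption about where a Linux kernel would log an out-of-memory kill, and it may need elevated privileges.

```shell
# Start a throwaway background process and kill it with SIGKILL,
# the same signal the kernel uses when it runs out of memory.
sleep 30 &
pid=$!
kill -9 "$pid"

# Collect the exit status of the killed process.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"   # 137, i.e. 128 + 9

# Look for out-of-memory kills in the kernel log (may print nothing).
dmesg 2>/dev/null | grep -i -E "out of memory|oom" | tail -n 5 || true
```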

Rishikesh12 · May 12, 2023, 10:29am

I am running it on my laptop.
Specifications: Processor - Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz 2.11 GHz
RAM - 8.00 GB (7.81 GB usable)
System type - 64-bit operating system
I am doing this work in WSL.

colinvwood · May 12, 2023, 4:20pm

Hello @Rishikesh12,

8 GB of RAM isn't a whole lot, and factoring in the WSL overhead, if your input demux is large, this could well be a memory issue.

To investigate whether it is, there are a few things you can do. You can run a resource monitoring tool like htop while this command runs to look at the memory usage. Or you could subsample your demux using qiime demux subsample-paired and then rerun the command to see if it completes with a smaller input.
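Before starting a long run, a quick check of how much memory Linux (or WSL) actually sees can be done as follows. This is a small sketch that reads /proc/meminfo, which reports values in kB; the variable names are arbitrary.

```shell
# Total and currently available memory as seen from inside Linux/WSL, in MiB.
mem_total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
mem_avail_mb=$(awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo)
echo "total: ${mem_total_mb} MiB, available: ${mem_avail_mb} MiB"
```

If the total is well below the RAM installed in the machine, the limit is probably coming from the WSL configuration rather than from the hardware.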

Rishikesh12 · May 13, 2023, 4:47am

Do I just type htop on the command line, or do I add it to the dada2 parameters?
How do I use qiime demux subsample-paired?

Thank you
Rishi

colinvwood · May 15, 2023, 4:38pm

Hello @Rishikesh12,

Yes, htop is a command; you can run it in another terminal while your dada2 command is running.
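If watching htop by hand is inconvenient, a non-interactive alternative is to log the available memory to a file while the job runs. The sketch below is one way to do that under stated assumptions: the log filename mem-usage.log is arbitrary, and sleep 2 is only a stand-in for the real qiime dada2 denoise-paired command.

```shell
log=mem-usage.log
: > "$log"

# Background sampler: append a timestamp and the available memory (MiB) every second.
(
  while :; do
    avail=$(awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo)
    echo "$(date +%T) ${avail} MiB available" >> "$log"
    sleep 1
  done
) &
sampler_pid=$!

# Stand-in for the real job; replace with the qiime dada2 denoise-paired command.
sleep 2

# Stop the sampler and show the last few samples.
kill "$sampler_pid"
wait "$sampler_pid" 2>/dev/null || true
tail -n 3 "$log"
```

If the available memory in the log drops towards zero just before the job dies, that points to a memory problem.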

You can run the qiime demux subsample-paired command with the --help option to learn more about it. Basically, it will randomly subset your demux, giving you a smaller one that may complete successfully, telling you that your error probably had to do with memory.

Rishikesh12 · May 22, 2023, 2:03pm

Hey @colinvwood
Here I am attaching htop pictures while running dada2 and after dada2 failure.
Next, I am planning to try the command qiime demux subsample-paired. I have familiarized myself with the usage of this command. However, my question is: after subsampling, when I run dada2 on the subsampled demultiplexed sequences and obtain a feature table for the subsampled samples, how can I justify this?
I set the fraction to 0.7:

qiime demux subsample-paired \
  --i-sequences demux.qza \
  --p-fraction 0.7 \
  --o-subsampled-sequences demux-subsample-70-pct.qza \
  --verbose

How should I explain this subsampling?

Thanks
Rishi

colinvwood · May 22, 2023, 4:48pm

Hello @Rishikesh12,

It looks like you only have around 4G of memory available in your WSL. You mentioned that you should have 8G available on your computer. I would look into how one goes about allocating more memory to WSL.
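As far as I know, WSL 2 reads global settings from a .wslconfig file in the Windows user profile, and the memory key in its [wsl2] section sets the memory limit for the Linux VM. The sketch below writes an example of such a file to the current directory so that nothing is overwritten. The 6GB value is an assumption for an 8 GB laptop, and the filename wslconfig.example is a placeholder.

```shell
# Write an example config to the current directory. The real file belongs at
# %UserProfile%\.wslconfig on the Windows side (under /mnt/c/Users/ from WSL).
cat > wslconfig.example <<'EOF'
[wsl2]
memory=6GB
EOF

cat wslconfig.example
```

After saving the real file, running wsl --shutdown from PowerShell and reopening WSL should apply the new limit, which can then be checked with free -h.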

To answer your question about merging results after subsampling, this is not a good idea because dada2 needs all sequences in one run to make informed decisions--you need to run dada2 with all of your sequences at once. I was suggesting the subsampling approach only as a troubleshooting measure. But after seeing that you only have 4G of memory available, I think it's safe to assume that that is the problem.

Rishikesh12 · May 22, 2023, 4:55pm

Sir @colinvwood please check my edited question regarding subsampling.

Regards
Rishi

colinvwood · May 22, 2023, 5:14pm

Hello @Rishikesh12,

Subsampling is not desirable for your analysis, for the reasons explained above. But to explain what it does, it just randomly subsamples reads up to a specified proportion from your demux file, giving you a smaller but representative demux sample.

Rishikesh12 · May 23, 2023, 6:50am

Thank you for the clarification, @colinvwood

Regards
Rishi
