Plugin error from dada2: An error was encountered while running DADA2 in R (return code 1)

Dear QIIME 2 staff,

I got an error when using DADA2 to denoise paired-end sequencing data: Plugin error from dada2: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

I see several problems similar to mine in the forum, but none of the solutions there have helped so far.
I'll start by describing the data. I identified a dataset of interest in the literature and hoped to re-analyze it myself. The authors used the 338F and 806R primers to amplify the V3-V4 region, sequenced on an Illumina MiSeq (PE300).

Here is the base quality distribution of this paired-end dataset:


Next I used DADA2 to denoise the sequences (adapter and primer sequences had already been removed with Trimmomatic before data import).
time qiime dada2 denoise-paired \
--i-demultiplexed-seqs pair.qza \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza \
--p-trunc-q 25 \
--p-trunc-len-f 240 \
--p-trunc-len-r 160
I got an error report, and when I tried changing the truncation positions, most runs repeated the same error:

[screenshot: error alert]

I monitored the run and found that the error occurred during chimera removal; the sequence table was never generated. Here is the run log:

[screenshot: run log]
I kept trying. I saw a claim that the two truncation lengths need to add up to a certain threshold, so I adjusted the parameters to --p-trunc-len-f 260 --p-trunc-len-r 220. That run succeeded, but the dada2 stats showed that reads could not be merged after filtering, so this was clearly a failed attempt. I would like to know whether there is a mistake somewhere in my processing that I am not aware of, or whether this dataset is simply not suitable for DADA2. I look forward to your reply.
Unfortunately the pair-end.qza file exceeds the upload size limit; I can provide it another way if needed.

I think your reads are being trimmed much too short to join because of your specification of --p-trunc-q 25. While you may want to trim your data manually at the point where quality drops, to limit how much sequence the denoising algorithm itself discards, I think a lot of your reads are getting cut short. Try running it again without this parameter. If that succeeds but you still see a lot of quality-related filtering, keep adjusting this number, starting from the default value (2). If this doesn't help, you should be able to message me your paired-end.qza directly and I can take a look at it :slightly_smiling_face:
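For example, the same command with --p-trunc-q dropped (a sketch only; truncation lengths should still come from your quality plots):

time qiime dada2 denoise-paired \
--i-demultiplexed-seqs pair.qza \
--p-trunc-len-f 240 \
--p-trunc-len-r 160 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza

With --p-trunc-q omitted it falls back to its default of 2, so reads are no longer truncated at the first base with a quality score at or below 25.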

Thank you very much for helping with this problem. I tried running with --p-trunc-q 25 removed, but the result does not change. I uploaded the data to GitHub: GitHub - stevens-1/test_seq: DADA2 test failure sequence. I hope you can access it without any problems. Looking forward to your further suggestions on this issue.


Hi @stevens,
Thanks for all the info!
Your demux.qzv looks okay: the forward reads look good and the reverse reads seem acceptable too. Your truncation parameters look reasonable. However, you can see from your dada2-stats.qzv that almost all of your sequences are failing the filtering step.
[screenshot: dada2 denoising stats, 2023-01-18]

I am not sure what the exact issue is yet, but I would be interested to know whether you still get this error when running only the forward reads. Given that your forward reads look good, I would be very curious whether the majority of them pass the filtering step. Could you run this data through dada2 denoise-single and let me know 1) whether you receive the same error and 2) what your dada2-stats.qzv looks like?
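For reference, a minimal denoise-single sketch (the truncation length of 240 is an assumption taken from the forward-read quality plot; in recent QIIME 2 versions denoise-single should also accept the paired-end artifact directly and use only the forward reads):

qiime dada2 denoise-single \
--i-demultiplexed-seqs pair.qza \
--p-trunc-len 240 \
--o-table table-single.qza \
--o-representative-sequences rep-seqs-single.qza \
--o-denoising-stats denoising-stats-single.qza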
Thanks
:turtle:

Thank you for your attention to this issue and the advice provided. I tried your proposal and got the dada2_stats.qzv. Close to 60% of the forward reads passed the filter (in another dataset of mine it was close to 90%, so I consider this pass rate low). I am not yet sure what causes this result, and more sequencing data would be lost if the analysis used only single-end reads.

Hi @stevens,
I understand that using single-end reads is not ideal. I am a little surprised that only 60% of your reads are passing the filter.

Looking at your data, your primers are 338F and 806R, so your region is about 468 bp long. If you truncate at 260 and 220, you keep 260 + 220 = 480 bp, only 12 bp more than the amplicon, which is the absolute minimum while still having the necessary 12 bp overlap. Unfortunately, it looks like your reads do not have the quality to truncate at those lengths (that's why they are getting thrown out at the filtering step).

I think there are 2 possible next steps:

  1. You could increase the max-ee parameters (--p-max-ee-f / --p-max-ee-r). This allows more errors in your sequences and may stop them from being filtered out; just be aware that you are literally allowing more errors through (see the sketch after this list).
  2. I am not sure what your truncation length was for denoise-single, but you could try reducing it. That may get more of your sequences through DADA2. I know it's not ideal to run single-end only, but when limited quality scores make merging difficult, sometimes single-end is the only way forward.
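A sketch of option 1, reusing the truncation lengths from your earlier run (the max-ee values of 4 are placeholders; the default is 2 expected errors per read):

time qiime dada2 denoise-paired \
--i-demultiplexed-seqs pair-end.qza \
--p-trunc-len-f 260 \
--p-trunc-len-r 220 \
--p-max-ee-f 4 \
--p-max-ee-r 4 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza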

Thank you for your suggestion. I processed this batch of sequencing data using the maximum sequence length. I did get sequences through the filtering step, but the results were very poor: the pass rate was at best around 50% and mostly around 10%.
code:
time qiime dada2 denoise-paired \
--i-demultiplexed-seqs pair-end.qza \
--o-table table-dada.qza \
--o-representative-sequences rep-seqs-dada.qza \
--o-denoising-stats denoising-stats.qza \
--p-trunc-len-f 280 \
--p-trunc-len-r 280

dada2_stats.qzv (1.2 MB)

Then, following your suggestion, I tried solving the problem by merging the reads instead. The code is as follows:

qiime vsearch merge-pairs \
--i-demultiplexed-seqs pair-end.qza \
--p-minovlen 10 \
--p-maxee 6 \
--o-merged-sequences pair-end-joined.qza
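To inspect the merged reads, a quality summary can be generated with qiime demux summarize (a sketch; the output name is arbitrary):

qiime demux summarize \
--i-data pair-end-joined.qza \
--o-visualization pair-end-joined.qzv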

I didn't know how to choose the maxee parameter, so it was picked somewhat arbitrarily. I got the following quality distribution for the merged sequences:

I don't know what is going on in the filtering process at this point, and I don't have statistics for the merge step. However, I think this method works better for this dataset than DADA2's filtering, and I look forward to your further advice on this issue. Thank you for your generous advice; it helps me a lot.

Hi @stevens,
Yes, given the quality scores of your sequences, it makes sense that increasing the truncation length decreases the number of sequences passing the filter. DADA2 throws out sequences with a "bad" quality score, and if you do not trim off the low-quality tail, more whole sequences end up being thrown out.

As for the qiime vsearch merge-pairs solution: this should work. It is a somewhat "old school" method of quality filtering; DADA2 is commonly used because it produces high-quality rep-seqs, but this approach should work too. Here is the pipeline I think you should run.
First,

qiime quality-filter q-score

This will do some quality filtering to make sure the sequences are of acceptable quality. Second, run

qiime vsearch merge-pairs 

I would try vsearch with the default parameters first and see whether that is effective (so don't touch maxee yet).
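Roughly, with placeholder filenames (a sketch; note that depending on your QIIME 2 version, q-score may only accept single-end or joined reads, in which case run merge-pairs first and q-score on the joined reads):

qiime quality-filter q-score \
--i-demux pair-end.qza \
--o-filtered-sequences pair-end-filtered.qza \
--o-filter-stats filter-stats.qza

qiime vsearch merge-pairs \
--i-demultiplexed-seqs pair-end-filtered.qza \
--o-merged-sequences pair-end-joined.qza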

Lastly, run either cluster-features-open-reference or deblur denoise-16S to get the feature table and rep-seqs you need for your analysis. Deblur will produce ASVs and cluster-features-open-reference will produce OTUs. OTUs might be better in this case since the data is low quality, but it's really up to you!

qiime vsearch cluster-features-open-reference 

or

qiime deblur denoise-16S
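For the Deblur route, a minimal sketch (filenames are placeholders and the trim length of 400 is an assumption; pick it from the quality plot of your merged reads, since reads shorter than this value are dropped):

qiime deblur denoise-16S \
--i-demultiplexed-seqs pair-end-joined-filtered.qza \
--p-trim-length 400 \
--p-sample-stats \
--o-table table-deblur.qza \
--o-representative-sequences rep-seqs-deblur.qza \
--o-stats deblur-stats.qza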

Hopefully this helps!


Thanks to your suggestion, I performed the subsequent processing using Deblur and successfully obtained the denoised table and representative sequences. The stats file: deblur-stats.qzv (212.7 KB). I believe the Deblur output can be carried forward into taxonomic comparisons and other downstream analyses.

Thank you very much for your continued attention and useful suggestions. Although I am still not sure why this dataset resists analysis with DADA2, I am fortunate to have found, with your help, a way to process it, and I can continue to mine these data. Most sincerely!


Hi @stevens,
I am still a little concerned about the number of reads getting through Deblur. If you look at the reads-deblur column, it looks a little low.

DADA2 is struggling to quality-filter your sequences because they become very low quality at the end, so many of them are getting thrown out. However, if you truncate before the low-quality region, you do not have enough overlap to merge your forward and reverse reads.

Here is the last dada2 command you should try:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs pair.qza \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza \
--p-trunc-len-f 240 \
--p-max-ee-f 3 \
--p-trunc-len-r 2020 \
--p-max-ee-r 3

And see whether allowing more errors lets more of your sequences get past quality control!
Hope that helps!
:turtle:

I have tried it, but the results don't seem any better; I can see that the number of sequences passing the filter is still very small.


code:
time qiime dada2 denoise-paired \
--i-demultiplexed-seqs pair-end.qza \
--o-table table-dada.qza \
--o-representative-sequences rep-seqs-dada.qza \
--o-denoising-stats denoising-stats.qza \
--p-trunc-len-f 240 \
--p-max-ee-f 3 \
--p-trunc-len-r 202 \
--p-max-ee-r 3

(I used 202 here; I think the 2020 you provided is an unreasonable value.)

I am not sure where the problem with this dataset comes from, and I am not sure whether I should stick with QIIME 2 for this batch at all. The original authors describe their processing of this data as follows:

So I wonder whether it is possible to take the merged sequences and use a naive Bayesian classifier or cluster-features-open-reference directly for taxonomic classification, thereby bypassing the denoising step of DADA2 or Deblur (since there is no way to determine why this dataset denoises so poorly).

Yes, sorry, I meant 220.

I think this is a good idea for this data; I would try cluster-features-open-reference on the merged reads.
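A sketch of that route with placeholder filenames (reference-seqs.qza stands for a reference set such as SILVA or Greengenes; merged reads must be dereplicated before clustering):

qiime vsearch dereplicate-sequences \
--i-sequences pair-end-joined.qza \
--o-dereplicated-table derep-table.qza \
--o-dereplicated-sequences derep-seqs.qza

qiime vsearch cluster-features-open-reference \
--i-table derep-table.qza \
--i-sequences derep-seqs.qza \
--i-reference-sequences reference-seqs.qza \
--p-perc-identity 0.97 \
--o-clustered-table table-otu.qza \
--o-clustered-sequences rep-seqs-otu.qza \
--o-new-reference-sequences new-ref-seqs.qza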

Hope that helps!
