Error while denoising using dada2

Tania_Aires · November 8, 2021, 1:21pm

Hey!
I'm new to Qiime2, so I'm sorry if this is a silly, easy-to-fix problem.
I was denoising my data in Qiime 2 (paired-end, demultiplexed data) and ran into an error after 4 days of running.
The command used was:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

The error was:
Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1) please inspect stdout and stderr to learn more.

Debug info has been saved to /var/folders/d7/0rpznrmn0rvfhz7qb56q0f2hoooogn/T/qiime2-q2cli-err-m6vu59vi.log

I have no idea how to get that debug info...
I am using qiime2-2021.4

I would really appreciate it if someone could help me with this.
Thanks a lot!
Tania

Mehrbod_Estaki · November 8, 2021, 9:08pm

Hi @Tania_Aires ,
Welcome to the forum!

You can either re-run your command with the --verbose parameter added, which will print out the full error, or fetch the error log from /var/folders/d7/0rpznrmn0rvfhz7qb56q0f2hoooogn/T/qiime2-q2cli-err-m6vu59vi.log and print it using cat or something similar. Note, however, that since this is a temporary file, it may no longer be there.
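If the log does still exist, a small helper can find and print the most recent one. This is a minimal sketch, not part of QIIME 2: the function name `show_latest_qiime_log` is hypothetical, and it assumes the logs are written to `$TMPDIR` (on macOS, the `/var/folders/.../T/` directory shown in the error message), falling back to `/tmp`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a QIIME 2 command): print the newest q2cli debug log.
# Assumes logs are written to $TMPDIR, falling back to /tmp when it is unset.
show_latest_qiime_log() {
  local dir="${TMPDIR:-/tmp}"
  local latest
  # ls -t sorts by modification time, newest first; errors are hidden if nothing matches
  latest=$(ls -t "$dir"/qiime2-q2cli-err-*.log 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "No QIIME 2 debug log found in $dir (temporary files may have been removed)." >&2
    return 1
  fi
  cat "$latest"
}
```

For example, `show_latest_qiime_log | less`. Re-running with `--verbose` remains the more reliable route, since the temporary file may already be gone.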

Tania_Aires · November 9, 2021, 11:09am

Hi @Mehrbod_Estaki
Thank you so much for your feedback.
You're right, I can no longer find the temporary file. I'll re-run the command using the --verbose parameter and I'll post the results. It might take a few days.
Thanks!
Tania

Tania_Aires · November 9, 2021, 4:40pm

Hi again @Mehrbod_Estaki

While this command is re-running with the --verbose parameter, I went back and looked at my demux.qzv file, and there's probably something wrong with it. I zoomed in and the interactive quality plot doesn't look normal (please see attached)...there are no box plots.
Any idea what might have happened? Could that be the reason for the error message in the denoising step?
Again, thank you so much for your help
Tania Aires
demux.qzv (316.4 KB)

Mehrbod_Estaki · November 9, 2021, 9:50pm

Hi @Tania_Aires ,

You can always run dada2 on a subset of your data for troubleshooting purposes, or at the very least you can increase the number of threads dada2 uses with --p-n-threads, which will significantly reduce your processing time.
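For the subset route, one way to build a small test set is to take the first few thousand read pairs from each sample before importing. This is a rough sketch assuming gzipped FASTQ input and that R1 and R2 list mates in the same order (normal for Illumina output); `subset_pair` is a hypothetical name, not a QIIME 2 command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep the first N read pairs of one sample's R1/R2 files.
# Taking the same number of records from the top of both files keeps mates paired,
# assuming R1 and R2 list reads in the same order. One FASTQ record = 4 lines.
subset_pair() {
  local r1="$1" r2="$2" n="$3" outdir="$4" f
  mkdir -p "$outdir"
  for f in "$r1" "$r2"; do
    # awk reads the whole stream (slower than head on huge files) but avoids
    # broken-pipe failures when `set -o pipefail` is on.
    # The output keeps the input file name, so outdir must differ from the input folder.
    gzip -dc "$f" | awk -v max=$((4 * n)) 'NR <= max' | gzip > "$outdir/$(basename "$f")"
  done
}
```

For example, `subset_pair S1_R1_001.fastq.gz S1_R2_001.fastq.gz 5000 subset/` writes 5,000 pairs per file into `subset/`, which could then be imported in place of the full data.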

Your quality plots don't look like typical MiSeq/HiSeq Illumina data, which is what dada2 denoise-paired is designed to work with. Can you confirm what sequencing technology was used here? Also, is it possible some pre-processing was done on your fastq files before they were imported into Q2? DADA2 works best when it trains its error model on raw reads that have not gone through any other quality-control processing.

Since we don't know what the error actually says, we should wait until we get the full error message first.

Tania_Aires · November 10, 2021, 2:39pm

Hi @Mehrbod_Estaki
Again, thank you so much for your input.
Yes, I put it to run using all the possible threads, hopefully it will be done by tomorrow.
I am pretty sure my data is MiSeq, but in the meantime I got in touch with the company to check if they had an idea of what could be happening, and they said that I should be using, and I quote: "BIG DATA qiime2 methods, as such 'dada2 Big Data workflow'". I have no idea what that is, though.

Also, I looked more closely at my demux folder (the one that I import into qiime2) and I only had R1 sequences (except for a single sample). Then I went back to the raw data processing with the Fastq processor, which I use to convert my data into qiime2-usable files, and I think it might be doing something wrong. I'll try to explain below.
The company sends the raw data already demultiplexed, looking like this:

Each one of these folders contain a zipped R1 file and a zipped R2 file:

After using the Fastq processor, I noticed that for the one sample that came out OK, with both R1 and R2 files in the demux folder, I got this (in the original folder shown above):

While for the other samples that ended up with just R1 in the demux folder, I got this:

Apparently, the Fastq processor is not unzipping both files for most of the samples...no idea why that happened.

So, I guess that could be the problem...but anyway, I need to wait until I have the entire error message. I just wanted to give you some more details on my data.
Thanks a lot

Mehrbod_Estaki · November 11, 2021, 3:06am

Hi @Tania_Aires ,

Thanks for the update. If it is a big dataset (which sounds like it is), it very well may be worth your time to use the native R version of DADA2, following their recommended big data protocol. But also, that doesn't mean what you are running here is invalid, so if the data is complete that is totally fine as well!

Is this the FASTQ processor from Mr. DNA? If so, I just wanted to give you a heads-up that I have seen other folks have problems with Mr. DNA on this forum, especially with that processor: for example, a similar issue to yours here, and here, and several others if you just search for Mr DNA on the forum. I have never used Mr. DNA personally, but the easiest way I can suggest to avoid some headaches is to just use the raw FASTQ files without any processing, unless there is something unique about the FASTQ processor that is required here.
FASTQ files without primers, barcodes, etc. would be ideal, but I understand they may charge extra for that. Removing those is also possible in QIIME 2 using various plugins.

Tania_Aires · November 11, 2021, 9:52am

Hi @Mehrbod_Estaki
Thanks!
Exactly, that is the MrDNA app that is used to remove the primers. I reported the problem to them and they got upset with me because I "assumed that their software was doing something wrong"; they had never received any complaints before, and it was all because I'm not experienced in bioinformatics...now I feel less alone.
Anyway, the problem I found with the Fastq processor was that it was not unzipping both R1 and R2 before primer removal, so I was only getting R1. I solved it by unzipping the files manually, but it would be great to know how I could use the raw FASTQ files without any processing (they provide the data without barcodes, but they do not remove the primers...they would charge more for that).
If you, or someone else, could guide me through it, that would be great.
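The manual decompression described above can be scripted so that both reads of every sample are handled the same way. A minimal sketch, assuming the provider's `_R1`/`_R2` naming and gzip-compressed files; `unzip_read_pairs` is a hypothetical helper that keeps the original `.gz` files. QIIME 2's manifest and CASAVA imports accept gzipped FASTQ, so this step should only be needed for tools that expect uncompressed input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decompress every R1 and R2 .fastq.gz under a folder.
# gzip -dc writes to stdout, so the original .gz files are kept;
# files that already have a decompressed copy are skipped.
unzip_read_pairs() {
  local root="$1" f
  while IFS= read -r f; do
    [ -e "${f%.gz}" ] && continue
    gzip -dc "$f" > "${f%.gz}"
  done < <(find "$root" -type f \( -name '*_R1*.fastq.gz' -o -name '*_R2*.fastq.gz' \))
}
```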

(qiime2 is still running, so no update on that error).

Thank you so much
Tania

Mehrbod_Estaki · November 11, 2021, 7:34pm

Hi @Tania_Aires ,
What a friendly response...sorry to hear you didn't get far with them! Let's see if we can help get you over this bump.

So, you have demultiplexed fastq files that have primers still intact in the reads.

First, you can import these files into QIIME 2 using a manifest.
Visualize your demux reads as you have before for sanity check.
Remove the primers using q2-cutadapt. You'll need the exact sequence of primers used for this step.
Once you have successfully removed these, you should be ready to go back to your dada2 steps; everything from that point on follows the standard workflows you can find in the various QIIME 2 tutorials.

I will say here though that I am not familiar with Mr. DNA's sequencing protocols, and my recommendation above is based on what typical sequencing facilities do. If Mr DNA has some other unique protocols, there may be some additional troubleshooting we may have to do along the way.

Tania_Aires · November 12, 2021, 1:39pm

Hi @Mehrbod_Estaki

Again, I'm really grateful to you for taking the time to help me with this! I really appreciate it.
I've read through the manifest and didn't quite understand why we need to create that manifest file and where we use it. The raw data MrDNA provides (paired-end, demultiplexed) comes like this:

Two zipped files (R1 and R2) for each sample, which look like this:

Isn't it possible to jump straight to the import step using this command?

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path pe-64-manifest \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred64V2

Thank you so much for your help!
Tania

Mehrbod_Estaki · November 12, 2021, 6:32pm

Glad to be of help @Tania_Aires !

In the command you listed, pe-64-manifest refers to the manifest file, which holds information about the location of your FASTQ files on your system.
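A manifest is just a tab-separated text file. Below is a minimal sketch that writes one for the paired-end V2 format (columns `sample-id`, `forward-absolute-filepath`, `reverse-absolute-filepath`). It assumes gzipped files named with `_R1`/`_R2`, and that the sample ID is the part of the file name before the first underscore; `make_manifest` is a hypothetical helper. Note that the command above uses `PairedEndFastqManifestPhred64V2`: Illumina 1.8+ output, including MiSeq and NovaSeq, is normally Phred+33, so `PairedEndFastqManifestPhred33V2` is usually the matching format (worth confirming with the provider).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a paired-end V2 manifest for every *_R1*.fastq.gz under a
# folder. Assumptions: R2 names differ from R1 names only in "_R1" -> "_R2", and the
# sample ID is the file-name prefix before the first "_". Adjust both for your files.
make_manifest() {
  local root="$1" r1 r2 name id
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  # cd + pwd turns the folder into an absolute path, which the V2 format expects
  while IFS= read -r r1; do
    name=$(basename "$r1")
    r2="$(dirname "$r1")/${name/_R1/_R2}"
    id="${name%%_*}"
    [ -f "$r2" ] || { echo "missing R2 for $r1" >&2; continue; }
    printf '%s\t%s\t%s\n' "$id" "$r1" "$r2"
  done < <(find "$(cd "$root" && pwd)" -type f -name '*_R1*.fastq.gz' | sort)
}
```

For example, `make_manifest raw_data > manifest.tsv`, then pass `manifest.tsv` as `--input-path`.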

If you're sure that all your fastq file names follow the expected CASAVA 1.8 naming convention, you could simply import using this approach as well, but the manifest approach is the safer option because there are no requirements for file names. I only suggested the manifest because I wasn't sure whether all of your file names met the requirements for the CASAVA import. Feel free to use whichever approach you prefer.
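To choose between the two approaches, you could first check whether the file names in the import directory fit the CASAVA 1.8 pattern, e.g. `L2S357_15_L001_R1_001.fastq.gz`. The regular expression below only approximates that convention; it is not QIIME 2's own validator, and `check_casava_names` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: print every .fastq.gz in one directory whose name does
# not look like <sample>_<barcode>_L<lane>_R<1|2>_001.fastq.gz. Returns 1 if any do.
check_casava_names() {
  local dir="$1" f name bad=0
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    if ! [[ $name =~ ^.+_.+_L[0-9]{3}_R[12]_001\.fastq\.gz$ ]]; then
      echo "$name"
      bad=1
    fi
  done
  return "$bad"
}
```

If it prints nothing, the CASAVA import is likely to accept the folder; otherwise the manifest route avoids renaming files.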

Tania_Aires · November 15, 2021, 9:48am

Hi @Mehrbod_Estaki
Thank you so much! I get it now.
I'm not sure about the CASAVA format, so I'll try both ways (with and without the manifest) on a subset of the samples and will let you know whether it works.
Just another question: in the cutadapt step, since my primers are at the beginning of the sequences (already checked), should I use the ^ character like this?

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-adapter-f '^AMCVGGATTAGATACCCBG' \
  --p-adapter-r '^ACGTCATCCCCACCTTCC' \
  --o-trimmed-sequences demux-pairedend-trimmed.qza \
  --verbose
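Before trimming, it can help to confirm how many reads actually begin with the degenerate primer. The sketch below expands IUPAC codes (M, V, B, and so on) into regex character classes and counts sequence lines matching the anchored pattern; both function names are hypothetical. One thing possibly worth double-checking: in q2-cutadapt, primers at the 5' end are usually passed with `--p-front-f`/`--p-front-r` (cutadapt's `-g`), while `--p-adapter-f`/`--p-adapter-r` (cutadapt's `-a`) describe 3' adapters, which cutadapt removes together with everything that follows them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn an IUPAC primer into a regex. The inserted classes only
# contain A/C/G/T, so later substitutions never rewrite earlier ones.
iupac_to_regex() {
  printf '%s\n' "$1" | sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' \
    -e 's/W/[AT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' \
    -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Hypothetical helper: count reads in a gzipped FASTQ whose sequence starts with the primer.
count_primer_starts() {
  local fq="$1" re
  re="^$(iupac_to_regex "$2")"
  # NR % 4 == 2 selects the sequence line of each 4-line record
  gzip -dc "$fq" | awk -v re="$re" 'NR % 4 == 2 && $0 ~ re { n++ } END { print n + 0 }'
}
```

For example, `count_primer_starts S1_R1_001.fastq.gz AMCVGGATTAGATACCCBG`, compared with the sample's total read count, gives the fraction of forward reads that start with the primer.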

Thank you so much
Tania

Tania_Aires · November 16, 2021, 3:30pm

Hi again, @Mehrbod_Estaki
So, about the original problem that brought me here: I re-ran the DADA2 denoising with --verbose to see what the error was...and surprisingly there was no error this time. I didn't change anything; I just added --verbose and increased the number of threads so it could run faster. I have no idea what might have happened before.

About the other issue, where I'm trying to be less dependent on the MrDNA Fastq Processor to remove my primers: on a subset of my data set, I tried both importing my data directly (because MrDNA said it follows the CASAVA naming convention...but you never know) and importing it with a manifest. I was able to go through both ways and get my demux.qza file.

Then I used cutadapt to trim my primers.

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-adapter-f 'AMCVGGATTAGATACCCBG' \
  --p-adapter-r 'ACGTCATCCCCACCTTCC' \
  --o-trimmed-sequences demux_trimmed.qza \
  --verbose

and then I checked the demux_trimmed.qzv file, and there's something wrong: apparently the quality dropped a lot...please find attached the untrimmed demux file and the trimmed one. Can you please help me here? Sorry for bugging you...I'm still getting used to this after years and years on qiime1.
Thanks a lot
demux_trimmed.qzv (315.6 KB) demux.qzv (310.9 KB)

Mehrbod_Estaki · November 18, 2021, 8:23am

Hi @Tania_Aires,

Thanks for the updates.
So the first thing I noticed is that the second file, demux_trimmed.qzv, has lost almost all of its reads!
You went from having 11075511 total reads in demux.qzv to having just 2610 in your demux_trimmed.qzv. So...let's not even worry about the quality plots right now, because something very bad has happened in that step.
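One way to see where reads go missing is to count them in the raw files and compare with the counts shown in the demux summaries. A minimal sketch, assuming gzipped FASTQ with 4 lines per record; `count_reads` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check: print the read count of each gzipped FASTQ file,
# followed by a total line. One FASTQ record = 4 lines.
count_reads() {
  local f lines total=0
  for f in "$@"; do
    lines=$(gzip -dc "$f" | wc -l)
    printf '%s\t%d\n' "$f" $((lines / 4))
    total=$((total + lines / 4))
  done
  printf 'total\t%d\n' "$total"
}
```

For example, `count_reads raw_data/*/*_R1*.fastq.gz | tail -n 1` gives the total number of forward reads going into the import.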

I don't know if the quotes flanking your primers in your cutadapt command are doing something funny; can you try re-running without them? Can you also share the output of that command with us (it should be printed since you have --verbose on)?

To be honest, aside from that I'm really not sure what else to do here. Mr. DNA is always a mystery to me. For example, did you get confirmation that this is MiSeq data? I'm still not convinced it is, or at least that the data you have is truly in the raw format we typically see from a MiSeq machine. Anyone who has looked at this kind of data long enough will tell you those quality plots look very odd for raw MiSeq data...here is the demux plot from the Moving Pictures tutorial, which is what we expect. Notice the visible variation in quality and the typical dip at the 3' tail. Your data, on the other hand, looks super binned, which is more in line with newer Illumina machines such as the NovaSeq.
That being said, this isn't related to the issue I just mentioned about you losing all your data after cutadapt! These are separate issues that both need resolving.
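The binning observation can be checked directly by listing the distinct quality scores in a raw FASTQ file: binned data, as NovaSeq produces, shows only a handful of values, while raw MiSeq data typically shows many more. This sketch assumes Phred+33 encoding and gzipped input; `distinct_quals` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check: print the distinct Phred scores (ASCII code - 33) found on the
# quality lines of a gzipped FASTQ, one per line, in numeric order.
distinct_quals() {
  gzip -dc "$1" | awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }   # char -> ASCII code
    NR % 4 == 0 {                                                     # quality lines only
      for (j = 1; j <= length($0); j++) seen[ord[substr($0, j, 1)] - 33] = 1
    }
    END { for (q in seen) print q }' | sort -n
}
```

For example, `distinct_quals S1_R1_001.fastq.gz | wc -l` reports how many distinct scores occur in that file.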

Tania_Aires · November 18, 2021, 11:24am

Hi @Mehrbod_Estaki

Thanks!
Yes...I noticed that too; I basically ended up without reads...
I tried again without the quotes but the result was the same. Attached you can find the output of the cutadapt command (it's huge, so I copied and pasted it into a separate file)...
I have no idea what might have happened.

I didn't ask them whether the data was MiSeq because I was certain (not anymore) that it was (that's what I asked and paid for); in the invoice, the process was described as: Illumina MiSeq platform using the bTEFAP® process. Now I'm no longer certain of anything...I will ask them. You're right, those don't look like typical MiSeq reads, and even I, just getting into qiime2, noticed that right away and asked you about it in my first post.
Thank you so much for your help.
I'll get back with MrDNA's answer

cutadapt_summary.txt (94.7 KB)

Tania_Aires · November 19, 2021, 2:52pm

Hi again @Mehrbod_Estaki
So, MrDNA got back to me and it turns out...they used NovaSeq on my last batch and didn't tell me anything about it...so I guess I'm done with them. I'm so sorry for wasting your time...probably all the problems were because of that.
Anyway, I was able to get through the whole process without an error in the dada2 denoising (using the MrDNA fastq processor, but unzipping the files manually beforehand to compensate for the bug). Can I still use the resulting table? Is there a different tutorial for working with NovaSeq data in qiime2? I don't know how to work in R, so ideally I'd find something that could be done in qiime2...
Thanks a lot! And sorry about that...
Tania

thermokarst · November 22, 2021, 3:09pm

I'm really sorry about your experience @Tania_Aires, that's really unfortunate.

No need to apologize. I think it's helpful to document your experiences with this company here; that way other people can make informed decisions when considering a sequencing provider. It can be enticing to pay such a cheap sequencing rate, but then you rack up the costs on the bioinformatics/bookkeeping side, which is tough.

No, but in theory you can run this in DADA2 directly (in R; see this thread: Consequences of using dada2 on NovaSeq data · Issue #791 · benjjneb/dada2 · GitHub), then import the resulting table and rep-seqs into QIIME 2. The good news is that you already have R and DADA2 installed in your QIIME 2 conda environment.

Tania_Aires · November 23, 2021, 10:08am

Hi @thermokarst
Thank you so much!
Just a last question, I've read through the forum and saw that it is possible to use Deblur (instead of DADA2) with NovaSeq, am I right? I'm working with paired-end data so I'll need to merge the reads first but then it is OK to use Deblur?
Thank you all for your help
Tania

thermokarst · November 23, 2021, 2:52pm

Hi @Tania_Aires , yeah, I think deblur is an option, although you might want to dig into that question a bit more; I'm not sure whether deblur makes any particular assumptions that NovaSeq data would or wouldn't conform to. Otherwise, yes, you can check out the steps for performing a deblur-based analysis here:

https://docs.qiime2.org/2021.8/tutorials/moving-pictures/#option-2-deblur

Tania_Aires · November 24, 2021, 12:46pm

Hi @thermokarst
Thank you so much! I'll try the different options and compare the outcomes!
Thank you all
Tania