Best way to just Denoise Illumina Data

(Lara Whoi) #1

Hello -
I am helping a colleague with an Illumina dataset that was initially run in qiime1 (1.9.0). She has written up a paper (it was ready for submission) - but we just learned that the sequences were never denoised. They were taken through the qiime pipeline, just without the denoising step.

I was looking for advice on the simplest way to denoise this dataset.

Would you recommend 1) re-running the sequences through the entire qiime2 pipeline or 2) using deblur on its own and then re-integrating into qiime?

Is it possible to denoise via deblur within qiime2, but bring the denoised data back into qiime1? (I ask this because I am familiar with the qiime pipeline but not qiime2.)

I will be in charge of the sequence analysis. I have been using dada2 and phyloseq to analyze sequences and am just on day 1 of teaching myself qiime2. (I am familiar with qiime.)

I appreciate any feedback - many thanks for your time

(Colin J Brislawn) #2

Hello Lara,

Thanks for posting! Welcome to Qiime 2 :qiime2:

Here are the two options I would recommend:

  1. Re-run the sequences through the entire qiime2 pipeline

Just like you said :+1: … or

  2. Don’t change anything and publish your Qiime 1 results

This second option should be defensible. You will miss out on cool new methods in Qiime 2, but the Qiime 1 results aren’t wrong. I’ve defended this approach before by saying we wanted to use well understood, classic methods so that our results are directly comparable to past studies that used this method. Ideally, you would want to reprocess using new methods, but this second option should be fine.

But wait! Did you say qiime 1.9.0? There is a critical bug that was fixed in 1.9.1, so you might want to check out this bug and see if it affects you. If it does, the results from 1.9.0 would be wrong, making reprocessing necessary.

Let me know what you find. And feel free to ask more great questions!

(Lara Whoi) #3

Hello Colin -
Thank you so much for your reply and help!

I greatly appreciate the heads-up about Qiime 1.9.0 - but as I understand it, because this dataset is V4 (515F/806R), 1.9.0 worked fine on this primer set (but not any others).

  1. Am I correct that Qiime 1.9.0 works with (515F/806R)?

Thank you for offering support to publish the Qiime1 results.

  2. But do you think reviewers would accept Qiime 1 MiSeq Illumina results that were NOT denoised? (We were very strict with quality control cuts.)

[As a proxy for denoising, I reran the data through qiime 1 without any singletons and the overall results are very similar. We are processing these results to summarize at the Class and Genus level. ]
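To make the singleton filter concrete, here is a minimal Python sketch of the idea, assuming a singleton means an OTU whose total count across all samples is 1. The `drop_singletons` function and the toy table are hypothetical illustrations, not the QIIME 1 script used on the real data.

```python
def drop_singletons(table, min_total=2):
    """Keep only OTUs whose summed count across samples is >= min_total.

    `table` maps OTU id -> {sample id: read count}.
    """
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts.values()) >= min_total
    }


# Hypothetical toy OTU table (not real data)
otu_table = {
    "OTU_1": {"S1": 10, "S2": 4},
    "OTU_2": {"S1": 1, "S2": 0},  # singleton: one read in total
    "OTU_3": {"S1": 0, "S2": 2},
}

filtered = drop_singletons(otu_table)
print(sorted(filtered))
```

Note that this only discards rare OTUs. It does not correct sequencing errors within the remaining reads, which is what methods like Deblur or dada2 aim to do.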

If you were reviewing the paper, would you have any issue with denoising via Deblur and then further processing the data within qiime1?

thanks for your time and insights!

(Colin J Brislawn) #4

First things first: Yes, you should be fine with Qiime 1.9.0 and 16S v4 reads. Sorry to worry you! :sweat_smile:

Denoising: My understanding is that ‘denoising’ was important back-in-the-day with 454 data. IDK if Illumina ever needed denoising, or if just strict quality control was enough. See this paper on strict QC:

These days, ‘denoising’ has come to mean sub-OTU level feature creation (like ASVs / ESVs / DSVs etc.). Deblur is one modern denoising method.

… I don’t know, let’s find out! Would you be willing to post your methods section so I can review it for real? (You can also send me a message directly. :+1: )


(Lara Whoi) #5

(post withdrawn by author, will be automatically deleted in 24 hours unless flagged)

(Justine) #6

Hi @lara-whoi,

This was posted publicly. You may want to withdraw the post and PM Colin!

(Colin J Brislawn) #7

Hello again!

Like Justine mentioned, this post is public (but I’m cool with that if you’re cool with that #openscience).

Your method section looks good! Ironically, all my suggestions have to do with wording, not with the underlying science! :microbe:

Sequences were [binned → clustered] into operational taxonomic units (OTUs) within 97% similarity using UCLUST (Edgar 2010).

After [trimming → subsampling] each sample to an equal number of [tags → reads],

weighted UniFrac [values → distances]
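For anyone unsure what "subsampling each sample to an equal number of reads" means in practice, here is a minimal Python sketch that draws reads without replacement down to a fixed depth. The `subsample_counts` function, the depth of 40, and the toy counts are assumptions for illustration only, not the QIIME 1 implementation.

```python
import random


def subsample_counts(counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from one sample.

    `counts` maps OTU id -> read count. Returns a new dict of counts,
    or None if the sample has fewer than `depth` reads (such samples
    are usually dropped).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then sample from that pool.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(pool, depth)
    result = {otu: 0 for otu in counts}
    for otu in picked:
        result[otu] += 1
    return result


# Hypothetical sample with 100 reads, subsampled to 40
sample = {"OTU_1": 50, "OTU_2": 30, "OTU_3": 20}
rarefied = subsample_counts(sample, depth=40)
print(rarefied)
```

Every retained sample ends up with the same total read count, which is why "subsampling" describes the step more accurately than "trimming".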

Maybe other reviewers would care more about using the newest methods, but I think this section of the paper looks solid. :+1:


(Lara Whoi) #8

thank you Colin -
I greatly appreciate the feedback.

