Overall workflow and paired reads

Hello everyone!

I was reading through the QIIME 2 manual, and I was wondering whether the following train of thought is roughly accurate:

  1. I first imported my raw Illumina MiSeq data and control data from SRA.
  2. I joined my reads (here’s where it can get a bit complicated), but I noticed that the SRA data has already been filtered and merged in QIIME. Would it hurt to run these steps again? Or should I just import the SRA data at a later step?
  3. I applied denoising using Deblur.
  4. I read somewhere in the guide that Deblur includes some chimera removal, but I ran the VSEARCH de novo chimera search anyway. Is this redundant, given that Deblur already removes some of them?
  5. I took the non-chimeric output artifact and performed de novo clustering. (I’m guessing this is where I should have added the SRA sequences.)
  6. I downloaded the SILVA 16S-only representative set clustered at 99% similarity, along with the matching 99% taxonomy folder, and I’m running (as I write this) the “Assign taxonomy” function with VSEARCH consensus.

I guess I’m just afraid I’m messing something up by rerunning dechimerization, or by re-filtering the already-filtered data from NCBI. Is multithreading already applied to every process in QIIME? Or do we have to set it up ourselves to make use of all the cores on our computer?

Thanks for this!

Best,

Daniel

Hi @daniel.castanedamogo,

Welcome to the forum! Let's talk about some of your steps.

Did you download single or paired reads? If they're paired but unjoined (separate forward and reverse reads), I would join them. If they're forward only (suggesting they were already joined), I'd work from there.

I would run quality filtering first, but this is the right denoiser for this job. :slight_smile:

Yes. Deblur does chimera/host removal against a database. My experience has been positive in human samples, and I very rarely run chimera removal. It's unlikely to hurt, but it's potentially redundant and computationally expensive.

I'm unclear why you chose to do this. If you have the same primer region and trim length, you can compare the ASVs. You get improved resolution, externally valid results... all sorts of benefits. So, I'm curious why you chose to re-cluster your sequences when you could just denoise the SRA sequences.

I would check the help documentation, but I think most multi-threaded functions are set up to use the default parameters for the system.
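Since defaults can differ between plugins and versions, it may be worth passing the number of jobs explicitly. A minimal sketch, assuming the `--p-jobs-to-start` parameter of `qiime deblur denoise-16S` (confirm with `--help` on your installation); the file names and the trim length of 250 are placeholders, not values from this thread. The function only builds the command line and does not run it:

```python
import os
import shlex


def deblur_command(demux_qza, trim_length, jobs=None):
    """Build (but do not run) a `qiime deblur denoise-16S` command line.

    If `jobs` is None, use the number of cores Python reports,
    falling back to 1 when that cannot be determined.
    """
    if jobs is None:
        jobs = os.cpu_count() or 1
    return [
        "qiime", "deblur", "denoise-16S",
        "--i-demultiplexed-seqs", demux_qza,
        "--p-trim-length", str(trim_length),      # placeholder value
        "--p-jobs-to-start", str(jobs),           # assumed parameter name
        "--o-representative-sequences", "rep-seqs.qza",
        "--o-table", "table.qza",
        "--o-stats", "deblur-stats.qza",
    ]


# Hypothetical input artifact name; adjust to your own files.
cmd = deblur_command("filtered.qza", trim_length=250, jobs=4)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell once the names match your own artifacts.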

Best,
Justine

Hi Justine,

Thanks a lot for your answer! It is very insightful :smiley:

Did you download single or paired reads? If they’re paired but unjoined (separate forward and reverse reads), I would join them. If they’re forward only (suggesting they were already joined), I’d work from there.

It is kind of weird, because I can download them from SRA either unmerged, using the SRA Toolkit --split-3 option, or already merged. I noticed that if I download them unmerged, the reads do not have any overlap region... Basically I have 278 bp from the forward reads and 278 bp from the reverse reads that are supposed to make a read of 556 bp (again, no overlap, just 'stitched' together). And those have already been filtered and denoised, with chimeras removed... So I'm guessing I should import these files right before my taxonomic analysis.
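The overlap arithmetic behind that observation can be written out as a small sketch. It assumes the 556 bp stitched length equals the amplicon length, which the thread does not confirm; if the real amplicon is longer, the result would be negative:

```python
def expected_overlap(forward_len, reverse_len, amplicon_len):
    """Number of bases shared by the two reads.

    A value of zero or less means the reads cannot be joined by overlap.
    """
    return forward_len + reverse_len - amplicon_len


# 278 bp forward + 278 bp reverse, assumed 556 bp amplicon
overlap = expected_overlap(278, 278, 556)
print(overlap)  # 0
```

With zero overlap, an overlap-based joiner has nothing to align, which is consistent with the reads being simply concatenated.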

I would run quality filtering first, but this is the right denoiser for this job. :slight_smile:

Yes, you're right, I should have run quality filtering first. By this, do you mean keeping only reads with a Phred quality score above, say, 25? Is that a good standard to go by?
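For reference on what such a threshold means: a Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below only shows that arithmetic and how scores are read from a FASTQ quality string, assuming Phred+33 encoding; it is not a description of what the quality-filter plugin does internally, and the quality string is made up for illustration:

```python
def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


print(phred_to_error_prob(25))   # roughly 0.0032, about 1 error in 316 bases

# Made-up quality string for illustration
scores = decode_phred33("II:5#")
print(scores)                    # [40, 40, 25, 20, 2]
```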

Yes. Deblur does chimera/host removal against a database. My experience has been positive in human samples, and I very rarely run chimera removal. It’s unlikely to hurt, but it’s potentially redundant and computationally expensive.

Gotcha, thanks a lot for this, I see that it could be pointless to run this step again.

I’m unclear why you chose to do this. If you have the same primer region and trim length, you can compare the ASVs. You get improved resolution, externally valid results… all sorts of benefits. So, I’m curious why you chose to re-cluster your sequences when you could just denoise the SRA sequences.

I was under the impression that clustering came after denoising... I looked at the overall workflow in the "Grand overview" section, and I think you are right, because it is written as "Denoising/Clustering". So is it one or the other, and there's no point in running both? So... does the denoising step in Deblur already cluster my sequences?

I would check the help documentation, but I think most multi-threaded functions are set up to use the default parameters for the system.

Will do! Thanks a lot for your very complete answer :smiley:

Best,

Daniel

Hi @daniel.castanedamogo,

These won't be comparable with what you get out of QIIME 2. For marker genes, I typically assume that what's in the repositories is unprocessed, because that's the policy for most repositories. So, I would recommend processing them the exact same way. I would also verify your primers and the primers for the region you're downloading. Microbiome data is sensitive to technical effects, so it's important to make sure things line up correctly.
You also need to be aware that the length of your reads will dictate how you can combine sequences, so if you can't join, I would trim your reads to match theirs.
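To illustrate why matching lengths matters for comparing ASVs: two reads from the same organism only compare as identical sequences once they are cut to the same length. A toy sketch with made-up sequences; in a real workflow the trimming would normally be done through the denoiser's trim-length setting, not by hand:

```python
def trim_to_common_length(seqs_a, seqs_b):
    """Trim every sequence in two non-empty lists to the shortest length seen."""
    target = min(len(s) for s in seqs_a + seqs_b)
    trimmed_a = [s[:target] for s in seqs_a]
    trimmed_b = [s[:target] for s in seqs_b]
    return trimmed_a, trimmed_b, target


# Made-up sequences: same start, different read lengths
mine = ["ACGTACGTAC"]     # 10 bases
theirs = ["ACGTACGT"]     # 8 bases

print(mine[0] == theirs[0])              # False: different lengths

mine_t, theirs_t, target = trim_to_common_length(mine, theirs)
print(target)                            # 8
print(mine_t[0] == theirs_t[0])          # True after trimming
```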

You want to use the q2-quality-filter plugin. I recommend the default settings, because those have been optimized for Deblur. The "Alternative methods of read joining" tutorial goes through one workflow.

I'm gonna refer you to a couple of threads on that...

Best,
Justine

Yes! That did the trick! Thanks for your help! :smiley:
