Yet another question regarding OTU clustering

mstagliamonte · February 25, 2019, 6:24pm

Dear Qiimers,

Here I am, with another question regarding the vsearch workflow. From what I understand, the workflow should be something like this:

trim adapters (done)
merge paired end reads (done)
quality filtering (done)
chimera filtering
dereplicating
otu clustering

I am a bit confused at this point, as I read in some of the tutorials that I need to trim the reads to equal length, possibly using cutadapt, is that correct? I am struggling to find the proper command to do it.

Looking forward to your suggestions,
Max
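
For readers unfamiliar with the "dereplicating" step in the list above: it collapses identical reads into unique sequences and keeps a count for each. The following is only a minimal Python sketch of that idea, not what vsearch or q2-vsearch actually runs; the function name `dereplicate` is hypothetical, and upper-casing the reads before comparing them is an assumption made for the illustration.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, count) pairs.

    Assumption for this sketch: reads are compared case-insensitively.
    """
    counts = Counter(seq.upper() for seq in sequences)
    # Most abundant first; ties broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


reads = ["ACGT", "ACGT", "acgt", "ACGA", "TTGA", "ACGA"]
for seq, count in dereplicate(reads):
    print(seq, count)
```

The unique sequences and their counts are what a clustering step would then work from, instead of every raw read.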

thermokarst · February 25, 2019, 6:26pm

Hi @mstagliamonte - have you seen this tutorial? "Clustering sequences into OTUs using q2-vsearch" (QIIME 2 2019.1.0 documentation). Also, this one: "Identifying and filtering chimeric feature sequences with q2-vsearch" (QIIME 2 2019.1.0 documentation).

mstagliamonte · February 25, 2019, 6:54pm

Hi, @thermokarst,

Thank you for your kind attention. I have seen those tutorials, but I still think I am missing something. Actually, I have just reviewed the workflow here:

https://docs.qiime2.org/2019.1/tutorials/overview/

From that it looks like chimera filtering comes after the clustering step. Am I correct? As an alternative, would it be correct to just use the deblur output as input to clustering?

Best,
Max

thermokarst · February 25, 2019, 7:09pm

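From that it looks like chimera filtering comes after the clustering step. Am I correct?
Yes.

As an alternative, would it be correct to just use the deblur output as input to clustering?
Yes, although it is atypical. While not incorrect, people tend to go with either ASV methods (like deblur or DADA2) or clustering, not both.

I need to trim the reads to equal length, possibly using cutadapt, is that correct?
Not necessarily. Where did you read that?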

mstagliamonte · February 25, 2019, 7:29pm

Great, thank you.

I need to trim the reads to equal length, possibly using cutadapt, is that correct?
Not necessarily — where did you read that?

Here:

https://docs.qiime2.org/2019.1/tutorials/qiime2-for-experienced-microbiome-researchers/#merging-reads

Go to OTU clustering --> Length trimming.

While my raw paired end reads are the same length, merged reads are not. Maybe I am just misunderstanding it.

thermokarst · February 25, 2019, 7:36pm

From my perspective this is probably a bit more of a "best practice" rather than a requirement, although perhaps @cduvallet can shed some light on that section of the tutorial.

mstagliamonte · February 25, 2019, 7:53pm

Thank you for your kind explanation. I look forward to @cduvallet 's feedback. In the meantime, I will go ahead with the pipeline.

Best,
Max

mpodar · February 25, 2019, 8:18pm

The amplicons generated by the 515F-806R primers (and, for that matter, probably any other primer pair) are not exactly the same length across the taxonomic spectrum. There are single-nucleotide indels here and there relative to the majority. That will give a dominant size and some shorter or longer sequences (1-2 nt on either side). If you truncate to the most common size and remove anything shorter, you will lose some taxa. If you trim the longer sequences from one end, you will have a terminal gap in those sequences following a multiple sequence alignment, and an internal gap in the dominant-size sequences. I am not sure if or how these would impact OTU calling compared to leaving them as they are. But I would definitely not remove sequences that are a few nucleotides shorter than average. You can always extract them and see whether they are a distinct taxon or a potential artifact.
Best,

Mircea
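
The "extract them and see" suggestion above could be sketched roughly as follows. This is only an illustration in plain Python, not a QIIME 2 or vsearch command: `read_fasta` and `split_by_length` are hypothetical helper names, and the four sequences are made-up toy data rather than real amplicons.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records):
    """Return the most common length and the records that differ from it."""
    records = list(records)
    lengths = Counter(len(seq) for _, seq in records)
    dominant, _ = lengths.most_common(1)[0]
    outliers = [(h, s) for h, s in records if len(s) != dominant]
    return dominant, outliers


# Toy data (made up): two reads of the dominant length, one shorter, one longer.
example = """>a
ACGTACGTAC
>b
ACGTACGTAC
>c
ACGTACGTA
>d
ACGTACGTACG
""".splitlines()

dominant, outliers = split_by_length(read_fasta(example))
print("dominant length:", dominant)
print("off-size records:", [h for h, _ in outliers])
```

The off-size records can then be written out and checked against a reference database before deciding whether to keep them.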

cduvallet · February 25, 2019, 9:35pm

I agree with everything that's been said!

The length trimming step is more important for reads of dramatically different lengths (such as the output of single-end sequencing on older platforms like 454 pyrosequencing). I wouldn't worry about trimming if the reads differ by just a few bp (like @mpodar mentioned).

Having reads of different lengths will affect your OTU calling differently depending on which similarity definition you're using. Check out the vsearch documentation (see the --id and --iddef section). FWIW, it doesn't look like the qiime2 implementation of vsearch allows the --iddef flag to be changed.
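
To make the effect of the identity definition concrete, here is a small Python sketch of three of the definitions as I read them in the vsearch manual: `--iddef 0` (matching columns divided by the shortest sequence length), `--iddef 1` (matching columns divided by the alignment length), and `--iddef 2` (matching columns divided by the alignment length excluding terminal gaps, which the manual lists as the default). The function name is hypothetical, the alignments are supplied by hand, and vsearch computes its own alignments, so treat this as an illustration of the arithmetic only.

```python
def identity_definitions(aligned_a, aligned_b):
    """Compute three pairwise identity values from a gapped alignment.

    Both inputs must be the same length and use '-' for gaps. The keys
    0, 1 and 2 mirror the --iddef numbering as described in the lead-in.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))
    if not any("-" not in col for col in columns):
        raise ValueError("alignment has no ungapped column")

    # A column matches when both residues are identical and not a gap.
    matches = sum(1 for x, y in columns if x == y and x != "-")

    # Count gap-containing columns at the start and end of the alignment.
    leading = 0
    for col in columns:
        if "-" not in col:
            break
        leading += 1
    trailing = 0
    for col in reversed(columns):
        if "-" not in col:
            break
        trailing += 1

    shortest = min(len(aligned_a.replace("-", "")),
                   len(aligned_b.replace("-", "")))
    return {
        0: matches / shortest,
        1: matches / len(columns),
        2: matches / (len(columns) - leading - trailing),
    }


# A 2 nt terminal gap (e.g. a read trimmed at one end).
terminal = identity_definitions("ACGTACGTAC", "ACGTACGT--")
# A 1 nt internal gap (e.g. a true single-nucleotide deletion).
internal = identity_definitions("ACGTACGTAC", "ACGT-CGTAC")
print(terminal)
print(internal)
```

In this toy case the terminal gap costs nothing under definitions 0 and 2 but lowers the identity under definition 1, while the internal gap lowers the identity under definitions 1 and 2 but not under definition 0.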

In general I agree with @thermokarst that it probably makes more sense to do either denoising or clustering, unless you have a specific reason for wanting to do both.

mstagliamonte · February 26, 2019, 1:46pm

That is great information from you all,

I am trying to implement both ASV and OTU strategies (separately) and I think I have a better idea of the two workflows now.

Thanks everybody for your kind help
Max
