Filtering of chimeric sequences after vsearch derep, alignment and OTU clustering?

rmbn · March 13, 2022, 8:14am

Hello! Because my device has limited computational resources, I am using a web tool pipeline for clustering, alignment, and taxonomic assignment. However, I noticed that it does not remove chimeras from my reads (I'm using the de novo ABG OTU clustering algorithm). The pipeline gives me the full alignment and clustering output, including the taxonomy.qza artifact. Is it OK to filter chimeric sequences afterwards, or must I filter chimeras before clustering? My computer can't run the clustering step with my limited RAM, so I was planning to run it in the web tool pipeline and do chimera removal afterwards.

Basically, this is my process:

1. Make contigs from paired reads
2. Dereplicate
3. Alignment and OTU clustering
4. Chimera removal
5. Filtering table and rep-seqs (rare OTUs, unwanted taxa, etc.)
6. Alpha rarefaction
7. mafft-fasttree tree construction
8. Other downstream analysis (diversity metrics, etc.)

I perform steps 1 to 3 with the online web tool, which gives me the frequency table, sequences, and taxonomy artifacts. With those artifacts, I then plan to do steps 4 to 8 locally.

I actually tried this, and the visualization of my frequency table (.qzv) shows a considerable drop in the number of features (which I assume are OTUs). But I'm not sure whether this is technically correct or makes sense, because most of the posts here run chimera removal before clustering.
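Filtering after clustering comes down to dropping the OTU IDs flagged as chimeric from both the feature table and the representative sequences, so some drop in the feature count is expected whenever chimeras are found. Below is a minimal sketch that uses plain Python dicts as hypothetical stand-ins for the table and sequences artifacts, plus a made-up chimera-check result. It is not the QIIME 2 API. In QIIME 2 itself, the chimera-filtering tutorial does this with `qiime feature-table filter-features` and `qiime feature-table filter-seqs`, passing the `nonchimeras.qza` output of `qiime vsearch uchime-denovo` as metadata.

```python
# Hypothetical stand-ins for the QIIME 2 artifacts (NOT the QIIME 2 API):
#   table:    OTU ID -> {sample ID: read count}   (like the frequency table)
#   rep_seqs: OTU ID -> representative sequence   (like the rep-seqs artifact)


def filter_to_nonchimeric(table, rep_seqs, nonchimeric_ids):
    """Keep only the OTUs whose IDs were classified as non-chimeric."""
    keep = set(nonchimeric_ids)
    filtered_table = {otu: counts for otu, counts in table.items() if otu in keep}
    filtered_seqs = {otu: seq for otu, seq in rep_seqs.items() if otu in keep}
    return filtered_table, filtered_seqs


table = {
    "OTU1": {"S1": 120, "S2": 98},
    "OTU2": {"S1": 15, "S2": 0},
    "OTU3": {"S1": 3, "S2": 7},
}
rep_seqs = {"OTU1": "ACGTACGTAC", "OTU2": "ACGTTTTTTT", "OTU3": "TTGCAGGCAA"}

# Hypothetical chimera-check result: OTU2 was flagged as chimeric.
nonchimeric_ids = ["OTU1", "OTU3"]

new_table, new_seqs = filter_to_nonchimeric(table, rep_seqs, nonchimeric_ids)
print(len(table), "->", len(new_table), "features")  # 3 -> 2 features
```

Filtering both objects with the same ID list keeps the table and the sequences consistent with each other for the later steps (tree building, diversity metrics).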

Also, according to the tutorial overview:

q2-vsearch implements three different OTU clustering strategies: de novo, closed reference, and open reference. All should be preceded by basic quality-score-based filtering and followed by chimera filtering and aggressive OTU filtering (the treacherous trio, a.k.a. the Bokulich method)
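Here, the "aggressive OTU filtering" step of that trio is taken to mean discarding OTUs that account for only a tiny fraction of all reads. The sketch below assumes a relative-abundance cutoff computed over the whole table. The function name, the dict-based table, and the 0.5% cutoff are hypothetical and purely illustrative. They are not QIIME 2 parameters or a recommended value. QIIME 2's own `qiime feature-table filter-features` takes absolute thresholds such as `--p-min-frequency`.

```python
def filter_rare_otus(table, min_rel_abundance):
    """Drop OTUs whose share of all reads in the table is below the threshold.

    table: OTU ID -> {sample ID: read count} (hypothetical stand-in)
    min_rel_abundance: fraction of total reads, e.g. 0.005 means 0.5%.
    """
    totals = {otu: sum(counts.values()) for otu, counts in table.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        otu: counts
        for otu, counts in table.items()
        if totals[otu] / grand_total >= min_rel_abundance
    }


table = {
    "OTU1": {"S1": 600, "S2": 400},
    "OTU2": {"S1": 250, "S2": 250},
    "OTU3": {"S1": 2, "S2": 0},
}
# Illustrative threshold only (0.5% of all reads), not a recommended value.
kept = filter_rare_otus(table, min_rel_abundance=0.005)
print(sorted(kept))  # ['OTU1', 'OTU2']
```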

Thank you. I hope you are all well.

P.S. The web pipeline I use does not support DADA2 denoising, so I'm stuck with OTUs rather than ASVs.

rmbn · March 13, 2022, 10:16am

Oops. Never mind the question. After some more digging I found this answer by colinbrislawn: Chimera check questions - #2 by colinbrislawn

Hello Steffen,

EDIT: I just found this excellent chimera checking tutorial written by Greg. Definitely start there.

You can run uchime-ref at any time, so it’s probably best to do it late in your pipeline, when you have fewer features to check. Say after step 3.

Uchime-denovo requires size annotations, so you have to run it after step 1 (dereplication adds size annotations). I have seen people do uchime-denovo before or after clustering (or both!). Greg recommends running it after clustering.
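To make the size-annotation requirement concrete, here is a minimal sketch of what dereplication adds. Identical reads are collapsed into uniques, sorted by decreasing abundance, and labeled with a `;size=N` field in the usearch/vsearch style. The function names and the `Uniq` label prefix are hypothetical. This is not vsearch's implementation, only the idea behind its `--sizeout` output.

```python
from collections import Counter


def dereplicate(reads, prefix="Uniq"):
    """Collapse identical reads into uniques labeled with ;size=N.

    Uniques are ordered by decreasing abundance (ties keep first-seen order).
    The "Uniq" label prefix is a hypothetical choice for this sketch.
    """
    counts = Counter(reads)
    return [
        (f">{prefix}{i};size={n}", seq)
        for i, (seq, n) in enumerate(counts.most_common(), start=1)
    ]


def read_size(header):
    """Parse the abundance from a header; a de novo chimera check needs it."""
    for field in header.lstrip(">").split(";"):
        if field.startswith("size="):
            return int(field[len("size="):])
    raise ValueError(f"no size annotation in {header!r}")


reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
for header, seq in dereplicate(reads):
    print(header, seq, read_size(header))
```

A header without a `size=` field makes `read_size` fail, which mirrors why the de novo check can only run once dereplication has added the abundances.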

The uchime de novo algorithm is slow, so running it after clustering saves some time. (Actually it’s pretty fast but not easily parallelizable, so it seems slow!) Running it before clustering may improve accuracy because there are more parent reads that can explain and detect low-abundance chimeras.
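The accuracy argument is about the pool of candidate parents. The toy below is a deliberately simplified stand-in and not UCHIME. It assumes exact matches, two parents, and equal-length pre-aligned sequences. It borrows the name `abskew` from vsearch's `--abskew` option, under which parents must be at least that many times as abundant as the query. Only the `abskew` idea comes from vsearch; the function name and everything else are hypothetical.

```python
def is_toy_chimera(query, query_size, candidates, abskew=2.0):
    """Toy de novo chimera check; NOT the UCHIME algorithm.

    Flags `query` if it equals a prefix of one candidate parent joined to the
    suffix of another, where each parent is at least `abskew` times as
    abundant as the query. Assumes exact matches and equal-length,
    pre-aligned sequences.
    """
    parents = [
        seq
        for seq, size in candidates
        if seq != query and len(seq) == len(query) and size >= abskew * query_size
    ]
    for a in parents:
        for b in parents:
            if a == b:
                continue
            # Every breakpoint that keeps at least one base from each parent.
            for k in range(1, len(query)):
                if query == a[:k] + b[k:]:
                    return True
    return False


query_seq, query_size = "AAAAACCCCC", 5
# Dereplicated uniques (sequence, size): both parents are in the pool.
uniques = [("AAAAAAAAAA", 50), ("CCCCCCCCCC", 40), (query_seq, query_size)]
# Smaller pool in which the second parent is no longer available.
fewer_parents = [("AAAAAAAAAA", 50), (query_seq, query_size)]

print(is_toy_chimera(query_seq, query_size, uniques))        # True
print(is_toy_chimera(query_seq, query_size, fewer_parents))  # False
```

In this toy, taking the second parent out of the pool (as could happen if clustering folds it into a centroid with a slightly different sequence) is enough to hide the chimera. That is the intuition behind checking the dereplicated uniques before clustering.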

Let me know if that helps,
Colin