Good classification, poor clustering

jwdebelius · March 13, 2023, 2:08pm

Hi Team!

I've got a weird conundrum, and I'm hoping for some hive mind wisdom.

I've got a data set that is 16S V3-V4 Illumina paired-end data and I'm running in qiime2-2022.2. Because of several project-related constraints, I'm sort of stuck with this version.

My reads are part of a meta-analysis, and so I've been clustering them closed reference. My current pipeline for the data is:

Trim primers using cutadapt; keep untrimmed reads since they were trimmed before processing
Join paired ends using q2-vsearch
Quality filter with q2-quality-filter using default parameters
Denoising using deblur-16S, trimming roughly the first 15 nt and truncating to a reasonable length for the ASVs
Apply a full-length Silva 138.1 feature classifier to the ASVs and check the taxonomy using classify-sklearn
Cluster the data closed reference at 99% against the same Silva 138.1 reference sequences I used to build the classifier using q2-vsearch.

When I look at the high-level ASV taxonomy, it looks reasonably good. The community composition reflects the expected environment, there's reasonable variation, and it passes the sniff test.

However, almost none of the representative sequences cluster against the reference database, and the few I do get to cluster don't make sense (mostly Bacilli for a fecal community).

I've tried:

Switching the primer trimming (no dice)
Running single and paired ends
Changing the denoising trim length
Relaxing my clustering identity
Allowing mixed orientation reads
Crying

Thus far, nothing has worked.

I'm hoping someone here might have some brilliant insight?

Thanks,
Justine

Nicholas_Bokulich · March 13, 2023, 3:26pm

Hi @jwdebelius ,
I think the issue might just be this step: closed-reference clustering at 99%.

99% is quite high and could lead to many failures, especially if the reads are just a little bit noisy. I recommend reducing this to see if you start getting an acceptable number of reads passing. It looks like you already tried relaxing the clustering identity, but how much did you relax, and what was the effect?
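
To put a rough number on how strict 99% is, here is a minimal sketch of the mismatch budget at each threshold. It assumes identity is approximately matching positions divided by alignment length, and it assumes a V3-V4 ASV of about 400 nt; neither figure comes from the thread, so substitute the real ASV length. The function names are illustrative only.

```python
def max_mismatches(aligned_length: int, identity_percent: int) -> int:
    """Largest number of differing positions that still meets the threshold.

    Integer arithmetic avoids floating point surprises such as
    400 * (1 - 0.99) evaluating to slightly more than 4.
    """
    return aligned_length * (100 - identity_percent) // 100


def passes(aligned_length: int, mismatches: int, identity_percent: int) -> bool:
    """True if a hit with this many differences would be accepted."""
    return mismatches <= max_mismatches(aligned_length, identity_percent)


ASSUMED_ASV_LENGTH = 400  # assumption: typical trimmed V3-V4 length, not from the thread

for pct in (99, 98, 97):
    budget = max_mismatches(ASSUMED_ASV_LENGTH, pct)
    print(f"{pct}% identity over {ASSUMED_ASV_LENGTH} nt allows up to {budget} differences")
# 99% -> 4, 98% -> 8, 97% -> 12
```

Under these assumptions, a sequence with five differences from its nearest reference fails at 99% but passes at 98%, so small amounts of residual noise or real strain-level variation can be enough to drop a feature.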

jwdebelius · March 13, 2023, 7:38pm

Hi @Nicholas_Bokulich,

Thanks for your brilliant insight! I think I went to like 98%, which clearly wasn't enough. When I reduced to 97%, I rescued a large portion of samples and the samples look like stool. I was hoping to stay higher, but it's at least justifiable.

Best,
Justine
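
For anyone repeating this comparison, the sketch below builds one `qiime vsearch cluster-features-closed-reference` command per identity threshold so the results can be compared side by side. The file names (`table.qza`, `rep-seqs.qza`, `silva-138.1-seqs.qza`) and the `cr-` output prefix are hypothetical placeholders, and `--p-strand both` is an assumption that mirrors the "mixed orientation" attempt mentioned earlier. The script only prints the commands; it does not run QIIME 2.

```python
import shlex
from typing import List


def closed_ref_command(
    identity: float,
    table: str = "table.qza",                 # hypothetical file name
    sequences: str = "rep-seqs.qza",          # hypothetical file name
    reference: str = "silva-138.1-seqs.qza",  # hypothetical file name
    out_prefix: str = "cr",                   # hypothetical output prefix
) -> List[str]:
    """Build the argument list for one closed-reference clustering run."""
    tag = f"{out_prefix}-{round(identity * 100)}"
    return [
        "qiime", "vsearch", "cluster-features-closed-reference",
        "--i-table", table,
        "--i-sequences", sequences,
        "--i-reference-sequences", reference,
        "--p-perc-identity", str(identity),
        "--p-strand", "both",  # assumption: search both orientations
        "--o-clustered-table", f"{tag}-table.qza",
        "--o-clustered-sequences", f"{tag}-seqs.qza",
        "--o-unmatched-sequences", f"{tag}-unmatched.qza",
    ]


# Thresholds discussed in the thread: 99%, 98%, 97%.
for identity in (0.99, 0.98, 0.97):
    print(shlex.join(closed_ref_command(identity)))
```

Comparing the clustered and unmatched outputs across thresholds shows how many features and reads each relaxation rescues, which is the information needed to justify the final choice.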
