Puzzles on merging feature-table/seq

chunri2012 · April 15, 2022, 2:55am

Dear colleagues,
Thank you so much for the fantastic QIIME 2 pipeline; it brought me into the amplicon world.
My name is Yilang Wang, currently a PD fellow at CAS, IUE, China.
I'm now working on some environmental bacterial amplicon data amplified with different primer sets (V3-V4, V4, V4-V5).
Previously, I processed each dataset separately with the following pipeline: primer removal >> paired-end read joining >> quality filtering (q-score) >> denoising with deblur. After denoising, I merged the rep-seqs and feature tables, ending up with about 55,000 ASVs and about 5,000 unique taxonomic assignments against the SILVA database.
At this step, I realized that some ASVs must derive from DNA templates of the same species/organism even though they were amplified with different primer sets. Merging the feature tables cannot actually merge ASVs from the same species/organism, and may even introduce spurious alpha diversity and artificial beta diversity at the ASV level, quite apart from the effects of different primer sets, sampling methods, library construction, sequencing method/depth, etc.
Therefore, I am unsure whether I can proceed with further analysis on the merged feature table at the ASV level, or whether I can only proceed with a table at a taxonomic level (a merged feature table collapsed to species level under the same SILVA assignment).

I am also wondering whether two hypothetical feature tables from the same primer set, such as V4 (515F/806R), but denoised with different deblur parameters (for example, one with --p-trim-length 130 and --p-left-trim-len 0 and the other with --p-trim-length 120 and --p-left-trim-len 10) would yield truly merged ASVs from the same species/organisms.
The QIIME 2 website (Fecal microbiota transplant (FMT) study: an exercise — QIIME 2 2022.2.0 documentation) states that "denoise-single are directly comparable (in this case, the feature id is the md5 hash of the sequence defining the feature)."
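
To make the two parameter sets concrete, here is a minimal sketch of what they would do to one read. It assumes that trimming keeps the bases from position --p-left-trim-len up to position --p-trim-length and drops reads shorter than the trim length; that is my reading of the parameters, not deblur's actual code, so please check it against the deblur documentation. The read itself is made up.

```python
def trim_read(seq, trim_length, left_trim_len=0):
    """Sketch of deblur-style trimming (assumed behaviour, not deblur's own code)."""
    if len(seq) < trim_length:
        return None  # assumption: reads shorter than trim_length are discarded
    return seq[left_trim_len:trim_length]

# Hypothetical 140-nt read, for illustration only
read = "ACGT" * 35

table_a_seq = trim_read(read, trim_length=130, left_trim_len=0)
table_b_seq = trim_read(read, trim_length=120, left_trim_len=10)

print(len(table_a_seq), len(table_b_seq))
print(table_a_seq == table_b_seq)   # the two runs yield different sequences
print(table_b_seq in table_a_seq)   # one is only a sub-sequence of the other
```

Under this assumption the same template gives two different sequences, so the two tables would carry it as two different features unless both are trimmed to exactly the same start and end positions.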

I did some web searches. benjjneb suggested merging the feature tables after the normal DADA2 denoising pipeline, followed by trimming the ASV sequences to the shared amplified region using the primer set (Comparing data from two Illumina chemistries (16S amplicon sequencing) · Issue #509 · benjjneb/dada2 · GitHub). joey711 raised a warning (merge_phyloseq of two different phyloseq objects (non matching OTU labels) · Issue #508 · joey711/phyloseq · GitHub).

Now I am switching to using cutadapt to cut out the shared amplified region (V4 by 515F/806R for V3-V4, V4, V4-V5) before deblur denoising. However, I have run into another problem: about half of the q-score artifacts seem to have a trim length of 0-246/253/254 nt, while the others seem to have a longer (>254 nt) trim length. I don't know whether the cutadapt step yields the correct shared V4 region, and I am also concerned about whether further denoising and merging will give truly merged ASVs.
0-246/253/254 part
fdp.02s.qza.qzv (301.4 KB)
fdp.05pjs.qza.qzv (309.1 KB)
fdp.31pjs.qza.qzv (310.3 KB)
longer part
fdp.13pjs.qza.qzv (308.6 KB)
fdp.16pjs.qza.qzv (307.6 KB)
fdp.23s.qza.qzv (302.3 KB)
(I chose deblur because it does not require pooling the sequencing data, whereas DADA2 does)
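
To show what "cutting the shared V4 region" means, here is a rough sketch that looks for the 515F site and the reverse complement of the 806R site in a joined read and keeps only the bases in between. It is not cutadapt: it uses exact matching with IUPAC codes, whereas cutadapt tolerates mismatches. The primer sequences are the commonly cited 515F/806R ones and should be replaced by the primers actually used; the read is made up.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "[AC]", "R": "[AG]",
         "W": "[AT]", "S": "[CG]", "Y": "[CT]", "K": "[GT]", "V": "[ACG]",
         "H": "[ACT]", "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN")

def revcomp(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]

def primer_regex(primer):
    return "".join(IUPAC[base] for base in primer)

def extract_region(read, fwd_primer, rev_primer):
    """Return the bases between the two primer sites, or None if either is missing."""
    fwd = re.search(primer_regex(fwd_primer), read)
    if fwd is None:
        return None
    rest = read[fwd.end():]
    rev = re.search(primer_regex(revcomp(rev_primer)), rest)
    if rev is None:
        return None
    return rest[:rev.start()]

FWD_515F = "GTGCCAGCMGCCGCGGTAA"   # assumed primer sequence
REV_806R = "GGACTACHVGGGTWTCTAAT"  # assumed primer sequence

# Hypothetical joined read: flank + 515F site + insert + 806R site + flank
insert = "TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAG"
read = ("CCTACGGGAGGCAGCAG" + "GTGCCAGCAGCCGCGGTAA" + insert
        + revcomp("GGACTACAAGGGTATCTAAT") + "AAACTCAAAGG")

print(extract_region(read, FWD_515F, REV_806R) == insert)
print(extract_region(insert, FWD_515F, REV_806R))  # no primer sites in this read
```

Reads in which a primer site is not found stay at full length unless they are discarded (cutadapt has a --discard-untrimmed option for this), which could be one thing to check for the files with the longer lengths.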

I hope to receive some advice or comment from you.
Thank you

colinbrislawn · April 20, 2022, 8:22pm

Hello @chunri2012,

Welcome to the forums!

Thank you for your detailed description of your pipeline, and the problem you are experiencing when merging ASV tables.

As you have discovered, amplicons from different regions produce different ASVs, which cannot be merged.

This is one of the limits of amplicons and ASVs. This best-of-the-forums post discusses the challenges of working with different regions. It's a very hard problem with no easy solution.

Even changing the length of amplicon from the same region will create different ASVs!

Just like you said, reducing the length of your reads will lead to shorter ASVs, so their IDs will no longer match, which makes merging feature tables impossible.

This means that a specific sequence will always have the same ID:

>c9ff71d91da6c66889497ebbdc743570
AGCGTTATCCGGATTT

but if you trim the sequence shorter, then the ID will change:

>f014496db04789259900e0bef778e0c4
AGCGTTATCCGGAT
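
Here is a small sketch of that behaviour, assuming the feature ID is simply the MD5 hex digest of the sequence string (as the tutorial quote above describes). The helper name feature_id is mine, not a QIIME 2 function.

```python
import hashlib

def feature_id(sequence):
    """MD5 hex digest of a sequence (hypothetical helper, assumed ID scheme)."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()

full = "AGCGTTATCCGGATTT"
trimmed = full[:14]  # "AGCGTTATCCGGAT"

print(feature_id(full))
print(feature_id(trimmed))
print(feature_id(full) == feature_id(full))     # same sequence, same ID
print(feature_id(full) == feature_id(trimmed))  # trimmed sequence, different ID
```

Because the ID depends on every base, two tables only share IDs where the sequences are identical from the first to the last position.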

While this is an unsolved problem, there are some ways forward. I like your suggestion here:

While the ASV IDs will be different, hopefully the taxonomic classification will be similar and allow for features to be merged! You should totally try this!
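
As a toy illustration of the idea (not the QIIME 2 implementation), the sketch below collapses each table to taxon level first and then combines the samples. All IDs, counts, and taxon labels are made up. In QIIME 2 this should roughly correspond to running qiime taxa collapse on each table and then qiime feature-table merge.

```python
from collections import defaultdict

def collapse_by_taxon(table, taxonomy):
    """table: {sample: {asv_id: count}}; taxonomy: {asv_id: taxon label}."""
    collapsed = {}
    for sample, counts in table.items():
        per_taxon = defaultdict(int)
        for asv_id, count in counts.items():
            per_taxon[taxonomy.get(asv_id, "Unassigned")] += count
        collapsed[sample] = dict(per_taxon)
    return collapsed

def merge_tables(*tables):
    """Combine tables with non-overlapping sample IDs (assumption of this sketch)."""
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample ID: {sample}")
            merged[sample] = counts
    return merged

# Hypothetical data: ASV IDs differ between regions, taxon labels do not
taxonomy = {"a1": "g__Bacillus", "a2": "g__Bacillus",
            "b1": "g__Bacillus", "b2": "g__Pseudomonas"}
v4_table = {"S1_V4": {"a1": 10, "a2": 5}}
v3v4_table = {"S2_V3V4": {"b1": 7, "b2": 3}}

merged = merge_tables(collapse_by_taxon(v4_table, taxonomy),
                      collapse_by_taxon(v3v4_table, taxonomy))
print(merged)
# {'S1_V4': {'g__Bacillus': 15}, 'S2_V3V4': {'g__Bacillus': 7, 'g__Pseudomonas': 3}}
```

After collapsing, the features are taxon labels rather than sequence hashes, so samples from different regions share feature names wherever the classifier agrees.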

If you included any positive controls, you could also choose the region (V3-V4, V4, V4-V5) that does the best job of capturing the expected composition of these samples. Did you include any positive controls?
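
One simple way to score "best job" would be a dissimilarity between the expected and observed composition of the control, for example Bray-Curtis. The sketch below uses made-up relative abundances and placeholder region names, purely to show the calculation; the choice of Bray-Curtis is my assumption, and other metrics would work too.

```python
def bray_curtis(expected, observed):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles."""
    taxa = set(expected) | set(observed)
    diff = sum(abs(expected.get(t, 0) - observed.get(t, 0)) for t in taxa)
    total = sum(expected.get(t, 0) + observed.get(t, 0) for t in taxa)
    return diff / total

# Hypothetical mock community and hypothetical observations per region
expected = {"taxon_A": 0.5, "taxon_B": 0.5}
observed_by_region = {
    "region_1": {"taxon_A": 0.6, "taxon_B": 0.4},
    "region_2": {"taxon_A": 0.9, "taxon_B": 0.1},
    "region_3": {"taxon_A": 0.5, "taxon_B": 0.3, "taxon_C": 0.2},
}

scores = {region: bray_curtis(expected, profile)
          for region, profile in observed_by_region.items()}
best_region = min(scores, key=scores.get)  # lowest dissimilarity to the expected profile
print(scores)
print(best_region)
```

A lower score means the region recovered the control closer to its known composition.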

Let us know what you try next!

system · May 22, 2022, 4:11am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.