How to join read pairs using artificial linkers of ambiguous nucleotides without losing them at the filtering step?

I have a set of 18S data (2x300 bp) from stool samples, and I am running qiime2-2021.11 in a Singularity container. Read quality is pretty good: over 95% of reads pass the filtering stage of DADA2 denoising in every sample, and at most 10% of reads are lost at the chimera-removal step, which I don't think is a problem. However, more than half of the samples lose somewhere between 50% and 90% of their reads at the merging step, so I am losing a lot of data there. This is after primer removal, which leaves 285 nt forward reads and 284 nt reverse reads.
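For reference, the merging arithmetic can be sketched as below. This is only a rough check, assuming the reads are not truncated any further and assuming a minimum overlap of 12 nt (the default of DADA2's `mergePairs`); the 600 nt amplicon is a made-up example, not a measured value from this dataset.

```python
def max_mergeable_amplicon(fwd_len: int, rev_len: int, min_overlap: int = 12) -> int:
    """Longest amplicon (primers removed) whose read pair still overlaps by min_overlap nt."""
    return fwd_len + rev_len - min_overlap


def overlap(fwd_len: int, rev_len: int, amplicon_len: int) -> int:
    """Overlap in nt between the two reads; a negative value means a gap between them."""
    return fwd_len + rev_len - amplicon_len


# Read lengths from the post: 285 nt forward, 284 nt reverse.
print(max_mergeable_amplicon(285, 284))  # 557
# Hypothetical 600 nt amplicon: the reads do not meet, leaving a 31 nt gap.
print(overlap(285, 284, 600))  # -31
```

Any amplicon longer than this limit cannot be merged at all, and any quality truncation of the reads lowers the limit further.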

I have seen papers using OTUs where read pairs are joined with artificial linkers. I suppose the idea is that, at the taxonomy assignment step, the assignment algorithm resolves the ambiguity by looking at the non-ambiguous ends. I tried feeding artificially linked read pairs (after trimming and quality filtering) into DADA2 as single-end data, but because of the ambiguous nucleotides, every read from every sample gets filtered out. Is there a way to turn off this filtering, or would this generally be a bad idea with ASVs? The other approach I can think of is to use GGG/AAA/TTT/CCC runs instead of NNNs in the linker to get past the filtering step, and then change them back before passing the representative sequences to the assignment algorithm, but I am not confident this would be the best way to do things.
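To make the linker idea concrete, here is a minimal sketch of joining one read pair. The function names are hypothetical and not part of any QIIME 2 or DADA2 API, and it assumes the reverse read should be reverse-complemented so both halves sit on the same strand. The `linker` argument can be a run of Ns or a placeholder run such as Gs; whether a placeholder linker is actually sound for ASV inference is not settled here.

```python
from typing import Tuple

# Complement table for unambiguous bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(fwd: str, rev: str, linker: str = "N" * 10) -> str:
    """Join forward read + linker + reverse-complemented reverse read."""
    return fwd + linker + reverse_complement(rev)


def split_joined(joined: str, fwd_len: int, linker_len: int) -> Tuple[str, str]:
    """Recover the two flanking segments, dropping the linker between them."""
    return joined[:fwd_len], joined[fwd_len + linker_len:]


joined = join_pair("ACGT", "TTGA", linker="NNN")
print(joined)                      # ACGTNNNTCAA
print(split_joined(joined, 4, 3))  # ('ACGT', 'TCAA')
```

Note that `split_joined` only works if every forward read has the same known length, so it assumes fixed-length truncation before joining.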

Hello @hisanY,

Welcome to the forums! :qiime2:

This is a great question and I would love to hear how other folks handle reads from discontiguous amplicons.

VSEARCH supports --fastq_join, in which "sequences are not merged as with the fastq_mergepairs command, but simply joined with a gap", but we have not added that to QIIME 2, as most other software does not support discontiguous amplicons...
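If you want to try the joining outside of QIIME 2, something like the sketch below could build the command. It only assembles the argument list and does not run anything. The file names are placeholders, and `--join_padgap` / `--join_padgapq` are the padding options as I understand them from the vsearch manual, so please check `vsearch --help` for your installed version before relying on them.

```python
import shlex
from typing import List


def vsearch_join_command(fwd_fastq: str, rev_fastq: str, out_fastq: str,
                         pad: str = "NNNNNNNN") -> List[str]:
    """Build (but do not run) a vsearch --fastq_join command line."""
    return [
        "vsearch",
        "--fastq_join", fwd_fastq,          # forward reads
        "--reverse", rev_fastq,             # reverse reads
        "--fastqout", out_fastq,            # joined output
        "--join_padgap", pad,               # sequence placed in the gap
        "--join_padgapq", "I" * len(pad),   # quality string for the gap, same length
    ]


# Placeholder file names, for illustration only.
cmd = vsearch_join_command("sample_R1.fastq", "sample_R2.fastq", "sample_joined.fastq")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where vsearch is installed.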

... as you have already discovered. :grimacing:

One option that works today is to process R1 and R2 fully separately, as if you had sequenced two different hypervariable regions. Then you can compare and contrast the taxonomy assignments from each. This is not elegant...
