Identical ASVs with and without adapter from Cutadapt/dada2

Hello everyone,

I have been using QIIME 2 (version 2020.6) to generate ASVs from V3-V4 sequences by trimming off the primer sequences and then denoising with DADA2.

I noticed that my ASV output often contains ASVs that are identical over the entire V3-V4 region but differ in length: one of them still has a primer sequence stuck on the end, which wasn't trimmed off because of a SNP or small indel in the sequence corresponding to the primer.

For example:
ASV1:
…GAGGAGCGAAAGCGTGGGGAGCGAACA
ASV2:
…GAGGAGCGAAAGCGTGGGGAGCGAACA(GGATTAGATACCCCAGTAGT)

I tried to use cutadapt to remove read pairs that don’t get trimmed but it didn’t resolve the issue:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences $OUTPATH/demux-paired.qza \
  --p-cores 6 \
  --p-front-f CCTACGGGNBGCASCAG \
  --p-front-r GACTACNVGGGTATCTAATCC \
  --p-error-rate 0 \
  --o-trimmed-sequences $OUTPATH/demux-paired-trimmed.qza \
  --output-dir $OUTPATH/unspecified-trim \
  --verbose \
  --p-discard-untrimmed

Has anyone else seen this issue? My assumption is that --p-discard-untrimmed is keeping pairs where only one end was trimmed. If that is the case, is there a workaround to remove these bad ASVs?

Thank you all for any help!
Nathan

Hi, @ndhicks

You have set the error rate to 0, which means only exact matches to the primer sequences you specified will be trimmed. In real amplicon data, however, base substitutions in the primer region are common. I suggest using the default, --p-error-rate 0.1, to allow for some substitutions.
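
To make the effect of the error rate concrete, here is a minimal Python sketch of the arithmetic. It assumes, as I understand the Cutadapt documentation, that the number of tolerated errors for a full-length match is the primer length times the error rate, rounded down. Whether N positions count toward the length may depend on the Cutadapt version; for these two primers the result is the same either way. The function name `max_allowed_errors` is made up for illustration.

```python
import math

def max_allowed_errors(primer, error_rate):
    """Errors tolerated for a full-length primer match (assumed rule: floor(length * rate))."""
    return math.floor(len(primer) * error_rate)

forward = "CCTACGGGNBGCASCAG"      # --p-front-f from the original post
reverse = "GACTACNVGGGTATCTAATCC"  # --p-front-r from the original post

for name, primer in (("forward", forward), ("reverse", reverse)):
    for rate in (0.0, 0.1):
        print(name, len(primer), rate, max_allowed_errors(primer, rate))
```

With an error rate of 0, a single SNP in the primer region is enough for the primer to be missed.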

If the ASVs still contain primers, consider changing --p-overlap or --p-times so that more primer occurrences are identified and trimmed. I had one amplicon dataset where, due to terrible PCR amplification, primers and adapters showed up not once but twice in some of the raw sequences. I had to set --p-times to 2 instead of the default of 1 to trim those adapters.
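
As a rough illustration of why a second pass matters, the sketch below strips a leading primer up to `times` times. It is a deliberate simplification: it only handles exact matches at the very start of the read, whereas Cutadapt aligns with mismatches and partial overlaps. The function `trim_front`, the shortened primer, and the toy read are all hypothetical.

```python
def trim_front(read, primer, times=1):
    """Remove up to `times` exact copies of `primer` from the start of `read`."""
    for _ in range(times):
        if not read.startswith(primer):
            break
        read = read[len(primer):]
    return read

primer = "GACTAC"             # hypothetical short primer, for readability only
read = primer * 2 + "ACGTTG"  # toy read carrying the primer twice in tandem

print(trim_front(read, primer, times=1))  # one copy of the primer is left behind
print(trim_front(read, primer, times=2))  # both copies are removed
```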

Sixvable!

Hi @ndhicks,

To add to @sixvable’s comments… I’d also recommend adding the flag:

`--p-match-adapter-wildcards`

This will allow cutadapt to match the IUPAC wildcards in your primer sequences (N, B, S, etc.). Otherwise these will be counted as mismatches, which count against the --p-error-rate allowance @sixvable mentioned.
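
Here is a small Python sketch of what wildcard matching changes. It compares the forward primer to the start of a read position by position, which is a simplification (Cutadapt also handles indels and partial matches). The read start is a made-up example, and `count_mismatches` is a hypothetical helper.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_mismatches(primer, read, wildcards=True):
    """Count positions where the read base is not allowed by the primer base."""
    mismatches = 0
    for p, r in zip(primer, read):
        allowed = IUPAC[p] if wildcards else p  # literal comparison when wildcards are off
        if r not in allowed:
            mismatches += 1
    return mismatches

primer = "CCTACGGGNBGCASCAG"      # forward primer from the original post
read_start = "CCTACGGGAGGCAGCAG"  # hypothetical read start that fits the degenerate primer

print(count_mismatches(primer, read_start, wildcards=True))
print(count_mismatches(primer, read_start, wildcards=False))
```

Read literally, the N, B and S positions each count as a mismatch, so a read that fits the degenerate primer perfectly would still use up (or exceed) the error allowance.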

-Cheers!

Thank you both for the input! I really appreciate it.

It looks like using --p-times 2 with the default error rate has solved my problem. I hadn't even considered the possibility that one adapter had been trimmed and a second one sat in tandem repeat with it.

When I align this set against itself, I no longer get pairs that match fully across the V3-V4 region but differ in length.
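
For anyone who wants a quick version of that check without running an aligner, here is a minimal Python sketch. It flags ASVs that are an exact prefix of a longer ASV and reports the extra tail. This is not the alignment I ran, just a simpler stand-in. The function `find_prefix_pairs` is hypothetical, the toy sequences reuse only the tails shown in my first post, and ASV3 is invented as a negative control.

```python
def find_prefix_pairs(asvs):
    """Return (short_id, long_id, extra_tail) for each ASV that is a strict prefix of another."""
    pairs = []
    items = sorted(asvs.items(), key=lambda kv: len(kv[1]))  # shortest first
    for i, (short_id, short_seq) in enumerate(items):
        for long_id, long_seq in items[i + 1:]:
            if len(long_seq) > len(short_seq) and long_seq.startswith(short_seq):
                pairs.append((short_id, long_id, long_seq[len(short_seq):]))
    return pairs

asvs = {
    "ASV1": "GAGGAGCGAAAGCGTGGGGAGCGAACA",
    "ASV2": "GAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCCAGTAGT",
    "ASV3": "GAGGAGCGAAAGCGTGGGGAGCAAACA",  # invented: differs by one base, not a prefix
}

for short_id, long_id, tail in find_prefix_pairs(asvs):
    print(short_id, long_id, tail)
```

The pairwise loop is quadratic in the number of ASVs, which should be fine for a typical feature table but would need a smarter approach for very large sets.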

:fireworks: Glad we were able to help! @ndhicks! :fireworks: