How to remove primers and contaminants when using two pairs in multiplex?

vln21 · June 9, 2021, 10:58am

Hello!

I am new to QIIME 2 and metabarcoding analysis. I have a paired-end 16S dataset of diet samples generated with two primer pairs in multiplex to target two distinct groups of prey. The expected amplicon size ranges between 260 and 310 bp. The FASTQ files are from an Illumina PE 250 run, and I received them already demultiplexed from the sequencing facility.

After running DADA2 denoise-paired, I realised the primers were still attached to the reads, and a considerable number of unexpectedly long sequences (>400 bp) were retained in the representative sequences. These are actually contaminants (BLAST hits are poor and point to bacteria). They have no primer match at the beginning. Instead, they start with a long run of Cs, have a short sequence in the middle with poor BLAST hits, and end with a long run of Gs.

The seven-number summary of sequence lengths shows that the 75th percentile is already above 400 bp. I have two blanks included, which might inflate the amount of such contaminants in the whole dataset.

I repeated the denoising, trimming the primer lengths from the 5’ end, but this does not discard the >400 bp contaminants. The percentage of input that is non-chimeric after denoising is below, or well below, 75% for most samples.

I think it would be better to discard these contaminants before denoising. I thought of using cutadapt trim-paired to remove the primers and discard contaminant reads that lack a primer, but I don’t know how to do it with more than one set of primers:

Chord_16S_F

GATCGAGAAGACCCTRTGGAGCT

Ceph_16S_F

GACGAGAAGACCCTAWTGAGCT

Ceph_16S_R

AAATTACGCTGTTATCCCT

Chord_16S_R

GGATTGCGCTGTTATCCCT

I thought I could use wildcards, but the primers seem too different for that. It would require a lot of ambiguity codes.

I also thought of removing the primers sequentially, but then I cannot use the option --p-discard-untrimmed to get rid of contaminant reads.

What would be the best solution to filter out contaminants before DADA2 denoising?

SoilRotifer · June 9, 2021, 3:14pm

Hi @vln21, welcome to the forum!

You can simply enter multiple primers for each option, separated by spaces.

More specifically, you can try the following:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences paired-end-demux.qza \
  --p-cores 4 \
  --p-front-f GATCGAGAAGACCCTRTGGAGCT GACGAGAAGACCCTAWTGAGCT \
  --p-front-r AAATTACGCTGTTATCCCT GGATTGCGCTGTTATCCCT \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences paired-end-demux-primer-trimmed.qza \
  --verbose > cutadapt-log-2.txt

Then you can denoise.
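
If you want a rough idea of how many forward reads actually carry one of the primers before relying on --p-discard-untrimmed, a quick check on one raw forward FASTQ file could look like the sketch below. This is not part of QIIME 2 or cutadapt: count_primer_reads is a hypothetical helper, the IUPAC codes are expanded by hand (R = A or G, W = A or T), and it only counts exact matches at the very start of the read. Cutadapt tolerates some mismatches, so it will usually find the primer in more reads than this count suggests.

```shell
#!/bin/sh
# Both forward primers from the question as one extended regex.
# IUPAC codes expanded by hand: R -> [AG], W -> [AT].
FWD_PRIMERS='^(GATCGAGAAGACCCT[AG]TGGAGCT|GACGAGAAGACCCTA[AT]TGAGCT)'

# Hypothetical helper (not a QIIME 2 or cutadapt command).
# $1: an uncompressed FASTQ file.
# Prints "<reads starting with a primer> <total reads>".
count_primer_reads() {
  # In FASTQ, the sequence is the 2nd line of every 4-line record.
  awk -v re="$FWD_PRIMERS" '
    NR % 4 == 2 { total++; if ($0 ~ re) hits++ }
    END { printf "%d %d\n", hits, total }
  ' "$1"
}
```

For example, after `gunzip -c sample_R1.fastq.gz > sample_R1.fastq` (the file name is a placeholder), running `count_primer_reads sample_R1.fastq` prints the number of reads that start with a forward primer followed by the total number of reads. The same idea works for the reverse reads if you swap in the two reverse primers.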
