What are the criteria for choosing the optimal "--p-min-fold-parent-over-abundance" parameter in dada2?

Hello everyone,

I am currently trying to analyze data from 16S amplicon sequencing (Nextera XT chemistry for Illumina), and during sequence processing with DADA2, I am getting only a small percentage (~ 50%) of nonchimeric sequences.

I read in previous posts that this can be resolved by tuning the "--p-min-fold-parent-over-abundance" parameter. My question is: how do I decide which value is right for my dataset?

Attached you can find the qzv file with the quality of my reads, and the descriptive statistics of the results.

Thank you in advance!

paired-end-demux.qzv (319.5 KB) stats.tsv (609 Bytes)

Hello Nikolaos,

Welcome to the forums! :wave:

That seems pretty similar to the DADA2 results from the Moving Pictures tutorial (link) and just a little higher than the PD-Mouse tutorial (link). Those were good enough to include as official tutorials, so maybe it's fine? :man_shrugging:

At the risk of stating the obvious, chimeras are an artefact of PCR. If you are doing many PCR cycles to amplify a small genetic signal, we would expect a high-ish level of chimeras, and it's great that DADA2 is taking them out!
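One caveat when reading that ~50%: the "percentage of input non-chimeric" column in the DADA2 stats is relative to the input reads, so it also includes reads lost at the filtering and merging steps. To see how much the chimera check alone removes, you can compare the non-chimeric count to the merged count. Here is a minimal sketch, assuming your stats.tsv is tab-separated with a header row containing columns named "merged" and "non-chimeric" (the function name chimera_summary is just something I made up):

```shell
# Per-sample percentage of merged reads that survive chimera removal.
# Assumes a tab-separated table whose header row names the columns
# "merged" and "non-chimeric"; lines starting with "#" are skipped.
chimera_summary() {
  awk -F'\t' '
    NR == 1 {
      # Locate the two columns by name instead of by fixed position.
      for (i = 1; i <= NF; i++) {
        if ($i == "merged") m = i
        if ($i == "non-chimeric") n = i
      }
      if (!m || !n) {
        print "expected columns not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    /^#/ { next }   # skip comment rows such as "#q2:types"
    $m > 0 { printf "%s\t%.1f\n", $1, 100 * $n / $m }
  ' "$1"
}
```

Running chimera_summary stats.tsv prints one line per sample with the sample ID and the percentage of merged reads kept. If that number is high and most of the loss happens earlier, the chimera setting is probably not the thing to tune.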

If you included mock communities with a known composition as positive controls, you can choose a setting that removes chimeric ASVs from these samples while preserving the microbes you know are part of the community. VSEARCH can help you view alignments of possible chimeras in your positive controls:

vsearch --uchime_ref asvs.fasta --db ref.fasta \
  --uchimealns uchime.aln --uchimeout uchime.out
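If you do have mock communities, one way to pick a setting is to denoise the same data with a few candidate values and check which one drops the chimeric ASVs while keeping the expected members. Below is a rough sketch that only prints the commands so you can review them first. The file name paired-end-demux.qza, the truncation lengths of 0 (no truncation), the output directory names, and the helper name sweep_commands are all placeholders I am assuming, not recommendations:

```shell
# Print one `qiime dada2 denoise-paired` command per candidate value of
# --p-min-fold-parent-over-abundance. Nothing is executed here.
# Input file, truncation lengths, and output directories are placeholders.
sweep_commands() {
  for fold in "$@"; do
    printf '%s\n' "qiime dada2 denoise-paired --i-demultiplexed-seqs paired-end-demux.qza --p-trunc-len-f 0 --p-trunc-len-r 0 --p-min-fold-parent-over-abundance $fold --output-dir dada2-fold-$fold"
  done
}
```

For example, sweep_commands 1 2 4 8 prints four commands, one per value. Once the truncation lengths match what you chose from your quality plots, you can run them and compare the resulting denoising stats and mock-community ASVs across runs.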

Did you include positive controls on this run?