I have recently been analyzing MiSeq 2×300 sequencing data. I used the Illumina Nextera XT index method for library preparation.
When I used DADA2 to trim the data, I found the parameter "--p-min-fold-parent-over-abundance FLOAT", and the help text says the value should be greater than or equal to 1. I am quite new to sequencing analysis, so I would really like to know what this parameter means (I read the description but did not get it…).
The reason I ask is that when I raised the value from the default of 1 to, for example, 8, more sequences were retained. (Previously, when I used 1, DADA2 filtered out ~80%-90% of my merged sequences as chimeras.)
--p-min-fold-parent-over-abundance FLOAT
    The minimum abundance of potential parents
    of a sequence being tested as chimeric,
    expressed as a fold-change versus the
    abundance of the sequence being tested.
    Values should be greater than or equal to 1
    (i.e. parents should be more abundant than
    the sequence being tested). This parameter
    has no effect if chimera_method is "none".
    [default: 1.0]
From the dada2 website:
Chimeric sequences are identified if they can be exactly reconstructed by combining a left-segment and a right-segment from two more abundant “parent” sequences.
So this parameter sets how many times more abundant the potential parents must be than a candidate sequence for that sequence to be flagged as a chimera.
Imagine you have three sequences in your data: x, y, and z, where z appears to be a chimera of x and y. If min-fold-parent-over-abundance (P) = 1, x and y must be at least as abundant as z for z to be removed. If P = 2, x and y must be at least twice as abundant as z. If P = 8, x and y must be at least 8 times as abundant as z! A higher value therefore leaves fewer sequences eligible as parents, so fewer sequences are flagged as chimeras and more reads are retained, which matches what you saw when going from 1 to 8.
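
If it helps to see the x/y/z example with numbers, here is a toy Python sketch of the abundance filter alone. It is not DADA2's actual code: the function name `eligible_parents` and the read counts are made up for illustration, the "at least P times" comparison is an assumption that follows the wording above, and the real method additionally checks that z can be rebuilt from a left segment of one parent and a right segment of the other.

```python
def eligible_parents(abundances, candidate, min_fold=1.0):
    """Return names of sequences abundant enough to be parents of `candidate`.

    Toy illustration only: a sequence qualifies as a potential parent when its
    abundance is at least `min_fold` times the candidate's abundance
    (assumed inclusive comparison).
    """
    threshold = min_fold * abundances[candidate]
    return sorted(
        name
        for name, count in abundances.items()
        if name != candidate and count >= threshold
    )


# Hypothetical read counts for the x, y, z example.
abundances = {"x": 400, "y": 150, "z": 50}

for p in (1, 2, 8):
    parents = eligible_parents(abundances, "z", min_fold=p)
    # z can only be flagged as a chimera if two parents pass the filter.
    can_be_flagged = len(parents) >= 2
    print(f"P={p}: eligible parents={parents}, z can be flagged={can_be_flagged}")
```

With these made-up counts, both x and y pass the filter at P = 1 and P = 2, but at P = 8 only x is abundant enough, so z can no longer be explained as a chimera of x and y and would be kept.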