"Same" database, different outputs

Thanks for the help!

I read the advice to trim the database at the following link:

To trim or not to trim

One issue with ITS (and other marker genes with vast length variability) is readthrough, which occurs when read lengths are longer than the amplicon itself! The polymerase will read through the amplicon, the primer, the barcode, and on into the adapter sequence. This is non-biological DNA that will cause major issues downstream, e.g., with sequence classification. So we want to trim primers from either end of the sequence to eliminate read-through issues. Enter cutadapt. Note that we trim the forward primer and the reverse complement of the reverse primer from the forward reads (the forward primers have already been trimmed in the raw reads, but we will demonstrate forward + reverse trimming here since attempting to trim the forward read will not hurt). We trim the reverse primer and reverse complement of the forward primer from the reverse reads.
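To make sure I understand which sequence gets trimmed from which read, here is a minimal Python sketch of the pairing described in the quote. The primer sequences (ITS1F and ITS2) are only an illustrative assumption, not necessarily the primers used in my run, and `reverse_complement` and `primers_to_trim` are hypothetical helper names of my own, not part of cutadapt or QIIME 2.

```python
# IUPAC complement table, including degenerate bases (R<->Y, K<->M, B<->V, D<->H).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (IUPAC-aware)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primers_to_trim(fwd_primer: str, rev_primer: str) -> dict:
    """Pair each read direction with the sequences to trim from it.

    five_prime:  the read's own primer, expected at the start of the read.
    three_prime: reverse complement of the opposite primer, which only
                 appears when the read runs through a short amplicon.
    """
    return {
        "forward_reads": {
            "five_prime": fwd_primer,
            "three_prime": reverse_complement(rev_primer),
        },
        "reverse_reads": {
            "five_prime": rev_primer,
            "three_prime": reverse_complement(fwd_primer),
        },
    }


# Illustrative primers only (assumption): ITS1F forward, ITS2 reverse.
plan = primers_to_trim("CTTGGTCATTTAGAGGAAGTAA", "GCTGCGTTCTTCATCGATGC")
for read, seqs in plan.items():
    print(read, seqs)
```

If I read the cutadapt documentation correctly, the `five_prime` entries correspond to its 5' adapter options (`-g` for read 1, `-G` for read 2) and the `three_prime` entries to its 3' adapter options (`-a` and `-A`).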

In this analysis, I am interested only in the classification of Fungi.

Should I therefore remove the non-fungal sequences from a classification made with the All Eukaryotes database, despite its lower precision?

For sequences classified as "k__Fungi; __; __; __; __; __; __", can I consider them correctly classified as Fungi, or should I treat them as possible random matches and merge them with the unidentified sequences?
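To be concrete about the three groups I am asking about, here is a minimal Python sketch that sorts taxonomy strings into them. It assumes labels are semicolon-separated with `x__` rank prefixes, as in the example above; `fungal_status` and the group names are hypothetical, and treating "unidentified" as unresolved is my own assumption.

```python
def fungal_status(taxon: str) -> str:
    """Classify a taxonomy string as not_fungi, fungi_kingdom_only, or fungi_resolved."""
    ranks = [r.strip() for r in taxon.split(";")]
    if ranks[0] != "k__Fungi":
        return "not_fungi"
    # A rank is resolved if a real name follows the "x__" prefix;
    # empty names ("__") and "unidentified" count as unresolved (assumption).
    resolved = [
        r for r in ranks[1:]
        if r.split("__", 1)[-1] not in ("", "unidentified")
    ]
    return "fungi_resolved" if resolved else "fungi_kingdom_only"


examples = [
    "k__Fungi; __; __; __; __; __; __",
    "k__Fungi; p__Ascomycota; __; __; __; __; __",
    "k__Viridiplantae; p__Anthophyta; __; __; __; __; __",
    "Unassigned",
]
for label in examples:
    print(fungal_status(label), "<-", label)
```

Keeping the kingdom-only group separate, rather than merging it straight away, would let me check how abundant it is before deciding what to do with it.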

Thanks!
