I was not expecting eukaryotes in my taxa bar plots, but they appear in many samples. What should I do to improve these plots? Could this be due to the classifier, or to pre-DADA2 steps such as cutadapt or mixed-orientation reads? Thank you.
We targeted the V4 region using the following set of primers (AVITI chemistry; 300 bp; PE reads).
Next, I used DADA2 to denoise the samples across a range of --p-trunc-len-f and --p-trunc-len-r thresholds. The following retained significantly more reads than I got with --p-trunc-len-f 276 --p-trunc-len-r 260 and the other thresholds in between.
It is common to amplify eukaryotes, and potentially other off-target taxa, in sequencing surveys. There is nothing necessarily wrong with your commands, especially as there are microbial and meiofaunal eukaryotes, e.g. fungi, rotifers, amoebae, etc.
What type of samples are you sequencing? Given the abundance of eukaryotes, I'd suspect that they are indeed part of the environment's microbiome... or they could be potential contaminants and/or host reads.
You can follow this tutorial to remove any unwanted / unintended sequences that appear in your data.
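To make the idea of taxonomy-based filtering concrete, here is a minimal Python sketch on a toy feature table. It is only an illustration, not the tutorial's method or any QIIME 2 code: the function name `filter_by_taxonomy`, the feature IDs, the counts, and the taxonomy strings are all made up, and the excluded terms are just an example. Note that it keeps the fungal feature and only drops the organelle hit.

```python
# Toy illustration of taxonomy-based filtering (hypothetical data and names).

def filter_by_taxonomy(counts, taxonomy, exclude=()):
    """Drop features whose taxonomy label contains any excluded term.

    Matching is a case-insensitive substring test. Features with no
    taxonomy entry are kept, since nothing can match an empty label.
    """
    terms = [term.lower() for term in exclude]
    return {
        feature_id: count
        for feature_id, count in counts.items()
        if not any(term in taxonomy.get(feature_id, "").lower() for term in terms)
    }


# Made-up per-feature read counts and taxonomy labels.
counts = {"asv1": 120, "asv2": 45, "asv3": 30}
taxonomy = {
    "asv1": "d__Bacteria; p__Proteobacteria",
    "asv2": "d__Eukaryota; p__Ascomycota",
    "asv3": "d__Bacteria; p__Proteobacteria; g__Mitochondria",
}

# Remove organelle sequences but keep genuine eukaryotes such as fungi.
filtered = filter_by_taxonomy(counts, taxonomy, exclude=("mitochondria", "chloroplast"))
print(filtered)
```

Which terms to exclude depends entirely on the study; if fungi are a target, you would not filter on the eukaryote label itself.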
Thank you @SoilRotifer. Yes, you predicted correctly: "Given the abundance of eukaryotes, I'd suspect that they are indeed part of the environment's microbiome". These samples come from decomposed wood, where fungi are abundant and are one of our targets in this study.
On another note, would you expect a slight improvement in the taxonomy if I use other SILVA classifiers, such as "diverse weighted Silva 138 99% OTUs full-length sequences", or my own classifier? Does it require a lot of time and computational resources to train a classifier using your tutorial?
That is a hard question to answer as it depends on the environment and what lives there. I'd certainly try the weighted classifiers to see if it will help.
Of course you can use RESCRIPt too. Keep in mind the RESCRIPt tutorial is mostly showing what you can do... not necessarily what you should do. But I've had good luck making my own amplicon-specific classifiers. For example, I typically dereplicate the full-length data, perform amplicon region extraction, dereplicate the extracted amplicon regions, perform some basic QA/QC, and then train the classifier. Making the amplicon-specific classifier will take less time and will reduce the file size and memory footprint of the classifier.
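The order of those steps can be sketched in plain Python on toy data. This is a conceptual illustration only, not RESCRIPt's implementation: the helper names (`dereplicate`, `extract_region`, `basic_qc`), the short primers, the reference sequences, and the length cutoff are all invented, primer matching here is exact (no mismatches or degenerate bases), and the final classifier-training step is not shown.

```python
# Conceptual sketch: dereplicate -> extract amplicon -> dereplicate -> QA/QC.
# All sequences, primers, and thresholds below are hypothetical.

def dereplicate(records):
    """Collapse records sharing the same sequence AND taxonomy, keeping the first ID."""
    first_id = {}
    for seq_id, seq, taxon in records:
        first_id.setdefault((seq, taxon), seq_id)
    return [(seq_id, seq, taxon) for (seq, taxon), seq_id in first_id.items()]


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extract_region(records, fwd_primer, rev_primer):
    """Keep the stretch between exact primer sites, with primers trimmed off.

    Records lacking either primer site are dropped.
    """
    rev_site = reverse_complement(rev_primer)
    extracted = []
    for seq_id, seq, taxon in records:
        start = seq.find(fwd_primer)
        if start == -1:
            continue
        start += len(fwd_primer)
        end = seq.find(rev_site, start)
        if end == -1:
            continue
        extracted.append((seq_id, seq[start:end], taxon))
    return extracted


def basic_qc(records, min_len):
    """Discard amplicons shorter than min_len or containing non-ACGT characters."""
    return [r for r in records if len(r[1]) >= min_len and set(r[1]) <= set("ACGT")]


FWD, REV = "GTGCC", "GGACT"  # toy primers; REV binds as AGTCC on the reference strand
references = [
    ("ref1", "TTT" + FWD + "AAACCCGGG" + "AGTCC" + "TT", "Fungi"),
    ("ref2", "TTT" + FWD + "AAACCCGGG" + "AGTCC" + "TT", "Fungi"),    # exact duplicate of ref1
    ("ref3", "CC" + FWD + "AAACCCGGG" + "AGTCC" + "GGG", "Fungi"),    # differs only in the flanks
    ("ref4", "TTT" + FWD + "AAATTTGGG" + "AGTCC" + "TT", "Bacteria"),
    ("ref5", "TTTAAACCCGGGTT", "Archaea"),                            # no primer sites
]

full_derep = dereplicate(references)                          # drops ref2
amplicons = dereplicate(extract_region(full_derep, FWD, REV)) # drops ref5, then ref3
amplicons = basic_qc(amplicons, min_len=5)
print(len(references), len(full_derep), len(amplicons))
print(amplicons)
```

The second dereplication matters because references that differ only outside the amplified region (ref1 and ref3 here) become identical once extracted, which is part of why the amplicon-specific reference ends up smaller.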