Eukaryotes (fungi; p__Basidiomycota) in the taxa bar plots of 16S rRNA data

I was not expecting eukaryotes in my taxa bar plots, but they appear in many samples. What can I do to improve these plots? Could this be caused by the classifier, or by pre-DADA2 steps such as cutadapt or mixed-orientation reads? Thank you.

We targeted the V4 region using the following set of primers (AVITI chemistry; 300 bp; PE reads).

Primer name | Gene | Region(s) | Sequence
V4_515F_Nextera | 16S rRNA | V4, V4-V6 | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCMGCCGCGGTAA
V4_806R_Nextera | 16S rRNA | V3-V4, V4 | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT

This analysis was done with qiime2-amplicon-2024.10.

I used cutadapt to get rid of the primers.

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux.qza \
--p-front-f GTGCCAGCMGCCGCGGTAA \
--p-front-r GGACTACHVGGGTWTCTAAT \
--p-match-read-wildcards \
--p-match-adapter-wildcards \
--p-discard-untrimmed \
--o-trimmed-sequences demux-trimmed.qza \
--p-cores 20 \
--verbose
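
On the mixed-orientation question, one rough way to check is to count how many raw R1 reads start with each primer before trimming. The snippet below is only a sketch: the helper names and `sample_R1.fastq` are hypothetical, it expects an uncompressed FASTQ exported from `demux.qza`, and it assumes the primer sits at the very start of the read (drop the `^` if your library uses heterogeneity spacers).

```shell
# Turn an IUPAC primer into an extended regular expression,
# e.g. M -> [AC], H -> [ACT], V -> [ACG], W -> [AT].
iupac_to_regex() {
  printf '%s' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count reads whose sequence starts with the given primer.
# $1 = primer (IUPAC codes allowed), $2 = uncompressed FASTQ file
count_primer_starts() {
  # Sequence lines are every 4th line starting at line 2.
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  awk 'NR % 4 == 2' "$2" | grep -E -c "^$(iupac_to_regex "$1")" || true
}

# Usage (hypothetical file name):
#   count_primer_starts GTGCCAGCMGCCGCGGTAA  sample_R1.fastq   # forward primer in R1
#   count_primer_starts GGACTACHVGGGTWTCTAAT sample_R1.fastq   # reverse primer in R1
```

If a large share of R1 reads start with the 806R primer, the library is probably mixed-orientation; if nearly all start with 515F, orientation is unlikely to be the issue.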

Next, I denoised the samples with DADA2, testing a range of --p-trunc-len-f and --p-trunc-len-r values. The settings below retained significantly more reads than --p-trunc-len-f 276 --p-trunc-len-r 260 or any of the values in between.

qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-trimmed.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 240 \
--p-trunc-len-r 220 \
--p-n-threads 50 \
--o-table dada2-paired-end-table.qza \
--o-representative-sequences dada2-paired-end-rep-seqs.qza \
--o-denoising-stats dada2-paired-end-stats.qza \
--verbose
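
As a quick sanity check on these truncation lengths, the expected read overlap can be worked out by hand. This is a sketch under two assumptions that are not from the run itself: a typical bacterial 515F/806R amplicon of about 253 bp after primer removal (eukaryotic off-target amplicons may differ), and a DADA2 minimum merge overlap of 12 bp.

```shell
amplicon_len=253   # assumed typical V4 amplicon length after primer removal
trunc_f=240        # --p-trunc-len-f used above
trunc_r=220        # --p-trunc-len-r used above
min_overlap=12     # assumed minimum overlap required for merging

# Bases covered by both reads once they are truncated.
overlap=$(( trunc_f + trunc_r - amplicon_len ))
echo "expected overlap: ${overlap} bp"

if [ "$overlap" -ge "$min_overlap" ]; then
  echo "truncated reads should still overlap enough to merge"
else
  echo "truncation is too aggressive for this amplicon length"
fi
```

Under the same assumption, a truncation length longer than the amplicon itself (such as 276) keeps bases past the end of the insert, which may be part of why the longer settings retained fewer reads.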

I then assigned taxonomy with the pre-trained SILVA 138 classifier:

qiime feature-classifier classify-sklearn \
--i-classifier databases_pre_trained/silva-138-99-nb-classifier.qza \
--i-reads dada2-paired-end-rep-seqs.qza \
--o-classification taxonomy.qza \
--verbose

Hi @Irshad,

It is common to amplify eukaryotes and other off-target taxa in sequencing surveys, so there is nothing necessarily wrong with your commands. This is especially true because many environments contain microbial and meiofaunal eukaryotes, e.g. fungi, rotifers, and amoebae.

What type of samples are you sequencing? Given the abundance of eukaryotes, I'd suspect that they are indeed part of the environment's microbiome, although they could also be contaminants and/or host reads.

You can follow this tutorial to remove any unwanted / unintended sequences that appear in your data.
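
For reference, taxonomy-based filtering of the feature table could look roughly like the sketch below. The input file names come from the commands earlier in the thread; `table-filtered.qza` is a hypothetical output name, and the exclusion list is only an example (leave out `d__Eukaryota` if eukaryotes are of interest in your study).

```shell
# Comma-separated taxonomy substrings to remove.
# SILVA 138 labels domains as d__..., e.g. d__Eukaryota.
exclude_terms="d__Eukaryota,mitochondria,chloroplast"

if command -v qiime >/dev/null 2>&1; then
  qiime taxa filter-table \
    --i-table dada2-paired-end-table.qza \
    --i-taxonomy taxonomy.qza \
    --p-exclude "$exclude_terms" \
    --o-filtered-table table-filtered.qza
else
  echo "qiime not found; activate the QIIME 2 environment first" >&2
fi
```

The filtered table can then be passed to the bar plot step in place of the original table.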

Thank you @SoilRotifer. Yes, your suspicion was right that the eukaryotes are part of the environment's microbiome: these samples come from decomposed wood, where fungi are abundant and are one of our targets in this study.

On another note, would you expect any improvement in the taxonomy if I used another SILVA classifier, such as the "diverse weighted Silva 138 99% OTUs full-length sequences" classifier, or trained my own? Does training a classifier with your tutorial require a lot of time and computational resources?

That is a hard question to answer, as it depends on the environment and what lives there. I'd certainly try the weighted classifiers to see if they help.

Of course, you can use RESCRIPt too. Keep in mind that the RESCRIPt tutorial mostly shows what you can do, not necessarily what you should do. That said, I've had good luck making my own amplicon-specific classifiers. For example, I typically dereplicate the full-length data, extract the amplicon region, dereplicate the extracted amplicon sequences, perform some basic QA/QC, and then train the classifier. Making an amplicon-specific classifier takes less time and reduces the file size and memory footprint of the classifier. :)
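
The steps described above could be sketched as follows. This is not a definitive recipe: all file names are hypothetical (it assumes SILVA 138 reference sequences and taxonomy have already been imported as `silva-138-99-seqs.qza` and `silva-138-99-tax.qza`), the primers are the V4 primers from the first post, and the QA/QC step is left as a comment because the right filters depend on the data.

```shell
f_primer="GTGCCAGCMGCCGCGGTAA"    # 515F, from the first post
r_primer="GGACTACHVGGGTWTCTAAT"   # 806R, from the first post

if command -v qiime >/dev/null 2>&1; then
  # 1. Dereplicate the full-length reference.
  qiime rescript dereplicate \
    --i-sequences silva-138-99-seqs.qza \
    --i-taxa silva-138-99-tax.qza \
    --p-mode 'uniq' \
    --o-dereplicated-sequences silva-derep-seqs.qza \
    --o-dereplicated-taxa silva-derep-tax.qza

  # 2. Extract the amplicon region with the study primers.
  qiime feature-classifier extract-reads \
    --i-sequences silva-derep-seqs.qza \
    --p-f-primer "$f_primer" \
    --p-r-primer "$r_primer" \
    --o-reads silva-v4-seqs.qza

  # 3. Dereplicate again, since distinct full-length sequences
  #    can share an identical amplicon region.
  qiime rescript dereplicate \
    --i-sequences silva-v4-seqs.qza \
    --i-taxa silva-derep-tax.qza \
    --p-mode 'uniq' \
    --o-dereplicated-sequences silva-v4-derep-seqs.qza \
    --o-dereplicated-taxa silva-v4-derep-tax.qza

  # 4. Basic QA/QC would go here (dataset dependent, omitted).

  # 5. Train the amplicon-specific classifier.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads silva-v4-derep-seqs.qza \
    --i-reference-taxonomy silva-v4-derep-tax.qza \
    --o-classifier silva-v4-classifier.qza
else
  echo "qiime not found; activate the QIIME 2 environment first" >&2
fi
```

The resulting `silva-v4-classifier.qza` would replace the pre-trained classifier in the `classify-sklearn` command.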

Hi @Irshad, how did you get only the domain and phylum into the figure? What did you exclude? Could you share the command you used? Thank you.