Lack of taxonomy assignment at genus level

Hi! I'm working on 16S amplicons (V3-V4 region), trying to analyse the microbial community structure of polluted sediments from a river in Argentina. As a preliminary stage we took a single sample, extracted gDNA, amplified it, and had it sequenced on Illumina (this last step was done by Mr DNA).
I've analyzed the raw data and found a lot of problems, because forward and reverse reads were mixed in the R1 and R2 files. Since I needed to do a single-end analysis, I separated the forward and reverse reads with cutadapt. Then I had to discard the forward reads found in the R2 file (because they showed low quality) and used the forward reads from the R1 file only. In the denoising step (qiime dada2, default parameters) nearly 50% of the sequences were lost.
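To make the separation step concrete, here is a minimal Python sketch of the idea of sorting mixed-orientation reads by the primer found at their 5' end. It is not what cutadapt does internally (cutadapt tolerates mismatches, and this sketch requires an exact match up to IUPAC degeneracy). The function names, the toy primers, and the toy reads are all made up for illustration.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def split_by_orientation(reads, fwd_primer, rev_primer):
    """Sort reads into forward, reverse, and unassigned by their 5' primer."""
    fwd_re, rev_re = primer_regex(fwd_primer), primer_regex(rev_primer)
    bins = {"forward": [], "reverse": [], "unassigned": []}
    for read in reads:
        seq = read.upper()
        if fwd_re.match(seq):
            bins["forward"].append(read)
        elif rev_re.match(seq):
            bins["reverse"].append(read)
        else:
            bins["unassigned"].append(read)
    return bins


# Toy primers and reads, made up for illustration only (not real 16S primers).
TOY_FWD = "ACGTN"
TOY_REV = "GGWCC"
toy_reads = ["ACGTATTTT", "GGACCAAAA", "GGTCCAAAA", "TTTTTTTTT"]
bins = split_by_orientation(toy_reads, TOY_FWD, TOY_REV)
```

With real data the reads would come from the FASTQ files and the primers would be the ones actually used for amplification; the sketch only shows the sorting logic.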
For the taxonomy assignment step, I trained a classifier for the V3-V4 region (based on Silva 138.1) using RESCRIPt. The result was acceptable, since I found a community structure similar to other polluted sites from the same river. But for this sample I found almost no family or genus assignments (and, of course, no species assignments either). This seems strange to me, since I've worked with similar samples in the past (but using dada2 or mothur) and reached that level of taxonomic assignment.

Could you help me figure out why this is happening?

I've thought of two possible explanations:
1- Sequence quality is too low
2- I've trained my classifier the wrong way

Thank you so much!! I really appreciate any kind of guidance.


Hello Celeste, :wave:

Thank you for this very detailed description of your methods. I can see one area that would cause problems.

While you have amplified and sequenced V3-V4 and trained a classifier on V3-V4, the reads you are currently using are shorter.

Because you are only using the R1 reads, I expect this would include only part of V3, so the extra V4 region included in the classifier may be reducing your resolution, in this specific case.

When using only R1 reads from V3, what assignment level do you expect?
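One rough way to get a feel for this question is to truncate reference amplicons to the length the reads actually cover and see which genera can no longer be told apart. The Python sketch below illustrates that idea only; it is not how the naive Bayes classifier works, and the function name, genus names, and sequences are made up for illustration.

```python
from collections import defaultdict


def ambiguous_after_truncation(references, read_length):
    """Map each truncated sequence to the set of genera that share it.

    references: iterable of (genus, amplicon_sequence) pairs, where each
    amplicon is assumed to start right after the forward primer.
    Returns only the truncated sequences shared by more than one genus.
    """
    genera_by_prefix = defaultdict(set)
    for genus, seq in references:
        genera_by_prefix[seq[:read_length].upper()].add(genus)
    return {prefix: genera for prefix, genera in genera_by_prefix.items()
            if len(genera) > 1}


# Toy amplicons with made-up genus names and sequences, for illustration only.
toy_refs = [
    ("GenusA", "ACGTACGTAA"),
    ("GenusB", "ACGTACGTCC"),  # differs from GenusA only near the 3' end
    ("GenusC", "TTGTACGTAA"),
]
full = ambiguous_after_truncation(toy_refs, read_length=10)
short = ambiguous_after_truncation(toy_refs, read_length=8)
```

In this toy case all three genera are distinct over the full amplicon, but GenusA and GenusB collapse onto the same sequence once only the first 8 bases are kept. Running the same check on real reference amplicons, truncated to the real read length, would hint at which taxonomic level is realistic.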

Thank you for this quick response!
I'd like to reach genus level. We are working on bioleaching processes in these samples, and we know this process is carried out by microorganisms. We isolated some, but I suspect they are not in the majority either in the environment or in the reactors.

P.S.: indeed, the forward reads are roughly 270-280 bp long.

Should I train my classifier on the V3 region only?

Update 1: I've trained my classifier on V3 using 341F and 517R... The results were a bit strange: I found Eukarya and genera that I've never seen before...
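To put numbers on "a bit strange", one option is to tally how many features fall in each domain and how deep each assignment goes. Below is a minimal Python sketch, assuming taxonomy strings in the SILVA-style format with rank prefixes such as `d__` and `g__` separated by semicolons. The function names and the toy input are made up for illustration; real strings would come from the exported taxonomy table.

```python
from collections import Counter

# Rank prefixes assumed to follow the "d__...; p__...; g__..." convention.
RANK_NAMES = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
              "f": "family", "g": "genus", "s": "species"}


def deepest_rank(taxon):
    """Return the name of the deepest rank that carries a non-empty label."""
    deepest = "unassigned"
    for field in taxon.split(";"):
        prefix, sep, label = field.strip().partition("__")
        if sep and label and prefix in RANK_NAMES:
            deepest = RANK_NAMES[prefix]
    return deepest


def summarize(taxa):
    """Count features per top-level label and per deepest assigned rank."""
    domains = Counter(t.split(";")[0].strip() for t in taxa)
    depths = Counter(deepest_rank(t) for t in taxa)
    return domains, depths


# Toy taxonomy strings for illustration only.
toy_taxa = [
    "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "d__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; "
    "f__Bacillaceae; g__Bacillus",
    "d__Eukaryota",
    "Unassigned",
]
domains, depths = summarize(toy_taxa)
```

Comparing these two tallies between the V3-V4 classifier and the V3-only classifier would show whether the Eukarya hits and the shallow assignments are a handful of features or most of the sample.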


Hello, just use the new Greengenes2 database; it will be easier than training a specific classifier: Introducing Greengenes2 2022.10


Thank you so much for your response! I've got some questions:

Previous work in my lab was based on Silva, which is why I'm trying to stay with this database. Will using Greengenes give a different outcome?

Moreover, I didn't use the Silva full-length classifier because the process was killed due to lack of RAM.

Could the Greengenes database help me get around this problem?

What would be the difference between using one or the other in terms of computing requirements and/or quality of results?

Thank you!!


I didn't specifically measure the GG2 memory footprint, but it is definitely better than running SILVA on a few cores. GG2 incorporates SILVA, as you can read in the paper :wink:

SILVA hasn't been updated for 3 years now. You will gain more information from reclassification, but you will need to reclassify everything done before. IMHO that is a relatively small burden if you want to reanalyse previously generated data with up-to-date methods.