Classify-sklearn low classification depth

Hi @ErikaGanda,
Your problem is a little different from the other post, so I have split it into its own thread. That post concerned the use of vsearch specifically, and the sequences there were not being assigned any taxonomy at all; yours are assigned taxonomy but receive shallow assignments.

Unless your reads are very short or low quality, you should indeed be getting much deeper classification with this classifier! So let's take a step back and examine how these reads were put together. Depending on the primers you are using, you may be selecting the wrong classifier for your reads.

Some other users have reported similar problems, e.g., here. This issue is usually caused by:

  1. The wrong reference sequences are being used (or they were extracted improperly).
  2. The query sequences are very short or low quality.

So, some questions about your data:

  1. What primers are you using? I notice that for classification you use the Greengenes database with the V4 region extracted using the 515f/806r primers. Does this match the region that your input sequences cover?
  2. How long are your query sequences? Did you use QIIME2 for all upstream steps (e.g., dada2 for quality control) or are you importing these reads from elsewhere for taxonomy assignment?
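
If you are not sure about the query lengths, here is a minimal sketch for checking them. It assumes you have exported your representative sequences to a plain FASTA file; the file name `dna-sequences.fasta` in the usage note is an assumption, and the helper names `read_fasta_lengths` and `summarize` are just illustrative, not part of QIIME 2.

```python
import statistics
from io import StringIO


def read_fasta_lengths(handle):
    """Return the length of every sequence in an open FASTA handle."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence lines may be wrapped, so add up every line of the record.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Return count, minimum, median and maximum of a non-empty list of lengths."""
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


# Tiny in-memory demo so the sketch runs without any input file.
demo = StringIO(">seq1\nACGTACGT\nACGT\n>seq2\nACGT\n")
print(summarize(read_fasta_lengths(demo)))
```

To run it on your own data, replace the demo with something like `with open("dna-sequences.fasta") as fh: print(summarize(read_fasta_lengths(fh)))`. If the lengths come out much shorter than you would expect for your amplicon, that points to cause 2 above.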

As for switching to SILVA: that probably will not help here. If Greengenes is performing poorly, so will SILVA trained on the same amplicon region. SILVA also takes around 30X the time to run because the database is much bigger.

Let us know if the above helps sort out the issue!