When I use classify-sklearn in QIIME 2, the number of taxa is reduced

Dear Qiime2.
Good Morning
I have a problem with taxonomy assignment.
When I used the QIIME 1 RDP classifier, I got approximately 200 genera.
After switching to QIIME 2 classify-sklearn, I get only about 30.
(QIIME 2 database: Greengenes, RDP database)

Some posts on the QIIME 2 forum suggested checking the DADA2 denoising-stats.qza.
I changed the --p-trunc-len and --p-trim-left options in DADA2.
As a result, the number of genera increased slightly, but it is still nowhere near 200.
Also, the sequence length is ~430 (V3-V4 region).
As far as I know, the V3-V4 region is approximately 460-480 nt long.

Where should I look further?

[Process]

  1. Importing : Paired-end demultiplexed sequences
  2. Cutadapt : Remove primers
  3. DADA2 : --p-trim-left-f 20, --p-trim-left-r 20, --p-trunc-len-f 280, --p-trunc-len-r 220
  4. Taxonomy profiling : feature-classifier classify-sklearn, Greengenes database (13.8, full length)

Thanks in advance.

@kyeong_yun,
The difference here is not actually the taxonomy classification method but the rest of the pipeline. More specifically, QIIME 1 uses OTU clustering, whereas QIIME 2 offers many options and you have used DADA2 for denoising.

DADA2 is much, much better at finding and eliminating bad, noisy sequences from your data. So when comparing QIIME 1 vs. QIIME 2 results, it is very likely that QIIME 1 will have more unique features and more unique taxa... that is a bad thing! Many or most of those will actually be sequencing errors or chimeras that are just being clustered into bad OTUs, rather than being denoised.

RDP and the naive Bayes classify-sklearn method in QIIME 2 generally yield similar results. If you really want to compare the classifiers, you should import your qiime1 data into QIIME 2 and classify, or export your QIIME 2 data and classify with RDP. As it stands, you are comparing apples and oranges because your analysis pipelines are so different.
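If you do run both classifiers on the same set of sequences, one way to compare them at the genus level is to count the unique `g__` labels in each taxonomy table. The following is only a minimal sketch: the lineage strings are made-up examples in the Greengenes format, and the helper names `genus_of` and `unique_genera` are hypothetical, not part of QIIME 2. In practice you would read the "Taxon" column from each exported taxonomy file instead of using hard-coded lists.

```python
def genus_of(taxon):
    """Return the genus name from a Greengenes-style lineage, or None if unassigned."""
    for rank in taxon.split(";"):
        rank = rank.strip()
        # "g__" with nothing after it means no genus was assigned.
        if rank.startswith("g__") and len(rank) > 3:
            return rank[3:]
    return None


def unique_genera(taxa):
    """Collect the distinct genus names found in an iterable of lineage strings."""
    return {g for g in map(genus_of, taxa) if g is not None}


# Made-up lineages standing in for the "Taxon" column of two taxonomy tables
# produced from the SAME input sequences (assumption for illustration only).
rdp_taxa = [
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus; s__",
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__Ruminococcus; s__",
    "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__",
]
sklearn_taxa = [
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus; s__",
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__; s__",
    "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__",
]

rdp_genera = unique_genera(rdp_taxa)
sklearn_genera = unique_genera(sklearn_taxa)

print(len(rdp_genera), len(sklearn_genera))
# Genera reported by one classifier but not the other.
print(sorted(rdp_genera - sklearn_genera))
```

Looking at which genera differ, rather than only at the totals, should make it easier to see whether the gap comes from the classifier or from the upstream features.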

The ~430 nt sequence length is measured after you have trimmed your reads by 20 nt on each side, so 430 + 20 + 20 = 470, right in the middle of your expected range.
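The same arithmetic as a small check, in case you want to repeat it with other trim settings. This is just a sketch: the function name `length_before_trimming` is hypothetical, and the 460-480 range is the expectation you quoted for V3-V4.

```python
def length_before_trimming(observed_len, trim_left_f, trim_left_r):
    """Add back the bases removed from the 5' end of the forward and reverse reads."""
    return observed_len + trim_left_f + trim_left_r


observed = 430                          # merged sequence length reported after DADA2
full = length_before_trimming(observed, 20, 20)

expected_min, expected_max = 460, 480   # V3-V4 range quoted in the question
in_range = expected_min <= full <= expected_max

print(full, in_range)
```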
