Help with Greengenes taxonomy: output with [ ]

jau · November 15, 2023, 4:16pm

Hi,
I am a beginner with QIIME 2. I am writing with two goals.

First, I am trying to use the new version of Greengenes. Could you review
my code? My input files are 2022.10.backbone.full-length.fna.qza and 2022.10.backbone.tax.qza.
I read in another post that these are the correct files to use for the most recent version...

4.1.1

# Extract the V3-V4 region targeted by the primers from the backbone sequences
qiime feature-classifier extract-reads \
  --i-sequences metadata/2022.10.backbone.full-length.fna.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-trunc-len 0 \
  --p-min-length 100 \
  --p-max-length 480 \
  --p-n-jobs 8 \
  --o-reads training-feature-classifiers/greengenes_ref-classif-seqs.qza

# Train a naive Bayes classifier on the extracted reads and the backbone taxonomy
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads training-feature-classifiers/greengenes_ref-classif-seqs.qza \
  --i-reference-taxonomy metadata/2022.10.backbone.tax.qza \
  --o-classifier training-feature-classifiers/greengenes_classifier.qza

# Sanity check: classify the training reads themselves
qiime feature-classifier classify-sklearn \
  --i-classifier training-feature-classifiers/greengenes_classifier.qza \
  --i-reads training-feature-classifiers/greengenes_ref-classif-seqs.qza \
  --o-classification training-feature-classifiers/greengenes_taxonomy.qza

qiime metadata tabulate \
  --m-input-file training-feature-classifiers/greengenes_taxonomy.qza \
  --o-visualization training-feature-classifiers/greengenes_taxonomy.qzv
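Because the last classify-sklearn call in 4.1.1 classifies the training reads themselves, one rough sanity check is to compare its output with the reference labels. This is a minimal sketch, assuming both artifacts were exported with `qiime tools export` to TSV files that have a header row and Feature ID / Taxon as the first two columns; the file names and rows below are hypothetical placeholders, and exact string matching is only one (strict) way to score agreement.

```shell
#!/usr/bin/env bash
# Sketch: how often does the classifier reproduce the reference label for its
# own training reads? Rows are hypothetical stand-ins for exported TSV files.
set -eu
dir=$(mktemp -d)

printf 'Feature ID\tTaxon\n' > "$dir/reference.tsv"
printf 'seqA\td__Bacteria; p__Bacteroidota; g__Prevotella\n' >> "$dir/reference.tsv"
printf 'seqB\td__Bacteria; p__Firmicutes_A; g__Blautia\n' >> "$dir/reference.tsv"

printf 'Feature ID\tTaxon\tConfidence\n' > "$dir/predicted.tsv"
printf 'seqA\td__Bacteria; p__Bacteroidota; g__Prevotella\t0.98\n' >> "$dir/predicted.tsv"
printf 'seqB\td__Bacteria; p__Firmicutes_A\t0.81\n' >> "$dir/predicted.tsv"

# First file (NR == FNR): remember each reference label by Feature ID.
# Second file: count IDs present in both files and how many labels match exactly.
agreement=$(awk -F'\t' '
    NR == FNR { if (FNR > 1) ref[$1] = $2; next }
    FNR > 1 && ($1 in ref) { total++; if ($2 == ref[$1]) same++ }
    END { printf "%d/%d\n", same, total }
' "$dir/reference.tsv" "$dir/predicted.tsv")

echo "exact label matches: $agreement"
rm -r "$dir"
```

With the placeholder rows this prints `exact label matches: 1/2`, because seqB was only resolved to phylum level.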

4.1.2

# Classify the representative sequences that passed quality control
qiime feature-classifier classify-sklearn \
  --i-classifier training-feature-classifiers/greengenes_classifier.qza \
  --i-reads DADA2/rep-seqs-hits90.qza \
  --o-classification training-feature-classifiers/taxonomy_greengenes_myseqs.qza \
  --p-n-jobs 8

qiime metadata tabulate \
  --m-input-file training-feature-classifiers/taxonomy_greengenes_myseqs.qza \
  --o-visualization training-feature-classifiers/taxonomy_greengenes_myseqs.qzv

# Per-sample taxonomy bar plots
qiime taxa barplot \
  --i-table DADA2/final-table-hits90.qza \
  --i-taxonomy training-feature-classifiers/taxonomy_greengenes_myseqs.qza \
  --m-metadata-file metadata/micro_metadata_def.tsv \
  --o-visualization training-feature-classifiers/taxa-bar-plots_taxonomy_greengenes.qzv

Why are some family names shown in [ ]? And why does the classifier assign the same genus to different families?

I think this output may be incorrect...

Can anyone help me, please?

SoilRotifer · November 15, 2023, 6:52pm

It has become somewhat of an "unofficial standard" to mark putative/unconfirmed taxonomic groupings by surrounding the questionable taxonomic rank with square brackets []. This simply indicates that the taxonomy at the given rank is likely disputed, or in the process of being formalized. This can lead to inconsistent parent-daughter relationships, as you've noticed.
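To see which bracketed (putative) labels actually occur in your results, you could pull them out of an exported taxonomy table. This is a minimal sketch, assuming a taxonomy.tsv like the one `qiime tools export` writes (header row, Taxon in the second column, ranks separated by "; "); the sample rows are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list bracketed rank labels (e.g. f__[Name]) and how often each occurs.
# The rows below are hypothetical; point this at your exported taxonomy.tsv.
set -eu
tsv=$(mktemp)
printf 'Feature ID\tTaxon\tConfidence\n' > "$tsv"
printf 'asv1\tk__Bacteria; p__Bacteroidetes; f__[Paraprevotellaceae]; g__Prevotella\t0.95\n' >> "$tsv"
printf 'asv2\tk__Bacteria; p__Bacteroidetes; f__Prevotellaceae; g__Prevotella\t0.99\n' >> "$tsv"
printf 'asv3\tk__Bacteria; p__Firmicutes; f__[Mogibacteriaceae]\t0.90\n' >> "$tsv"

# Skip the header, keep only the Taxon column, extract every "x__[Name]" label,
# then count occurrences and print them as label:count.
bracketed=$(tail -n +2 "$tsv" | cut -f2 | grep -o '[a-z]__\[[^]]*]' \
    | sort | uniq -c | awk '{ print $2 ":" $1 }')

echo "$bracketed"
rm -f "$tsv"
```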

In this case, the genus g__Prevotella has multiple parent taxonomic ranks, like f__[Paraprevotellaceae] and [f__Prevotellaceae]. That is, the family rank label for the genus Prevotella is likely in flux.
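One way to find every genus affected like this is to count the distinct families each genus appears under. This is a minimal sketch, assuming the same exported taxonomy.tsv layout (header row, Taxon in column 2, "; " between ranks); the rows are hypothetical and simply mirror the Prevotella example.

```shell
#!/usr/bin/env bash
# Sketch: report genera that are assigned to more than one family.
# The rows below are hypothetical; point this at your exported taxonomy.tsv.
set -eu
tsv=$(mktemp)
printf 'Feature ID\tTaxon\tConfidence\n' > "$tsv"
printf 'asv1\tk__Bacteria; p__Bacteroidetes; f__Prevotellaceae; g__Prevotella\t0.99\n' >> "$tsv"
printf 'asv2\tk__Bacteria; p__Bacteroidetes; f__[Paraprevotellaceae]; g__Prevotella\t0.95\n' >> "$tsv"
printf 'asv3\tk__Bacteria; p__Firmicutes; f__Lachnospiraceae; g__Blautia\t0.97\n' >> "$tsv"

# For each row, find the f__ and g__ labels, count each (genus, family) pair once,
# then print the genera seen with more than one family.
multi_parent=$(awk -F'\t' 'NR > 1 {
    fam = ""; gen = ""
    n = split($2, ranks, "; ")
    for (i = 1; i <= n; i++) {
        if (ranks[i] ~ /^f__/) fam = ranks[i]
        if (ranks[i] ~ /^g__/) gen = ranks[i]
    }
    if (gen != "" && gen != "g__" && !seen[gen SUBSEP fam]++) count[gen]++
}
END { for (g in count) if (count[g] > 1) print g }' "$tsv")

echo "$multi_parent"
rm -f "$tsv"
```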

Microbial taxonomy has been in a substantial state of change over the last several years, so expect periodic taxonomic inconsistencies.

jau · November 16, 2023, 2:32pm

I was checking my code and I think I made a mistake.

When I did quality control, I used an old version of the Greengenes reference
sequences. My code was:

# Keep representative sequences that match the reference closely enough
qiime quality-control exclude-seqs \
  --i-query-sequences DADA2/rep-seqs.qza \
  --i-reference-sequences metadata/99_otus.qza \
  --p-method vsearch \
  --p-perc-identity 0.90 \
  --p-perc-query-aligned 0.75 \
  --p-threads 32 \
  --o-sequence-hits DADA2/rep-seqs-hits90.qza \
  --o-sequence-misses DADA2/rep-seqs-misses90.qza

The file 99_otus.qza belongs to Greengenes 13_8...
Perhaps mixing versions of the reference sequence and taxonomy files
causes the classification to generate artificial disputes.

What do you think?

SoilRotifer · November 16, 2023, 3:01pm

For the qiime quality-control exclude-seqs... command, you can use other reference files available on the Data resources page. For example, you can use any of the SILVA sequence files, or the Greengenes2 "backbone" sequence files.

Note that the quality-control exclude-seqs command will not have any effect on your taxonomy. It simply compares your sequences to a reference database and separates out those that do not match well enough given your criteria.
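To make the "match well enough" criterion concrete, the sketch below applies the two thresholds from the exclude-seqs command above (--p-perc-identity 0.90, --p-perc-query-aligned 0.75) to a hypothetical table of per-sequence alignment statistics. It is a simplified illustration only, assuming a sequence counts as a hit when both values meet or exceed their thresholds; the plugin itself runs vsearch and its exact logic may differ. Note that only sequence IDs are sorted into hits and misses; no taxonomy is read or written.

```shell
#!/usr/bin/env bash
# Simplified illustration of the hit/miss split (NOT the plugin's implementation).
# Hypothetical columns: sequence ID, best identity (0-1), fraction of query aligned (0-1).
set -eu
stats=$(mktemp)
printf 'asv1\t0.97\t0.99\nasv2\t0.88\t1.00\nasv3\t0.95\t0.60\n' > "$stats"

perc_identity=0.90
perc_query_aligned=0.75

# Assumed rule: a hit needs identity >= threshold AND query coverage >= threshold.
hits=$(awk -F'\t' -v id="$perc_identity" -v cov="$perc_query_aligned" \
    '$2 >= id && $3 >= cov { print $1 }' "$stats")
misses=$(awk -F'\t' -v id="$perc_identity" -v cov="$perc_query_aligned" \
    '!($2 >= id && $3 >= cov) { print $1 }' "$stats")

echo "hits: $hits"
echo "misses:" $misses
rm -f "$stats"
```

Here asv2 fails on identity and asv3 fails on query coverage, so only asv1 would end up in the hits output.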
