QIIME2: Greengenes database 2022, which OTU threshold was used?

I have already run the following commands for the analysis of V3-V4 16S data.

1.5 De novo clustering

qiime vsearch cluster-features-de-novo \
  --i-table 1_4_table.qza \
  --i-sequences 1_4_rep-seqs.qza \
  --p-perc-identity 0.99 \
  --p-threads 36 \
  --o-clustered-table 1_5_table-dn-99.qza \
  --o-clustered-sequences 1_5_rep-seqs-dn-99.qza

(qiime2-amplicon-2024.10) sujan@DESKTOP-VKEBCR4:~/practice/test-run$ qiime feature-classifier extract-reads \
  --i-sequences 2022.10.backbone.full-length.fna.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-min-length 300 \
  --p-max-length 500 \
  --o-reads 3_3_ref-seqs.qza
/home/sujan/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
This warnings indicates broken support for the dtype!
machar = _get_machar(dtype)
Saved FeatureData[Sequence] to: 3_3_ref-seqs.qza

(qiime2-amplicon-2024.10) sujan@DESKTOP-VKEBCR4:~/practice/test-run$ qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads 3_3_ref-seqs.qza \
  --i-reference-taxonomy 2022.10.backbone.tax.qza \
  --o-classifier 3_4_classifier.qza
/home/sujan/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
This warnings indicates broken support for the dtype!
machar = _get_machar(dtype)
Saved TaxonomicClassifier to: 3_4_classifier.qza

(qiime2-amplicon-2024.10) sujan@DESKTOP-VKEBCR4:~/practice/test-run$ qiime feature-classifier classify-sklearn \
  --i-classifier 3_4_classifier.qza \
  --i-reads 1_7b_rep-seqs-dn-99.qza \
  --o-classification 3_5a_taxonomy.qza
/home/sujan/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
This warnings indicates broken support for the dtype!
machar = _get_machar(dtype)
Saved FeatureData[Taxonomy] to: 3_5a_taxonomy.qza

(qiime2-amplicon-2024.10) sujan@DESKTOP-VKEBCR4:~/practice/test-run$ qiime metadata tabulate \
  --m-input-file 3_5a_taxonomy.qza \
  --o-visualization 3_5b_taxonomy.qzv
/home/sujan/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
This warnings indicates broken support for the dtype!
machar = _get_machar(dtype)
Saved Visualization to: 3_5b_taxonomy.qzv

(qiime2-amplicon-2024.10) sujan@DESKTOP-VKEBCR4:~/practice/test-run$ qiime taxa barplot \
  --i-table 1_7a_table-dn-99.qza \
  --i-taxonomy 3_5a_taxonomy.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization 3_6_taxa-bar-plots.qzv
/home/sujan/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
This warnings indicates broken support for the dtype!
machar = _get_machar(dtype)
Saved Visualization to: 3_6_taxa-bar-plots.qzv

My queries are:

  1. Did these commands run properly?
  2. How can I find out which OTU percentage (similarity threshold) was used here?

Hi @Sujan,
welcome to the forum. I am not 100% sure I understand your question 1 correctly. Are you referring to the following warnings?
/home/sujan/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
This warnings indicates broken support for the dtype!
machar = _get_machar(dtype)
If so, there seems to be an incompatibility between one of QIIME 2's dependencies, namely numpy, and WSL (Windows Subsystem for Linux); see numpy/numpy issue #26414 on GitHub ("BUG: <class 'numpy.longdouble'> does not match any known type") and h5py/h5py issue #2357 on GitHub ("UserWarning of `numpy.longdouble` trigger when importing h5py with numpy>=1.25").
May I therefore ask whether you are working on a Windows machine?
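A quick way to check this from inside the shell is to look for "microsoft" in the kernel version string, which WSL kernels contain. A minimal sketch; the function name is_wsl and its optional path argument are my own, added only so the check is easy to try out:

```shell
# Succeed (exit status 0) if the kernel version string mentions Microsoft,
# as WSL kernels do. The function name and the optional path argument are
# illustrative; by default /proc/version is read.
is_wsl() {
    grep -qi 'microsoft' "${1:-/proc/version}" 2>/dev/null
}

if is_wsl; then
    echo "running under WSL"
else
    echo "not running under WSL"
fi
```

A native Ubuntu installation (for example dual boot) should report that it is not running under WSL.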

I am not sure whether the results will be negatively affected by this warning. If you provide the input data, I can process them on my Linux laptop and send the outputs over to you, so that you can make a comparison.

Regarding your second question: I am also not entirely sure what you want to achieve. I see that you use 1_5_table-dn-99.qza, which I assume is a feature table, i.e. sequences by samples. You then seem to assign taxonomic labels to all the sequences in this table.
In principle, each sequence should then have a label assigned; the question is more about the taxonomic resolution. Mainly due to database incompleteness, not every sequence will be assigned a known genus or species, for example. Are you asking for the percentage of sequences without a known genus?
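If that is what you are after, one rough way to get the number is to export the taxonomy and count the rows without a named genus. This is only a sketch under a few assumptions: the export writes a tab-separated taxonomy.tsv with a header row and the taxon string in the second column, genus labels use the Greengenes-style g__ prefix, and the function name pct_without_genus is my own.

```shell
# Print the percentage of features whose taxon string lacks a named genus.
# Assumes a tab-separated file with one header row and the taxon string in
# column 2, with genus labels written as "g__Name" (Greengenes style).
pct_without_genus() {
    awk -F '\t' '
        NR == 1 { next }                  # skip the header row
        { total++ }
        $2 !~ /g__[^; ]/ { missing++ }    # no g__ prefix followed by a name
        END {
            if (total == 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * missing / total
        }
    ' "$1"
}

# Usage (output directory name is an example):
#   qiime tools export --input-path 3_5a_taxonomy.qza --output-path exported-taxonomy
#   pct_without_genus exported-taxonomy/taxonomy.tsv
```

Note that this counts features, not reads; weighting by abundance would additionally need the feature table.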


Hi @Stefan

  1. I use Ubuntu 24.04.1 LTS alongside Windows.
    The input data and commands are here:
    Meta-genomic analysis - Google Drive
    If possible, please share your commands as well.
  2. Does my analysis use 99% OTUs / 90% OTUs, or another similarity threshold? How can I determine this?

Hi @Sujan,
as I could find neither of your input files, 1_4_table.qza and 1_4_rep-seqs.qza, in your shared gdrive folder, I could not re-run your first qiime command. But I ran your second command, namely:

qiime feature-classifier extract-reads --i-sequences 2022.10.backbone.full-length.fna.qza --p-f-primer CCTACGGGNGGCWGCAG --p-r-primer GACTACHVGGGTATCTAATCC --p-min-length 300 --p-max-length 500  --o-reads 3_3_ref-seqs.qza

(which I literally copied and pasted from your post).
The success message is:

Saved FeatureData[Sequence] to: 3_3_ref-seqs.qza

without any numpy warnings. I uploaded the resulting file into your gdrive folder (you should check if anyone with this link is able to upload arbitrary files). Thus, you can compare the content of my 3_3_ref-seqs.qza against yours to assess if the numpy warning affects results.
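For the comparison itself, one option is to export both artifacts and compare the FASTA records independent of record order and line wrapping. A rough sketch; the function names and directory names are my own, and I am assuming the export writes a file called dna-sequences.fasta:

```shell
# Print one "id<TAB>sequence" line per FASTA record, sorted, so that two
# files can be compared regardless of record order or line wrapping.
fasta_canonical() {
    awk '
        /^>/ {
            if (id != "") print id "\t" seq
            id = substr($1, 2); seq = ""
            next
        }
        { seq = seq $0 }
        END { if (id != "") print id "\t" seq }
    ' "$1" | sort
}

# Succeed (exit status 0) if both FASTA files hold the same records.
same_fasta() {
    _a=$(mktemp) || return 2
    _b=$(mktemp) || { rm -f "$_a"; return 2; }
    fasta_canonical "$1" > "$_a"
    fasta_canonical "$2" > "$_b"
    if cmp -s "$_a" "$_b"; then _st=0; else _st=1; fi
    rm -f "$_a" "$_b"
    return "$_st"
}

# Usage (paths are examples):
#   qiime tools export --input-path 3_3_ref-seqs.qza --output-path mine
#   qiime tools export --input-path from-stefan/3_3_ref-seqs.qza --output-path stefan
#   same_fasta mine/dna-sequences.fasta stefan/dna-sequences.fasta && echo identical
```

If the two files differ, running diff on the two fasta_canonical outputs would show which records are affected.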

If you drag and drop the uploaded *.qza file into https://view.qiime2.org/ and switch to the "Provenance" tab, you can also inspect my runtime environment, i.e. see which versions of packages such as numpy and qiime I used to compute this result.
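The same provenance is also stored as plain YAML inside the artifact, since a .qza file is a zip archive, so it can be read on the command line too. A hedged sketch: I am assuming the usual layout with a provenance/action/action.yaml file below the archive's top-level folder and parameters recorded as simple "key: value" lines; the helper name yaml_value is my own.

```shell
# Print the value(s) recorded for a given key in YAML read from stdin.
# Handles simple "key: value" and "-   key: value" lines only; this is
# a convenience filter, not a YAML parser.
yaml_value() {
    sed -n "s/^[ -]*$1:[ ]*//p"
}

# Usage (requires unzip; file name taken from the clustering command in
# the first post):
#   unzip -p 1_5_table-dn-99.qza '*/provenance/action/action.yaml' \
#       | yaml_value perc_identity
```

For a table produced with --p-perc-identity 0.99, as in the clustering command from the first post, I would expect this to print 0.99, i.e. 99% de novo OTUs. The Provenance tab on view.qiime2.org should show the same parameter for the clustering step.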


Thanks for your help, @Stefan.
I have already given you permission.