q2-gcn-norm: plugin for normalizing sequences by 16S rRNA gene copy number

jwchen · December 3, 2019, 4:55pm

Thanks to the detailed developer documentation, the plugin is finally out!
If you find bugs or have suggestions, please post on this forum and on the GitHub repository.

q2-gcn-norm

QIIME 2 plugin for normalizing sequences by 16S rRNA gene copy number (GCN) based on the rrnDB database

Introduction:

This plugin normalizes sequences by 16S rRNA gene copy number (GCN) based on the rrnDB database (version 5.6). The script matches the taxonomy of each sequence against the rrnDB-5.6_pantaxa_stats_NCBI.tsv file, starting from the lowest rank. If a match is found, the mean GCN of that taxon is assigned; if not, the script tries the next higher rank until the highest rank is reached. All unassigned sequences are assumed to have a GCN of 1.
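The lookup described above can be sketched in Python roughly as follows. This is a minimal illustration, not the plugin's code: the values in `GCN_TABLE` are made up, the lookup is keyed by taxon name only (the real rrnDB file also records each taxon's rank), and it assumes that "normalizing" means dividing each feature's count by its assigned GCN. The helper names `taxon_names`, `assign_gcn` and `normalize_counts` are hypothetical.

```python
from typing import Dict, List

# Hypothetical lookup of taxon name -> mean 16S GCN.
# Illustrative values only; the plugin reads real values from the rrnDB .tsv file.
GCN_TABLE: Dict[str, float] = {
    "Bacteria": 2.0,
    "Firmicutes": 5.0,
    "Bacilli": 7.0,
}


def taxon_names(taxon: str) -> List[str]:
    """Split 'k__foo; p__bar; c__' or 'D_0__foo;D_1__bar' into non-empty names, highest rank first."""
    names = []
    for part in taxon.split(";"):
        name = part.strip().split("__", 1)[-1]  # drop the rank prefix, e.g. 'k__' or 'D_0__'
        if name:  # skip empty ranks such as 'o__'
            names.append(name)
    return names


def assign_gcn(taxon: str, table: Dict[str, float], default: float = 1.0) -> float:
    """Try the lowest rank first, then move up; fall back to `default` if nothing matches."""
    for name in reversed(taxon_names(taxon)):
        if name in table:
            return table[name]
    return default  # unassigned sequences are assumed to have one copy


def normalize_counts(counts: Dict[str, float], taxonomy: Dict[str, str],
                     table: Dict[str, float]) -> Dict[str, float]:
    """Divide each feature's count by its assigned GCN (assumed normalization rule)."""
    return {fid: n / assign_gcn(taxonomy[fid], table) for fid, n in counts.items()}


counts = {"f1": 70, "f2": 10, "f3": 4}
taxonomy = {
    "f1": "k__Bacteria; p__Firmicutes; c__Bacilli; o__",  # matches at class level
    "f2": "k__Bacteria; p__Proteobacteria",               # falls back to kingdom
    "f3": "Unassigned",                                    # no match, so GCN of 1
}
print(normalize_counts(counts, taxonomy, GCN_TABLE))
# {'f1': 10.0, 'f2': 5.0, 'f3': 4.0}
```

Walking the ranks in reverse is what gives the "lowest rank first" behaviour; a production version would also need to match on rank to avoid homonyms across ranks.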

Note that, according to the rrnDB manual, the mean column in rrnDB-5.6_pantaxa_stats_NCBI.tsv is calculated from the means of the pan-taxa at the immediate lower rank. Therefore, the mean GCN may differ from the result of an rrnDB online search. For example, the "mean" GCN for Bacteria is 2.02 in the downloaded tsv file, whereas the mean GCN across all bacterial taxa is 5.0 when searching the rrnDB online database.
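A toy calculation shows how averaging child-taxon means can give a different number from averaging over every genome. The copy numbers below are invented, and treating the online search figure as a pooled mean over genomes is an assumption made only for this illustration.

```python
from statistics import mean

# Hypothetical per-genome 16S copy numbers for two child taxa of one parent taxon
# (illustrative values only, not rrnDB data)
children = {
    "taxon_A": [1.0],            # one sequenced genome
    "taxon_B": [7.0, 7.0, 7.0],  # three sequenced genomes
}

# Pan-taxa style: average the means of the immediate lower-rank taxa
mean_of_means = mean(mean(gcns) for gcns in children.values())

# Pooled: average over all genomes regardless of which child taxon they belong to
pooled_mean = mean(g for gcns in children.values() for g in gcns)

print(mean_of_means)  # 4.0
print(pooled_mean)    # 5.5
```

The mean of means weights every child taxon equally, so a heavily sampled, high-copy taxon pulls the pooled mean up much more than it pulls the pan-taxa mean.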

Install guide:

We assume you have a conda environment with the QIIME 2 Core distribution installed. First, activate the conda environment:

conda activate qiime2-2019.10

Next, install q2-gcn-norm with the following command:

conda install -c jiungwenchen q2-gcn-norm

Running example:

We use artifacts from QIIME 2's "Moving Pictures" tutorial as test files. Use the following commands to download the files.

# DADA2 output artifact:
wget https://docs.qiime2.org/2019.10/data/tutorials/moving-pictures/table-dada2.qza

# Taxonomic analysis output artifact:
wget https://docs.qiime2.org/2019.10/data/tutorials/moving-pictures/taxonomy.qza

We can normalize the FeatureTable using the command below:

qiime gcn-norm copy-num-normalize \
  --i-table table-dada2.qza \
  --i-taxonomy taxonomy.qza \
  --o-gcn-norm-table table-normalized.qza

The output is an artifact of type FeatureTable[Frequency] % Properties('copy_number_normalized').

Note that the taxonomy format should follow either Greengenes' k__foo; p__bar; c__ ... or SILVA's D_0__foo;D_1__bar;D_2__ .... Other formats, e.g. k__foo;p__bar;c__ ... (no space after the semicolon) or k__foo|p__bar|c__ ... (pipe as delimiter), are currently unsupported and will raise an error.
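A rough pre-check for these two formats might look like the sketch below. It is a hypothetical helper, not part of q2-gcn-norm, and the regular expressions only encode the rules stated above (single-letter prefixes joined by `; ` for Greengenes, `D_<n>__` prefixes joined by `;` for SILVA). Accepting the literal label `Unassigned` is an assumption about how unclassified features are labelled.

```python
import re

# Greengenes style: single-letter rank prefixes joined by "; " (semicolon + space)
GREENGENES = re.compile(r"[a-z]__[^;|]*(?:; [a-z]__[^;|]*)*")
# SILVA style as described above: D_<n>__ prefixes joined by ";" (no space)
SILVA = re.compile(r"D_\d+__[^;|]*(?:;D_\d+__[^;|]*)*")


def taxonomy_format(taxon: str) -> str:
    """Classify a taxonomy string, raising ValueError for unsupported layouts."""
    if taxon == "Unassigned":  # assumed label for unclassified features
        return "unassigned"
    if GREENGENES.fullmatch(taxon):
        return "greengenes"
    if SILVA.fullmatch(taxon):
        return "silva"
    raise ValueError(f"unsupported taxonomy format: {taxon!r}")


print(taxonomy_format("k__Bacteria; p__Firmicutes; c__"))     # greengenes
print(taxonomy_format("D_0__Bacteria;D_1__Firmicutes;D_2__"))  # silva
```

Running such a check over a taxonomy export before calling the plugin can surface unsupported strings up front instead of at normalization time.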

Now you can perform analyses as you usually do in QIIME 2 with the GCN-normalized FeatureTable. For example, let's run ANCOM on the new FeatureTable and compare the result with that of the "Moving Pictures" tutorial.

# get the metadata from "Moving Pictures" tutorial
wget \
  -O "sample-metadata.tsv" \
  "https://data.qiime2.org/2019.10/tutorials/moving-pictures/sample_metadata.tsv"

# ANCOM analysis
qiime feature-table filter-samples \
  --i-table table-normalized.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where "[body-site]='gut'" \
  --o-filtered-table gut-table-normalized.qza
  
qiime taxa collapse \
  --i-table gut-table-normalized.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table gut-table-l6-normalized.qza

qiime composition add-pseudocount \
  --i-table gut-table-l6-normalized.qza \
  --o-composition-table comp-gut-table-l6-normalized.qza

qiime composition ancom \
  --i-table comp-gut-table-l6-normalized.qza \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column subject \
  --o-visualization l6-ancom-subject-normalized.qzv

ANCOM output visualizations:

l6-ancom-subject.qzv (from the official tutorial)
l6-ancom-subject-normalized.qzv (from this example)

You may also want to compare the change in relative abundance using taxonomic bar plots.
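Bar plots can shift because dividing by GCN changes each taxon's share of the total. In the hypothetical two-taxon sample below (invented counts and copy numbers, again assuming normalization divides counts by GCN), a taxon carrying seven copies drops from 70% to 25% relative abundance.

```python
def relative_abundance(counts):
    """Convert a {taxon: count} mapping into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


# Hypothetical sample: taxon A has 7 gene copies per genome, taxon B has 1
raw = {"A": 70, "B": 30}
gcn = {"A": 7.0, "B": 1.0}

# Assumed GCN normalization: divide each count by the taxon's copy number
normalized = {taxon: raw[taxon] / gcn[taxon] for taxon in raw}

print(relative_abundance(raw))         # {'A': 0.7, 'B': 0.3}
print(relative_abundance(normalized))  # {'A': 0.25, 'B': 0.75}
```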

Generally, GCN normalization may not change your analysis results dramatically, but someone (e.g. a reviewer or, in my case, a supervisor) may ask you to apply it. For more discussion of GCN normalization, see the related topic on the QIIME 2 forum.

thermokarst · December 3, 2019, 5:29pm

Awesome! Please post this to the Library; that is the official place to share (and find) community plugins!

mpodar · December 7, 2019, 2:23pm

Hello

This is a plugin I have long been hoping for, thank you! But there seems to be a problem with the installation: it appears not to include some reference files. This is the error I get when I try to run it:

(qiime2-2019.10) $ qiime gcn-norm copy-num-normalize \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --o-gcn-norm-table table-normalized.qza

Plugin error from gcn-norm:

[Errno 2] File b'/Users/mpb/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_gcn_norm/rrnDB-5.6_pantaxa_stats_NCBI.tsv' does not exist: b'/Users/mpb/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_gcn_norm/rrnDB-5.6_pantaxa_stats_NCBI.tsv'

These are the files in the q2_gcn_norm/ directory:
$ ls
__init__.py  _copy_num_normalize.py
__pycache__  plugin_setup.py

I manually added the missing file from GitHub, but then I got this error:

(qiime2-2019.10) $ qiime gcn-norm copy-num-normalize \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --o-gcn-norm-table table-normalized.qza

Plugin error from gcn-norm:

name 'df_copy_num' is not defined

Is there a problem with the plugin?

Thank you!

Mircea

jwchen · December 7, 2019, 3:54pm

Hi @mpodar,

Thanks for reporting these problems! The first one happened because I forgot to add the package data to the setup file; I have fixed that. As for the second one, it seems I accidentally deleted some code, which I have now restored. Can you reinstall the plugin, or copy the new _copy_num_normalize.py from my GitHub repo to replace the previous one? Please let me know if it works.
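If a reinstall still reports that the .tsv file does not exist, one way to check whether the reference table actually landed next to the installed module is the small helper below. The function name is hypothetical; it uses only the standard library (`importlib.util.find_spec`) and looks in the same package directory that appears in the error message above.

```python
import importlib.util
from pathlib import Path


def package_file_exists(package: str, filename: str) -> bool:
    """Return True if `filename` is present in the installed directory of a top-level `package`."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return False  # package not installed, or a namespace package without __init__.py
    # For a regular package, spec.origin points at its __init__.py
    return (Path(spec.origin).parent / filename).is_file()


if __name__ == "__main__":
    # File name taken from the error message above (rrnDB v5.6 release)
    print(package_file_exists("q2_gcn_norm", "rrnDB-5.6_pantaxa_stats_NCBI.tsv"))
```

Run it inside the activated QIIME 2 environment; for other releases, substitute whichever .tsv file name your installed version expects.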

Best,
Jiung-Wen

mpodar · December 7, 2019, 8:30pm

Hi Jiung-Wen,

Thanks for the quick response. I deleted the plugin and reinstalled it through
conda install -c jiungwenchen q2-gcn-norm

The problem persists. I manually added the .tsv file and replaced _copy_num_normalize.py with the one from GitHub as you suggested. Now it again says it cannot find the .tsv file, even though the file is in the directory it is looking in. I wish I were good enough at Python to see where the problem is, but I'm not.

thanks
Mircea

jwchen · December 8, 2019, 3:32am

Hi @mpodar ,

Sorry for the inconvenience. After inspecting the issue, I found that I had failed to upload the new Anaconda package last time. I have now uploaded it, so installing with the following command should work:

conda install -c jiungwenchen q2-gcn-norm

Alternatively, you can install from the GitHub repo. First, clone the repo:

git clone https://github.com/Jiung-Wen/q2-gcn-norm.git

then change to the main directory:

cd q2-gcn-norm/

and run the following command:

python setup.py install

I hope this solves the problem. Please let me know if it persists.

Best,
Jiung-Wen

mpodar · December 8, 2019, 2:12pm

Perfect, now works like a charm, thank you !
Best,
Mircea

przemekiljan · April 7, 2021, 2:28pm

Hello,
Great job with this tool, but I worry that it may be slightly outdated given newly gathered data. Is the plugin's reference file updated along with official rrnDB data releases, or is there a manual way for users to update it themselves?

Kind regards,
Przemek

jwchen · April 7, 2021, 2:45pm

Hi Przemek,
Thank you for using the plug-in. It has recently been updated to the latest rrnDB version (v5.7) on Anaconda, and I am planning to update the GitHub repo along with the citation in the near future (probably this week).
https://anaconda.org/jiungwenchen/q2-gcn-norm

[Edit] The GitHub repo has been updated.

Best,
Jiung-Wen

przemekiljan · April 8, 2021, 11:12am

Great! Thanks for the quick response.

nickbenn · October 11, 2023, 9:43pm

Is it possible to update the plug-in for the latest rrnDB version (v5.8)?

cherman2 · October 11, 2023, 11:09pm

Hi @nickbenn,
I'd recommend creating an issue on the plugin's GitHub repository for this: Issues · Jiung-Wen/q2-gcn-norm · GitHub