rarefying qPCR normalized 16S sequences

sabrina_charette · July 22, 2023, 5:19am

Hello!

I have received a .biom file from our sequencing facility - they did some basic bioinformatics on the 16S V3-V4 sequences before sending it to me, including using deblur for denoising and fit-classifier-naive-bayes for feature classifying. I am now using this ASV table and doing some further processing.

One thing we want to do is normalize our ASV table by our qPCR total bacterial count data (as suggested here). I have already applied the normalization and was able to run biom convert and re-import the now-normalized ASV table.

Typically when our lab works with relative abundance (i.e. non-normalized) data we rarefy the data before further analysis as suggested here, and this does not give me any issues. However, when I attempt to rarefy using either qiime feature-table rarefy, qiime diversity alpha-rarefaction OR qiime repeat-rarefy repeat-rarefy using my qPCR normalized data, I get this error: (see the exact command and verbose output below)

Plugin error from repeat-rarefy:
repeats may not contain negative values.

I am sure I don't have negative values in my qPCR-normalized table: I checked, and the minimum value is 0. I was unable to find a similar post in the forum, and I'm under the impression that using qPCR-normalized ASVs (i.e. absolute abundance) is somewhat uncommon.

I am running: qiime2-2022.11 via miniconda3

Here are the commands I've attempted. Note that the sampling depth chosen is the minimum frequency per feature in my dataset (generated with qiime feature-table summarize).

qiime feature-table rarefy --i-table abs-filtered-table.qza --p-sampling-depth 135266 --o-rarefied-table abs-filtered-rarefied-table.qza

qiime diversity alpha-rarefaction --i-table abs-filtered-table.qza --i-phylogeny tree.qza --p-max-depth 135266 --m-metadata-file metadata.tsv --o-visualization abs-alpha-rarefaction.qzv

qiime repeat-rarefy repeat-rarefy --i-table abs-filtered-table.qza --p-sampling-depth 135266 --p-repeat-times 100 --o-rarefied-table average_abs_rarefied_table.qza

For the sake of time (each run takes 2+ hours), here is the error message I received when I ran qiime feature-table rarefy:

results = action(**arguments)
File "", line 2, in rarefy
File "/Users/sabrinaayoub-charette/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/Users/sabrinaayoub-charette/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/Users/sabrinaayoub-charette/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_feature_table/_normalize.py", line 17, in rarefy
table = table.subsample(sampling_depth, axis='sample', by_id=False,
File "/Users/sabrinaayoub-charette/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/biom/table.py", line 2942, in subsample
_subsample(data, n, with_replacement)
File "biom/_subsample.pyx", line 59, in biom._subsample._subsample
File "<array_function internals>", line 180, in repeat
File "/Users/sabrinaayoub-charette/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 479, in repeat
return _wrapfunc(a, 'repeat', repeats, axis=axis)
File "/Users/sabrinaayoub-charette/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
return bound(*args, **kwds)
ValueError: repeats may not contain negative values.

Plugin error from feature-table:

repeats may not contain negative values.

See above for debug info.

I have also run this command:

qiime tools validate abs-filtered-table.qza

output: Result abs-filtered-table.qza appears to be valid at level=max.

Thanks for your help!!

crusher083 · July 22, 2023, 9:58am

Hello,

Please attach the feature table that you passed to the rarefy command. I will try to reproduce the error.

Cheers
V

colinvwood · July 24, 2023, 5:23pm

Hello @sabrina_charette,

It looks like your feature table somehow ended up with some very large feature counts. For example, one of your features is supposedly present 143521000000 times! It seems the normalization went wrong. Counts this large can overflow the integer type that stores them and wrap around to negative numbers, which produces the error you got.
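The overflow can be illustrated with a minimal NumPy sketch. It assumes the counts are narrowed to a signed 32-bit integer somewhere in the subsampling path; the thread does not say which type is actually used, and `3_000_000_000` is a hypothetical count chosen only because it exceeds the int32 maximum.

```python
import numpy as np

# Hypothetical counts; the second one is larger than the int32
# maximum (2_147_483_647).
counts = np.array([5, 3_000_000_000], dtype=np.int64)

# Assumption: the counts are cast to 32-bit integers at some point.
# The default astype cast wraps out-of-range values silently.
wrapped = counts.astype(np.int32)
print(wrapped)  # the second entry is now negative

# np.repeat rejects negative repeat counts, which matches the
# ValueError at the bottom of the traceback.
try:
    np.repeat(np.arange(len(wrapped)), wrapped)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

This would also explain why the minimum of the table itself is 0: under this assumption the negative values only appear after the cast, not in the stored data.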

sabrina_charette · July 24, 2023, 6:12pm

Hi @colinvwood ,

Thank you for looking into my file!

All I did to obtain these values was multiply my CFU/gram of stool (from my qPCR) by the relative abundance. For example, 4.47E+11 CFU/gram * 0.321075143 (rel. abundance) = 1.43521E+11
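As a rough sketch of that arithmetic in NumPy: the qPCR total and the first relative abundance are the values quoted above, while the other two relative abundances are hypothetical and only there so the sample sums to 1.

```python
import numpy as np

# qPCR total per gram of stool, as quoted in the post.
total_per_gram = 4.47e11

# Relative abundances for one sample. The first value is from the
# post; the other two are hypothetical fillers that sum to 1.
rel_abundance = np.array([0.321075143, 0.5, 0.178924857])

# Normalization as described: total load times relative abundance.
absolute = total_per_gram * rel_abundance
print(absolute[0])

# Compare against the largest value a signed 32-bit integer can hold.
INT32_MAX = np.iinfo(np.int32).max
print(absolute > INT32_MAX)
```

With totals of this magnitude, every feature in the example exceeds the 32-bit integer range, not just the most abundant one.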

Am I doing something wrong with this normalization? Are these CFU values within normal range? Should I be using log_CFU/gram of stool for the normalization instead?

Thank you!!

colinvwood · July 24, 2023, 6:18pm

Hello @sabrina_charette,

I see. And each DNA extraction contained one gram of source material? Was the qPCR performed before or after 16S amplification? What did the qPCR target?

sabrina_charette · July 24, 2023, 7:26pm

Hi @colinvwood

Each DNA extraction had ~2 g of stool, but we normalized the data to 1 g before using it in this relative abundance normalization, so what I multiplied above is the value for 1 g of stool.

The qPCR probe we used was: Custom TaqMan™ Gene Expression Assay, FAM. Catalog number: 4331348

We did qPCR AFTER 16S amplification (using primers for the V3-V4 region of the 16S)

Thank you!
S

colinvwood · July 24, 2023, 8:39pm

Hello @sabrina_charette,

Since the qPCR was done on amplicons, it makes sense that you get really high quantification numbers. I think it's more typical to perform the qPCR on the raw extracted DNA. Since there is really no absolute interpretation of the units (e.g. what does it mean to have 4 x 10^11 amplicons per g?), you could scale all values down by some constant and still preserve the relative relationships. This way your values will fit inside the data types storing them. But really this doesn't add any information that isn't already present in the sequencing data alone.
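A minimal sketch of the scaling idea, assuming NumPy: the first count is the value from this thread, the other two counts and the constant `scale` are hypothetical.

```python
import numpy as np

# qPCR-scaled counts for one sample. The first value is from the
# thread; the other two are hypothetical.
absolute = np.array([1.43521e11, 2.235e11, 7.9979e10])

# Hypothetical constant. Any single value works as long as it is
# applied to every sample in the table.
scale = 1e6
scaled = np.rint(absolute / scale).astype(np.int64)

# Proportions within the sample before and after scaling.
before = absolute / absolute.sum()
after = scaled / scaled.sum()
print(scaled)
print(np.abs(before - after).max())
```

Using the same constant for every sample keeps the between-sample ratios intact as well; rounding to integers loses a little precision for very rare features.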

I don't think this approach allows you to say, for example, "there were X E. coli genomes/g present in sample Y" because of the 16S amplification bias. Had you performed the qPCR pre-amplification, this would be possible: the qPCR would have given you the total number of (presumably) 16S copies in the sample, and the NGS data would have given you the relative abundance. (You would still have to take the 16S copy number per genome into consideration, but this is doable.)
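For the pre-amplification case, the calculation could look roughly like the sketch below. Every number in it is a hypothetical placeholder (the qPCR total, the relative abundances, and the per-taxon 16S copy numbers), and it ignores any bias in the sequencing step.

```python
import numpy as np

# All values below are hypothetical placeholders.
total_16s_copies_per_gram = 1.0e9          # qPCR on raw extracted DNA
rel_abundance = np.array([0.5, 0.3, 0.2])  # from sequencing, sums to 1
copies_per_genome = np.array([7, 4, 1])    # 16S copies per genome, per taxon

# Split the total 16S copies across taxa by relative abundance.
taxon_16s_copies = total_16s_copies_per_gram * rel_abundance

# Divide by copy number to estimate genomes per gram for each taxon.
genomes_per_gram = taxon_16s_copies / copies_per_genome
print(genomes_per_gram)
```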

sabrina_charette · July 25, 2023, 2:53pm

Thank you so much for explaining this! It makes a lot of sense. I will consider doing the qPCR on the raw extracted DNA, as I agree with you it will give us more useful information.

I appreciate your help!
S
