copy number variation (q2-picrust2)

anirban.mcgill · May 12, 2021, 9:45pm

Hello @gmdouglas ,
I hope you are well.
I have a question about copy number variation in q2-picrust2. In simple language, what do you mean by copy number variation? How do you get the 16S rRNA copy number for my unknown samples? I do not provide that information anywhere in the q2-picrust2 command; I only provide as input my feature table (table.qza), sequences (sequence.qza), and other parameters (nsti, mp, etc.).

If you could help me here, that would be much appreciated.
Thank you,
Anirban

gmdouglas · May 13, 2021, 12:32am

Hey @anirban.mcgill,

The 16S copy numbers are known for all of the reference genomes in advance. So the 16S copy number for each ASV in your samples is predicted, just like the copy number of each other gene family is predicted using PICRUSt2.
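To make the role of the predicted 16S copy number concrete, here is a minimal Python sketch of the normalization idea. It is not PICRUSt2's actual code: `normalize_by_16s` is a hypothetical helper, and the ASV names, read counts, and copy numbers are made-up values. Each ASV's read count is divided by its predicted 16S copy number, so that organisms carrying many 16S copies are not over-counted.

```python
def normalize_by_16s(asv_counts, predicted_16s_copies):
    """Divide each ASV's read count by its predicted 16S copy number."""
    normalized = {}
    for asv, count in asv_counts.items():
        copies = predicted_16s_copies[asv]
        if copies <= 0:
            raise ValueError(f"non-positive 16S copy number for {asv}")
        normalized[asv] = count / copies
    return normalized


# Hypothetical read counts for one sample (illustrative only).
asv_counts = {"ASV_1": 80, "ASV_2": 80}
# Hypothetical predicted 16S copy numbers per ASV (illustrative only).
predicted_16s_copies = {"ASV_1": 1, "ASV_2": 8}

normalized = normalize_by_16s(asv_counts, predicted_16s_copies)
print(normalized)
```

Both ASVs have the same number of reads, but after normalization ASV_2 contributes one eighth as much as ASV_1, because each of its genomes is assumed to carry eight 16S copies.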

Does that make sense?

Cheers,

Gavin

anirban.mcgill · May 13, 2021, 6:28am

Hey @gmdouglas
Thanks for the quick reply! (and sorry on my part for not following up promptly!)

"The 16S copy numbers are known for all of the reference genomes in advance."
OK, I believe that is part of the reference DB? I never knew this was known (can you point me to a link where I can read more, please?).

"So the 16S copy number for each ASV in your samples is predicted, just like the copy number of each other gene family is predicted using PICRUSt2."
Predicted using hidden-state prediction (i.e., ancestral state reconstruction)? But then there is the possibility of error relative to the actual, unknown 16S rRNA copy number of the species/ASVs in my samples? Has anybody tried quantifying this error? I suppose this should not hugely affect our analysis, given that we do not know the true 16S copy number for our samples?

I look forward to your response. Thanks for always being super helpful !

Anirban

gmdouglas · May 13, 2021, 12:03pm

Hey @anirban.mcgill,

Yes, it's just treated like the copy number of any other gene.

There definitely is room for error with these predictions. See this independent validation of several 16S copy number prediction approaches, which found extremely high variation: "Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem" (Microbiome). We did not specifically compare normalized and unnormalized predictions in our paper (the predictions there were all normalized by predicted 16S copy number), but based on some sanity checks that I ran, the correlations between predicted and actual metagenomes were very similar in both cases.
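A sanity check of this kind could be sketched roughly as follows, assuming you have observed gene family abundances for the same sample. This is an illustration in plain Python, not the analysis from the paper: the helper functions are written for this example, and all abundance values are invented, so the numbers say nothing about real PICRUSt2 performance.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented gene family abundances for one sample (illustrative only).
observed = [10, 20, 30, 40, 50]
predicted_normalized = [12, 9, 33, 41, 55]
predicted_unnormalized = [15, 22, 50, 48, 70]

rho_norm = spearman(observed, predicted_normalized)
rho_unnorm = spearman(observed, predicted_unnormalized)
print(f"normalized: {rho_norm:.2f}, unnormalized: {rho_unnorm:.2f}")
```

With real data, `observed` would come from shotgun metagenomics of the same samples, and the two predicted vectors from running the pipeline with and without 16S copy number normalization.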

With the plugin you can include the option --p-skip-norm to skip the 16S copy number normalization step.
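As a rough illustration, the sketch below assembles such a call from Python. Only `--p-skip-norm` comes from this thread; the `full-pipeline` action name, the `--i-table`, `--i-seq`, and `--output-dir` option names, and the output directory name are assumptions based on the plugin's tutorial, so check them against `qiime picrust2 full-pipeline --help` for your installed version. `build_picrust2_command` is a hypothetical helper, and the code only prints the command rather than running it.

```python
import shlex


def build_picrust2_command(table, seqs, output_dir, skip_norm=False):
    """Assemble a q2-picrust2 call as an argument list (option names assumed)."""
    cmd = [
        "qiime", "picrust2", "full-pipeline",
        "--i-table", table,
        "--i-seq", seqs,
        "--output-dir", output_dir,
    ]
    if skip_norm:
        # Skip the 16S copy number normalization step.
        cmd.append("--p-skip-norm")
    return cmd


cmd = build_picrust2_command(
    "table.qza", "sequence.qza", "q2-picrust2_output", skip_norm=True
)
print(shlex.join(cmd))
```

Leaving `skip_norm` at its default of `False` omits the flag, in which case normalization by predicted 16S copy number is performed.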

All the best,

Gavin

anirban.mcgill · May 14, 2021, 12:41am

Thank you @gmdouglas.
I have another PICRUSt2-related question, but I should put it in a separate thread. When you get time, if you could take a look, that would be great!

Mia_T · December 15, 2021, 8:44am

Hi Gavin

If I did not include the option --p-skip-norm in my script, can I assume that the output from the picrust2 plug-in is already normalised by 16S copy number?

Thanks,
Mia

gmdouglas · December 15, 2021, 2:14pm

Hi Mia,

Yes, if you don't use that option then normalization by 16S copy number will be performed.

Cheers,

Gavin