Is there a normalize_table.py in QIIME 2?

Hanna · April 7, 2017, 5:38pm

Hi,
I normalized my OTU table for downstream analysis using CSS or DESeq.
I know QIIME 2 works with sequence variants (from DADA2) rather than OTUs.
I wonder whether sequence variants, which are effectively 100% OTUs, no longer need normalization.
Does QIIME 2 use rarefying?
Can I use normalization methods such as CSS or DESeq in QIIME 2?

Thank you.
Hanna

jairideout · April 7, 2017, 6:00pm

Hi @Hanna! QIIME 2 does not currently support CSS/DESeq normalization methods that are present in QIIME 1. The core development team doesn't have a timeline for adding this support, but we would welcome the addition of these methods in a QIIME 2 plugin if you or someone else is interested in developing that functionality (see our plugin developer docs for details).

> I wonder whether sequence variants, which are effectively 100% OTUs, no longer need normalization.

I think they'll still need normalization: samples can still have vastly different sequence counts (sequencing depths), and that is what normalization corrects for.

Right now you have a few options to choose from:

Rarefy your table with qiime feature-table rarefy.
If you are interested in differential abundance testing in QIIME 2, take a look at the qiime composition plugin, which implements ANCOM (it has its own normalization scheme).
Run CSS/DESeq in QIIME 1 (or R) and import your normalized .biom file into QIIME 2 (see the importing tutorial).
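To make the DESeq part of the third option a bit more concrete, here is a minimal plain-Python sketch of DESeq-style "median-of-ratios" size factors. It is an illustration under simplifying assumptions, not the DESeq2 or QIIME 1 implementation: the table is a hypothetical dict mapping sample IDs to count lists in a shared feature order, and features with a zero count in any sample are skipped because their geometric mean would be zero.

```python
import math
import statistics


def deseq_size_factors(table):
    """Median-of-ratios size factors for a {sample_id: [counts]} table.

    Simplified sketch: features that are zero in any sample are skipped.
    """
    samples = list(table)
    n_features = len(table[samples[0]])
    ratios = {s: [] for s in samples}
    for i in range(n_features):
        counts = [table[s][i] for s in samples]
        if min(counts) == 0:
            continue  # geometric mean would be 0, so the ratio is undefined
        # Geometric mean of this feature across all samples (the "reference").
        geo_mean = math.exp(sum(math.log(c) for c in counts) / len(counts))
        for s, c in zip(samples, counts):
            ratios[s].append(c / geo_mean)
    if not ratios[samples[0]]:
        raise ValueError("no feature is non-zero in every sample")
    # Each sample's size factor is the median of its ratios to the reference.
    return {s: statistics.median(r) for s, r in ratios.items()}


def deseq_normalize(table):
    """Divide every count by its sample's size factor."""
    factors = deseq_size_factors(table)
    return {s: [c / factors[s] for c in counts] for s, counts in table.items()}
```

For example, with `{"A": [10, 20, 30], "B": [20, 40, 60]}`, sample B gets a size factor twice that of A, and after `deseq_normalize` both samples have essentially identical counts.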

Hanna · April 10, 2017, 12:07am

Thank you for your kind explanation.

If I want to perform normalization, is the following workflow right?
Demultiplexed sequences --> QC with DADA2 in QIIME 2 --> Export to QIIME 1 (.biom) --> Normalization with CSS in QIIME 1 --> Import into QIIME 2 --> Downstream analysis in QIIME 2

I have one more question about normalization.
If my data have vastly different sequence counts and need to be normalized using CSS, when should I start using the normalized data?
At alpha diversity?
At beta diversity?
When comparing abundance differences between two groups of my samples?
All of them?
If this question isn't relevant to the QIIME 2 forum, I apologize.

Thank you.
Hanna

jairideout · April 10, 2017, 5:07pm

> If I want to perform normalization, is the following workflow right?
> Demultiplexed sequences --> QC with DADA2 in QIIME 2 --> Export to QIIME 1 (.biom) --> Normalization with CSS in QIIME 1 --> Import into QIIME 2 --> Downstream analysis in QIIME 2

That should work!

> I have one more question about normalization.
> If my data have vastly different sequence counts and need to be normalized using CSS, when should I start using the normalized data?
> At alpha diversity?
> At beta diversity?
> When comparing abundance differences between two groups of my samples?
> All of them?
> If this question isn't relevant to the QIIME 2 forum, I apologize.

I've reached out to other developers who will be able to answer your normalization questions. I'm not sure what downstream methods are applicable/appropriate to CSS-normalized tables (in QIIME 1 or QIIME 2).

mortonjt · April 10, 2017, 6:12pm

> When should I start using the normalized data?
> At alpha diversity?
> At beta diversity?
> When comparing abundance differences between two groups of my samples?
> All of them?

You should start using normalized data before you run your analyses, so normalization should be done prior to alpha diversity (unless you are building rarefaction curves) and beta diversity. There are exceptions, of course: some alpha/beta diversity metrics and differential abundance tests can work directly on unrarefied data, which is why it is important to check the input types of each method.

For instance, here is the help text for the alpha diversity command:

tests-MacBook-Pro-4:d3_tutorial mortonjt$ qiime diversity alpha --help
Usage: qiime diversity alpha [OPTIONS]

  Computes a user-specified alpha diversity metric for all samples in a
  feature table.

Options:
  --i-table PATH                  Artifact: FeatureTable[Frequency] %
                                  Properties(['uniform-sampling'])  [required]
                                  The feature table containing the samples for
                                  which alpha diversity should be computed.
  --p-metric [gini_index|esty_ci|menhinick|doubles|fisher_alpha|chao1_ci|simpson_e|simpson|enspie|mcintosh_e|lladser_ci|pielou_e|osd|robbins|goods_coverage|ace|singles|strong|lladser_pe|dominance|observed_otus|berger_parker_d|chao1|shannon|mcintosh_d|heip_e|brillouin_d|michaelis_menten_fit|margalef|kempton_taylor_q]
                                  [required]
                                  The alpha diversity metric to be
                                  computed.
  --o-alpha-diversity PATH        Artifact: SampleData[AlphaDiversity]
                                  [required if not passing --output-dir]
                                  Vector containing per-sample alpha
                                  diversities.
  --output-dir DIRECTORY          Output unspecified results to a directory
  --cmd-config PATH               Use config file for command options
  --verbose                       Display verbose output to stdout and/or
                                  stderr during execution of this action.
                                  [default: False]
  --help                          Show this message and exit.

You'll notice that it takes FeatureTable[Frequency] % Properties(['uniform-sampling']) as input, which indicates that the FeatureTable[Frequency] passed in must already be normalized (that is what the Properties(['uniform-sampling']) part means). Currently, rarefaction is the only normalization scheme available. But as @jairideout mentioned, we welcome contributions that enable other normalization schemes.
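As a rough illustration of what "uniform-sampling" means, the hypothetical helper below (plain Python, not part of QIIME 2) checks whether every sample in a {sample_id: [counts]} table has the same total frequency, which is what rarefying guarantees. QIIME 2 itself records this as a semantic type property on the artifact rather than recomputing it from the counts.

```python
def has_uniform_sampling(table):
    """Return True if every sample in a {sample_id: [counts]} table has the same total.

    Hypothetical helper: a rarefied table passes this check, while a raw
    table with uneven sequencing depths usually does not.
    """
    # Collect the distinct per-sample totals; uniform sampling means at most one.
    totals = {sum(counts) for counts in table.values()}
    return len(totals) <= 1
```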

Concerning CSS, I never really understood the need for it; it doesn't make sense to me to skew the proportions of the reads assigned to each OTU. I've CCed someone who is more familiar with this scheme and may be able to shed some light on it.

IrishSetter · April 12, 2017, 2:44pm

CSS attempts to correct for potentially skewed proportions of reads. There is some evidence that certain sequences may be preferentially sequenced and therefore occupy a higher proportion of the reads than they do in the environment.
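A minimal plain-Python sketch of the cumulative-sum-scaling idea follows: each sample's counts are divided by the sum of its counts up to a chosen quantile, so that a few very abundant (possibly preferentially sequenced) features do not dominate the scaling factor the way they do with total-count scaling. Assumptions: the quantile `p` is fixed (0.5 here) rather than chosen from the data as metagenomeSeq does by default, the `scale` constant of 1000 is arbitrary, and no log transform is applied; this is not the metagenomeSeq or QIIME 1 code.

```python
import math


def _quantile(sorted_values, p):
    """Linear-interpolation quantile of an already-sorted, non-empty list."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = math.ceil(h)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def css_normalize(table, p=0.5, scale=1000):
    """Cumulative sum scaling of a {sample_id: [counts]} table (simplified sketch)."""
    normalized = {}
    for sample, counts in table.items():
        nonzero = sorted(c for c in counts if c > 0)
        if not nonzero:
            raise ValueError(f"sample {sample!r} has no non-zero counts")
        # p-th quantile of this sample's non-zero counts.
        q = _quantile(nonzero, p)
        # Scaling factor: cumulative sum of the counts at or below that quantile.
        factor = sum(c for c in nonzero if c <= q)
        normalized[sample] = [c / factor * scale for c in counts]
    return normalized
```

For the sample `[1, 2, 3, 100]`, the scaling factor is 3 rather than the total of 106, so the single dominant feature no longer sets the scale.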

CSS-normalized data should not be used to compute alpha diversity, but the other analyses you listed are fine with CSS-normalized data. If you are using unweighted beta diversity metrics, check for effects of the original library size, e.g., in your PCoA plot.
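One simple way to do that check numerically, sketched below as an assumption rather than a QIIME 2 feature, is to rank-correlate each sample's original read count with its score on a PCoA axis (for example, axis 1 of your ordination results). A strong Spearman correlation suggests that the axis may partly reflect library size. The function names and inputs are hypothetical.

```python
import math
import statistics


def rank_values(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = rank_values(x), rank_values(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical usage: library_sizes[i] and pc1_scores[i] describe the same sample.
# rho = spearman_rho(library_sizes, pc1_scores)
```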

MMC_northS · March 9, 2018, 9:29am

Hello,

I really don't understand how to get a normalized table in QIIME 2 that I can export and use in other analyses.

I don't understand the 'qiime feature-table rarefy' command, because I don't know what is meant by "The total frequency that each sample should be rarefied to. Samples where the sum of frequencies is less than the sampling depth will be not be included in the resulting table. [required]"

Isn't there a command like "single_rarefaction" where you directly specify the read depth you want to use?

If CSS and DESeq are not currently in QIIME 2, does that mean they are still supported in QIIME 1?

The ANCOM method seems to be for comparing abundances between groups of samples; can it also be used for normalizing?

Sorry for all the questions. Normalization is important to me because when we have several samples in the same run, not all samples have a similar number of reads, so I need normalization to ensure I analyse the samples without bias related to differences in read counts.

Thank you for your help in advance!

Nicholas_Bokulich · March 9, 2018, 4:06pm

Sounds like you are looking to rarefy your feature table. qiime feature-table rarefy is indeed what you are looking for. However, rarefying is also built into the core diversity workflows (qiime diversity core-metrics and core-metrics-phylogenetic), which are the steps that actually use rarefied tables. You do not need (or want) to use these rarefied tables for steps outside of diversity analysis, so using core-metrics is usually more convenient than rarefying as a separate step. You simply select your rarefying depth with the --p-sampling-depth parameter.

In qiime feature-table rarefy and in qiime diversity core-metrics, the sampling-depth parameter is equivalent to the depth parameter of QIIME 1's single_rarefaction.py. It is the number of sequences that you wish to randomly subsample from each sample. If a sample has fewer reads than that, it must be excluded (after all, if uneven sampling is the issue, you want to drop these samples from your analysis).

As for a command like single_rarefaction: that is qiime feature-table rarefy.

Regarding CSS and DESeq: QIIME 1 is a totally separate entity, so yes, they are still available there. But QIIME 1 is no longer officially supported (i.e., no one moderates the QIIME 1 forum any more).

As for using ANCOM to normalize your table: no.

I hope that clarifies how to use qiime feature-table rarefy! Let us know if you have any questions.
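The subsampling described above can be sketched in plain Python as follows. This is a minimal illustration of rarefying (random subsampling without replacement to a fixed depth, dropping samples below that depth), not QIIME 2's implementation; the {sample_id: [counts]} table layout and the `seed` argument are assumptions made for the example.

```python
import bisect
import itertools
import random


def rarefy(table, depth, seed=None):
    """Subsample each sample of a {sample_id: [counts]} table to exactly `depth` reads."""
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in table.items():
        total = sum(counts)
        if total < depth:
            continue  # too shallow: excluded, like samples below --p-sampling-depth
        # Cumulative counts let each read index be mapped back to its feature.
        cumulative = list(itertools.accumulate(counts))
        new_counts = [0] * len(counts)
        # Draw `depth` distinct reads (without replacement) out of `total`.
        for read in rng.sample(range(total), depth):
            new_counts[bisect.bisect_right(cumulative, read)] += 1
        rarefied[sample] = new_counts
    return rarefied
```

Mapping read indices through cumulative sums avoids building one list entry per read. Because the draw is random, results differ between runs unless a seed is fixed, and every retained sample ends up with exactly `depth` reads.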