Is it necessary to rarefy FeatureData[Sequence] if I rarefy FeatureTable[Frequency]?


I am trying to do UPGMA clustering for some data from different countries. I collected these data from different databases.

In the end, I will use the following command:

qiime diversity beta-rarefaction

However, samples from some countries contain a very high number of reads (as many as 2.7 million), while samples from other countries have very few reads (only about 9k).

As a result, after denoising the data with Deblur, some samples have a very high number of features and others contain very few.

So, I am planning to rarefy the FeatureTable[Frequency] files to even out the number of features. In the qiime feature-table rarefy command, I will use:

  • the --p-no-with-replacement option for samples with more features

  • the --p-with-replacement option for samples with fewer features
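
To make the difference between the two options concrete, here is a toy sketch in plain Python. It is only an illustration under my own assumptions, not QIIME 2's implementation: the `rarefy` function, the feature IDs, and the counts are all hypothetical. It shows that drawing without replacement cannot reach a depth larger than the sample's read count, while drawing with replacement can.

```python
import random
from collections import Counter


def rarefy(counts, depth, with_replacement=False, seed=0):
    """Subsample one sample's feature counts to `depth` reads (toy sketch).

    counts: dict mapping feature ID -> observed read count.
    Returns a dict of subsampled counts, or None when the sample has fewer
    than `depth` reads and sampling is without replacement.
    """
    rng = random.Random(seed)
    # Expand the counts into one entry per read, e.g. {"a": 2} -> ["a", "a"].
    reads = [fid for fid, n in counts.items() for _ in range(n)]
    if with_replacement:
        # The same read may be drawn more than once.
        drawn = rng.choices(reads, k=depth)
    else:
        if depth > len(reads):
            return None  # cannot draw more reads than the sample contains
        # Each read is drawn at most once.
        drawn = rng.sample(reads, depth)
    return dict(Counter(drawn))


# Hypothetical samples: one deep (1000 reads), one shallow (50 reads).
deep = {"f1": 600, "f2": 300, "f3": 100}
shallow = {"f1": 30, "f2": 15, "f3": 5}

print(rarefy(deep, 200))                             # without replacement
print(rarefy(shallow, 200))                          # None: only 50 reads
print(rarefy(shallow, 200, with_replacement=True))   # 200 reads from 50
```

In the last call the shallow sample is inflated from 50 reads to 200 by redrawing the same reads, which is the situation I am unsure about.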

Then I will merge the FeatureTable[Frequency] and FeatureData[Sequence] files.

In the next step, I will use these two files to generate a phylogenetic tree artifact with qiime fragment-insertion sepp.

And then, finally, I will create the UPGMA cluster.

Now my questions are:

  1. Is rarefying the FeatureTable[Frequency] files the right decision?
  2. Is it okay to use different replacement options for samples containing fewer or more features?
  3. I can't find any plugin or command in qiime to rarefy FeatureData[Sequence] files. Do I need to rarefy FeatureData[Sequence] if I rarefy the corresponding FeatureTable[Frequency] files?
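
To make question 3 concrete: as far as I understand, FeatureData[Sequence] holds one sequence per feature ID and no counts, so the only operation I can imagine on it is dropping sequences whose features no longer appear in the rarefied table. Below is a toy sketch of that idea in plain Python; the function name, sample IDs, feature IDs, and sequences are all hypothetical, and this is not a QIIME 2 command.

```python
def keep_observed_sequences(table, sequences):
    """Keep sequences whose feature has a nonzero count in at least one sample.

    table: dict mapping sample ID -> dict of feature ID -> count.
    sequences: dict mapping feature ID -> sequence string.
    """
    observed = {
        fid
        for sample in table.values()
        for fid, n in sample.items()
        if n > 0
    }
    return {fid: seq for fid, seq in sequences.items() if fid in observed}


# Hypothetical rarefied table: f2 dropped to zero, f4 is absent entirely.
rarefied = {"s1": {"f1": 3, "f2": 0}, "s2": {"f1": 1, "f3": 2}}
seqs = {"f1": "ACGT", "f2": "TTGA", "f3": "GGCA", "f4": "CCAT"}

print(keep_observed_sequences(rarefied, seqs))
# → {'f1': 'ACGT', 'f3': 'GGCA'}
```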

Hi @Anisur_Rahman!

I suggest you watch these videos for some discussion of rarefaction - this should help clear up some of your confusion here:

