Hi,
I am trying to do UPGMA clustering for some data from different countries. I collected these data from different databases.
The final step will be the following command:
qiime diversity beta-rarefaction
However, samples from some countries contain a very high number of reads (as many as 2.7 million), while samples from other countries have very few (as low as 9k).
As a result, after denoising the data with Deblur, some samples have a very high number of features and others contain very few.
So, I am planning to rarefy the FeatureTable[Frequency] files to even out the feature counts. In the qiime feature-table rarefy command, I will use:
- the --p-no-with-replacement option for samples with more features
- the --p-with-replacement option for samples with fewer features
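To make clear what I mean by the two replacement options, here is a minimal Python sketch of how I understand them for a single sample. This is only my mental model, not QIIME 2's actual implementation: the function name rarefy, the toy counts, and returning None for a sample that is too small are all assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, with_replacement, seed=0):
    """Subsample one sample's feature counts down (or up) to `depth` reads.

    Hypothetical helper for illustration only; not part of QIIME 2.
    """
    rng = random.Random(seed)
    # Expand the counts into one entry per read, e.g. {"A": 2} -> ["A", "A"].
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if with_replacement:
        # Each draw puts the read back, so depth may exceed the sample size.
        drawn = rng.choices(reads, k=depth)
    else:
        # Without replacement we cannot draw more reads than exist.
        # Assumption: signal this with None instead of guessing a behavior.
        if depth > len(reads):
            return None
        drawn = rng.sample(reads, depth)
    return dict(Counter(drawn))


large_sample = {"A": 500, "B": 300, "C": 200}  # toy high-depth sample
small_sample = {"A": 5, "B": 3}                # toy low-depth sample

print(rarefy(large_sample, 100, with_replacement=False))
print(rarefy(small_sample, 100, with_replacement=False))  # None: too few reads
print(rarefy(small_sample, 100, with_replacement=True))
```

In this sketch, the low-depth sample can only reach the target depth with replacement, which is why I was thinking of treating the two groups of samples differently.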
Then I will merge the FeatureTable[Frequency] and FeatureData[Sequence] files.
In the next step, I will use these two files to generate a phylogenetic tree artifact with qiime fragment-insertion sepp.
And then, finally, I will create the UPGMA cluster.
Now my questions are:
- Is rarefying the FeatureTable[Frequency] files the right decision?
- Is it okay to use different replacement options for samples with fewer or more features?
- I can't find any plugin or command in qiime to rarefy FeatureData[Sequence] files. Do I need to rarefy FeatureData[Sequence] if I rarefy the corresponding FeatureTable[Frequency] files?