I am trying to do UPGMA clustering for some data from different countries. I collected these data from different databases.
For the final step, I will use the following command:
qiime diversity beta-rarefaction
However, samples from some countries contain a very high number of reads (as many as 2.7 million), while samples from other countries have very few (only about 9k).
As a result, after denoising the data with Deblur, some samples have a very high number of features and others contain very few.
So I am planning to rarefy the FeatureTable[Frequency] files to even out the feature counts. In the
qiime feature-table rarefy command, I will use:
- the --p-no-with-replacement option for samples with more features
- the --p-with-replacement option for samples with fewer features
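To make the difference between the two options concrete, here is a minimal Python sketch of how I understand subsampling with and without replacement. It is only my mental model, not QIIME 2's implementation: the rarefy function, the toy counts, and the ASV names are all made up for illustration, and I have not verified how QIIME 2 itself treats samples that fall below the sampling depth.

```python
import random
from collections import Counter


def rarefy(counts, depth, with_replacement=False, seed=0):
    """Subsample one sample's feature counts down (or up) to `depth` reads.

    Hypothetical helper for illustration only; not part of QIIME 2.
    """
    rng = random.Random(seed)
    # Expand the counts into one list entry per read.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if with_replacement:
        # Each draw is independent, so depth may exceed the number of reads.
        picked = rng.choices(reads, k=depth)
    else:
        # Each read can be drawn at most once, so depth is capped by the reads.
        if depth > len(reads):
            raise ValueError("sample has fewer reads than the sampling depth")
        picked = rng.sample(reads, depth)
    return Counter(picked)


# Toy stand-ins for a deep sample and a shallow sample (made-up numbers).
deep = {"ASV_a": 600, "ASV_b": 300, "ASV_c": 100}   # 1000 reads
shallow = {"ASV_a": 6, "ASV_b": 3, "ASV_c": 1}      # 10 reads

deep_rarefied = rarefy(deep, depth=50)
shallow_resampled = rarefy(shallow, depth=50, with_replacement=True)

print(sum(deep_rarefied.values()))      # total reads kept from the deep sample
print(sum(shallow_resampled.values()))  # total reads drawn from the shallow sample
```

In this sketch, the shallow sample can only reach a depth of 50 when sampling with replacement, which means the same 10 reads are drawn repeatedly and no new information is added. That is the part of my plan I am least sure about.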
Then I will merge the FeatureTable[Frequency] and FeatureData[Sequence] files.
In the next step, I will use these two files to generate a phylogenetic tree artifact with
qiime fragment-insertion sepp.
Finally, I will create the UPGMA clustering.
Now my questions are:
- Is rarefying the FeatureTable[Frequency] files the right decision?
- Is it okay to use different replacement options for samples with fewer or more features?
- I can't find any plugin or command in QIIME to rarefy FeatureData[Sequence] files. Do I need to rarefy FeatureData[Sequence] if I rarefy the corresponding FeatureTable[Frequency] files?