I’m trying to produce a file that has the following components (columns):
- Feature ID
- Feature Frequency
- Taxon name
- Classifier’s confidence value
So far I managed to get the above data from two files:
- A CSV file from the Feature Table: downloaded from the first tab of the QZV file, under “Frequency per feature detail” (feature-frequency-detail.csv). This file contains Feature ID and Feature Frequency.
- A TSV file from FeatureData[Taxonomy], which is the output of qiime feature-classifier classify-sklearn. This file contains Feature ID, Taxon, and Confidence value.
…but I found that the Feature IDs in these two files are not in the same order all the way to the bottom. Specifically, the rows match as long as each frequency value occurs only once (up to row #234 in the image below). After that, where several features share the same frequency, the order of the Feature IDs within those ties differs between the two files and does not seem to follow any pattern (please see the image below).
Does anyone know if this is just random or if they’re sorted by some rules? If so, what are the rules? (I need to have this information for the downstream analyses).
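To make it clearer what I am trying to build: the sketch below joins the two files on Feature ID instead of relying on row position, so the merge itself would not depend on how the rows are ordered. It is only a minimal sketch using Python's standard csv module. The function name merge_by_feature_id and the demo IDs are made up, and I am assuming that the frequency CSV has exactly two columns and no header row, and that the TSV has the header Feature ID / Taxon / Confidence; the real files may differ.

```python
import csv
import io


def merge_by_feature_id(freq_file, taxonomy_file):
    """Join frequency rows to taxonomy rows on Feature ID (hypothetical helper)."""
    # Assumption: the TSV has a header row with the columns
    # "Feature ID", "Taxon" and "Confidence".
    taxonomy = {
        row["Feature ID"]: row
        for row in csv.DictReader(taxonomy_file, delimiter="\t")
    }

    merged = []
    # Assumption: the CSV has no header row, just "feature id,frequency".
    for row in csv.reader(freq_file):
        if len(row) != 2:
            continue  # skip blank or unexpected lines
        feature_id, frequency = row
        tax = taxonomy.get(feature_id, {})
        merged.append({
            "Feature ID": feature_id,
            "Frequency": frequency,
            "Taxon": tax.get("Taxon", ""),
            "Confidence": tax.get("Confidence", ""),
        })
    return merged


# Made-up data: bbb and ccc share a frequency and appear in a different
# order in the two files, like the ties described above.
freq_demo = io.StringIO("aaa,10.0\nbbb,5.0\nccc,5.0\n")
tax_demo = io.StringIO(
    "Feature ID\tTaxon\tConfidence\n"
    "aaa\tk__Bacteria; p__A\t0.99\n"
    "ccc\tk__Bacteria; p__C\t0.80\n"
    "bbb\tk__Bacteria; p__B\t0.75\n"
)
merged_demo = merge_by_feature_id(freq_demo, tax_demo)
for record in merged_demo:
    print(record)
```

With real files, open handles (for example open("feature-frequency-detail.csv", newline="")) would be passed in place of the StringIO objects. The output keeps the row order of the frequency file, and any Feature ID missing from the taxonomy file gets empty Taxon and Confidence values.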
Thank you so much for your kind help!
I’m so grateful for this forum - you guys are amazing!