Hi @madeleineernst,
Thanks so much for this contribution - it is great to see that this plugin is ready to use! I think this is our first metabolomics plugin as well, which is extremely exciting!
I have a few comments on your tutorial and the plugin that I think might make them even more useful.
First, I tested with QIIME 2 2018.6 and it seems to work. Your tutorial mentions activating 2018.4 - you should be able to update that to 2018.6.
I was a little confused about which files I should be using after downloading the ProteoSAFe-METABOLOMICS-SNETS-5729dd0f-download_cluster_buckettable.zip file. It would be helpful to note that the two files used in the tutorial will be in that zip file (I had originally thought that I was looking for three files to download based on the three numbered items in your screenshots). You could then include the commands:
unzip ProteoSAFe-METABOLOMICS-SNETS-5729dd0f-download_cluster_buckettable.zip
cp ProteoSAFe-METABOLOMICS-SNETS-5729dd0f-download_cluster_buckettable/METABOLOMICS-SNETS-5729dd0f-download_cluster_buckettable-main.tsv ./GNPS_buckettable.tsv
cp ProteoSAFe-METABOLOMICS-SNETS-5729dd0f-download_cluster_buckettable/networkedges_selfloop/c8a76183cbe644a194408b514ba51632.pairsinfo GNPS_edges.tsv
so that the files you use in your commands have the same names as files the user has in their current directory.
Is it possible to provide a command for the user to download the zip file (e.g., using curl or wget)? If so, we’ll ultimately be able to automatically test this tutorial for you with new releases of QIIME 2 so that you can be alerted if something breaks.
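A minimal sketch of what such a download step could look like, assuming the zip file is reachable through a stable direct link. The URL in the example call is a placeholder, not a real GNPS address, and fetch_gnps_zip is a hypothetical helper name:

```shell
# Hypothetical helper: download a zip file with curl, falling back to wget.
# Usage: fetch_gnps_zip <url> <output.zip>
fetch_gnps_zip() {
    if [ "$#" -ne 2 ]; then
        echo "usage: fetch_gnps_zip <url> <output.zip>" >&2
        return 2
    fi
    if command -v curl >/dev/null 2>&1; then
        # -f: fail on HTTP errors, -L: follow redirects, -o: output file
        curl -fL -o "$2" "$1"
    elif command -v wget >/dev/null 2>&1; then
        # -O: write the download to the given output file
        wget -O "$2" "$1"
    else
        echo "neither curl nor wget is installed" >&2
        return 1
    fi
}

# Example call (placeholder URL, replace with the real download link):
# fetch_gnps_zip "https://example.org/path/to/download.zip" ProteoSAFe-METABOLOMICS-SNETS-5729dd0f-download_cluster_buckettable.zip
```

Wrapping the download in a function that returns a non-zero status on failure would let an automated tutorial test stop early with a clear error instead of failing later at the unzip step.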
I notice that you currently import the biom file with a FeatureTable[Frequency] semantic type. The Frequency part of that implies that the values in this table are counts, and that this could therefore be used with other QIIME 2 actions that require counts (such as qiime feature-table rarefy, which will subsample the counts in the table without replacement to a user-specified total frequency per sample). If this isn’t the right type, could you describe what these values are, and we can chat about whether a more appropriate type exists or should be created? This will help us prevent users from misusing their data (e.g., by trying to rarefy this table if that’s not appropriate). I haven’t worked a lot with metabolome data, so maybe these are actually counts, in which case you should just ignore this comment.
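One quick way to see whether a table holds count-like values is to scan it for anything that is not a non-negative integer. This is only a sketch: it assumes a tab-separated file with one header line and feature IDs in the first column (an assumption about the bucket table layout), and check_counts is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if every value in a TSV table is a
# non-negative integer (i.e. looks like a count).
# Assumes: tab-separated, one header line, feature IDs in column 1.
# Usage: check_counts <table.tsv>
check_counts() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        NR > 1 {
            for (i = 2; i <= NF; i++) {
                # Accept integers, optionally written with a zero fraction such as "12.0"
                if ($i !~ /^[0-9]+(\.0+)?$/) {
                    printf "non-count value \"%s\" at line %d, column %d\n", $i, NR, i
                    bad = 1
                    exit
                }
            }
        }
        END { exit bad }
    ' "$1"
}

# Example call on the file name used in the tutorial commands:
# check_counts GNPS_buckettable.tsv && echo "all values look like counts"
```

If this reports a non-count value, rarefying the table is probably not appropriate and a different semantic type would be worth discussing.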
At some point, it would be worth seeing if there would be a better way to pass the --p-css-edges file. Because it’s specified as a parameter, in a graphical interface the user wouldn’t get a file selection box, so it might be hard or impossible to pass this path as a parameter through a GUI. Making it feature metadata instead should work, but I realize this is pairwise data so it doesn’t exactly match with that concept. That’s probably something we need to think about supporting at the framework level.
Is there a test data set that you could use that would require less computation time? The qiime cscs cscs step took over an hour to run for me. It’s helpful for testing, and for using these tutorials in workshops, if the test analysis can run quickly (a minute or two at most). That’s of course not always possible though.
I realize this is a lot of info, but just think of this as some general feedback - not a list of urgent to-do items. This is very exciting and useful to have as-is! Thanks for your interest in contributing to QIIME 2!