Visualization of large data sets is inadequate and needs optimization

ben · October 25, 2018, 2:17pm

Hi all,

I hope you're well. One issue we've been coming across is visualization of our bar plots. We have a multi-run set of samples which we've combined and then filtered. When we try to view a set of samples, it is impossible because the page crashes multiple browsers (e.g., Safari and Chrome) or runs out of cache memory, which results in Safari resetting the page.

Loading the page isn't an issue, but any manipulation, e.g., looking at different levels, lags. Any help with this would be much appreciated.

Sincerely, Ben

thermokarst · October 30, 2018, 12:04am

Hey there @ben! I can't say I am surprised; that visualization has pretty poor performance benchmarks, and is on the shortlist of Things To Fix here at QIIME 2 HQ (:qiime2:). A few options to make things work a bit better:

- Reduce the number of features by filtering your feature table
- Collapse to a specific taxonomic level prior to plotting - this will require a customized FeatureData[Taxonomy]
- Group your samples in your feature table - this will require a new Sample Metadata file
- Plot this in another tool
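
To illustrate why collapsing and grouping shrink what the browser has to draw, here is a minimal pure-Python sketch of the two ideas. It is not QIIME 2's implementation or API: the toy table, the taxonomy strings, the `sample_group` mapping, and the helper names `collapse` and `group_samples` are all hypothetical, and grouping is assumed to sum counts.

```python
from collections import defaultdict

# Toy feature table: sample ID -> {feature ID -> count}. All IDs are hypothetical.
table = {
    "s1": {"f1": 10, "f2": 5, "f3": 1},
    "s2": {"f1": 0, "f2": 7, "f3": 3},
    "s3": {"f1": 4, "f2": 4, "f3": 0},
}
# Hypothetical taxonomy strings, one per feature.
taxonomy = {
    "f1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "f2": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "f3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}
# Hypothetical metadata column used to group samples.
sample_group = {"s1": "cecal", "s2": "cecal", "s3": "fecal"}


def collapse(table, taxonomy, level):
    """Sum counts of features that share the first `level` taxonomic ranks."""
    collapsed = {}
    for sample, counts in table.items():
        row = defaultdict(int)
        for feature, count in counts.items():
            ranks = taxonomy[feature].split("; ")
            row["; ".join(ranks[:level])] += count
        collapsed[sample] = dict(row)
    return collapsed


def group_samples(table, sample_group):
    """Sum counts across all samples that belong to the same group."""
    grouped = defaultdict(lambda: defaultdict(int))
    for sample, counts in table.items():
        for feature, count in counts.items():
            grouped[sample_group[sample]][feature] += count
    return {group: dict(row) for group, row in grouped.items()}


# Three features become two phyla; three samples become two groups.
phylum_table = collapse(table, taxonomy, level=2)
grouped_table = group_samples(phylum_table, sample_group)
print(grouped_table)
```

Both steps reduce the number of bars and colors the plot must render, which is the point of the options listed above.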

Sorry, none of those options are ideal, but if you can reduce the size of the feature table, that will help out a bit. Thanks! :qiime2:

ben · October 31, 2018, 1:54pm

Yeah, I think editing the file would be ideal. Maybe we could filter the samples shown in the bar plots after opening them in view.qiime2.org? That would be a great addition to the view options.

thermokarst · November 2, 2018, 3:00pm

Hey @ben,

I'm not sure I follow... view.qiime2.org is just a tool for viewing QZVs (same as qiime tools view in q2cli).

You can filter samples (and/or features) using q2-feature-table - then you can use that filtered table as input to taxa barplot.
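
As a rough picture of the filter-then-plot workflow, the sketch below filters a toy table by a metadata value and then drops rare features before any plotting would happen. It is plain Python under stated assumptions, not the q2-feature-table API: the table, the metadata, the `body_site` column, and the helpers `filter_samples` and `filter_features` are hypothetical names chosen for illustration.

```python
# Toy feature table: sample ID -> {feature ID -> count}. All IDs are hypothetical.
table = {
    "s1": {"f1": 10, "f2": 1},
    "s2": {"f1": 3, "f2": 0},
    "s3": {"f1": 8, "f2": 9},
}
# Hypothetical sample metadata with a single column.
metadata = {
    "s1": {"body_site": "cecal"},
    "s2": {"body_site": "cecal"},
    "s3": {"body_site": "fecal"},
}


def filter_samples(table, metadata, column, value):
    """Keep only samples whose metadata `column` equals `value`."""
    return {
        sample: counts
        for sample, counts in table.items()
        if metadata.get(sample, {}).get(column) == value
    }


def filter_features(table, min_total):
    """Keep only features whose summed count across samples is >= min_total."""
    totals = {}
    for counts in table.values():
        for feature, count in counts.items():
            totals[feature] = totals.get(feature, 0) + count
    keep = {feature for feature, total in totals.items() if total >= min_total}
    return {
        sample: {f: c for f, c in counts.items() if f in keep}
        for sample, counts in table.items()
    }


# Filter first, then hand the smaller table to whatever draws the bar plot.
cecal_only = filter_samples(table, metadata, "body_site", "cecal")
plot_input = filter_features(cecal_only, min_total=5)
print(plot_input)
```

Because the filters are separate from the plotting step, the same filtered table could feed any visualization, at the cost of re-running the plot for each subset.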

ben · November 2, 2018, 3:18pm

@thermokarst Hm, good point. It's a couple of extra steps to filter using QIIME 2, as I'm sure you understand. But if you're already able to manipulate the samples themselves (e.g., organize by subtype), my specific recommendation would be an option to show only particular samples (e.g., only one subtype, such as only cecal samples) instead of always viewing all of the samples at the same time.

If this functionality could be built into the artifacts, it would be very helpful, especially for large data sets.

Ben

Nicholas_Bokulich · November 5, 2018, 1:41pm

@ben, there are a number of open issues to add the functionality that you describe, e.g., here. So it's on our radar, but it would require a fairly substantial rewrite of this tool. I do not have an ETA, but we can report back here when progress occurs.

thermokarst · November 5, 2018, 3:21pm

Also, you can filter independently of this viz using the handful of generic filtering functions found in QIIME 2. The idea here is that we don't need to re-implement filtering in every single viz or method if there are independent, stand-alone filtering functions (see the filtering tutorial for more details).

ben · November 5, 2018, 3:48pm

@thermokarst @Nicholas_Bokulich thanks! Yeah @Nicholas_Bokulich, that would be good to have in the bar plots, since I think that functionality is already present in the Emperor plots or beta-diversity visualizations?

@thermokarst yeah, it's not hard to re-filter and then re-run the code, but I feel that it's quite redundant to redo the plots when everything is already done in one place. I can certainly filter and redo the last step for each and every sub-sample/sub-type I have, and it seems that for these bar plots I will have to.

Ben

colinbrislawn · November 5, 2018, 7:40pm

Hello Ben,

I really like your idea of including filtering in the visualizations themselves, so that end users can change their filtering settings and the graphs themselves are "done in one place."

I always filter my bar plots because it greatly increases legibility. A bar plot with dozens of similar colors can't be a main-text figure! That is another good reason to include filtering as part of the bar-plot visualization.

Colin
