HM Thinking: Visual Analytic


Tuesday, January 3, 2017

Attention and visual memory in visualization and computer graphics

Note

This survey paper discusses attention and visual memory in visualization. It first covers preattentive processing, which is fast, parallel, and produces a pop-out effect (as opposed to serial processing). The theories of preattentive processing include:

Feature Integration Theory: selective perception; preattentive features are classified through brain cells, and some features can be detected in parallel.
Texton Theory: elongated blobs (lines, rectangles, ellipses, etc.), terminators (ends of line segments), and crossings of line segments.
Similarity Theory: structural units that share a common property; because short-term visual memory is limited, a closer structure carries more information to process.
Guided Search Theory: top-down or bottom-up visual search.
Boolean Map Theory: considers the location of information; the pattern is processed and held in memory to search for the target.
Ensemble Coding: guides attention in a large scene to catch ensemble differences.
Feature Hierarchy: the most important data should be highlighted by color or other visual features.

The second section of the paper discusses visual expectation and memory.

Eye Tracking: eye gaze pattern analysis; the eye repeatedly scans the visual information if no preattentive information pops out.
Postattentive Amnesia: conjunction features that have no preattentive effect, i.e. they cannot be semantically recognized and remembered. Such targets can be found by traditional search or postattentive search.
Attention guided by memory and prediction: first, a viewer finds a target more rapidly in a subset of the display that is presented repeatedly; second, a viewer has an unconscious tendency to look for targets in novel locations in the display.
Change blindness: a change that users cannot detect even when they actively search for it, e.g. when comparing two pictures, one of which has been modified.
Inattentional blindness: the user can completely fail to perceive visually salient objects or activities, e.g. the gorilla inattentional blindness experiment.
Attentional Blink: the limited ability of users to process information that arrives in quick succession, even when that information is presented at a single location in space.

The vision models:

Visual Attention: perceptual salience (e.g. the number of colors; does the visualization perform as expected?), predicting attention (where a viewer will focus), and directing attention (catching the eye).
Visual Memory: make sure the user does not miss important information, avoiding change blindness and inattentional blindness.

Current challenges:

Visual Acuity: what is the information-processing capacity of the visual system?
Aesthetics: understanding the perception of aesthetics.
Engagement: considering the factors of visual interaction and decision making.

Reference

Healey, Christopher, and James Enns. "Attention and visual memory in visualization and computer graphics." IEEE Transactions on Visualization and Computer Graphics 18.7 (2012): 1170-1188.

Monday, January 2, 2017

Empirical studies in information visualization: Seven scenarios

Note

A useful reference for evaluating visualization tools. The paper provides seven scenarios that researchers can follow when conducting a user study.

Understanding Environments and Work Practices (UWP)
Evaluating Visual Data Analysis and Reasoning (VDAR)
Evaluating Communication Through Visualization (CTV)
Evaluating Collaborative Data Analysis (CDA)
Evaluating User Performance (UP)
Evaluating User Experience (UE)
Evaluating Visualization Algorithms (VA)

Reference

Lam, Heidi, et al. "Empirical studies in information visualization: Seven scenarios." IEEE Transactions on Visualization and Computer Graphics 18.9 (2012): 1520-1536.

A nested model for visualization design and validation

Note

A four-layer nested model for analyzing and evaluating visualization designs. The layers are:

Domain problem and data characterization: the designer should follow the "vocabulary" in each domain, e.g. business or biology.
Operation and data type abstraction: data type transformation
Visual encoding and interaction design: the cost of interaction
Algorithm Design: run-time speed and time

For evaluation:

Vocabulary: discuss the terminology of different domains.
Interactive Loops and Rapid Prototyping: looping and refining.
Domain Threats: a mischaracterized problem.
Abstraction Threats: the abstraction does not solve the characterized problem of the target users.
Encoding and Interaction Threats: ineffective communication.
Algorithm Threats: memory performance.

Reference

Munzner, Tamara. "A nested model for visualization design and validation." IEEE transactions on visualization and computer graphics 15.6 (2009): 921-928.

A design space of visualization tasks

Note

A taxonomy for data visualization tasks. The author defines the design space dimensions as:

Goal: Exploratory Analysis (e.g. undirected search), Confirmatory Analysis (directed search), Presentation (exhibiting confirmed analysis results)
Means: Navigation (e.g. browsing or searching), (Re-)organization (e.g. extraction, abstraction), Relation (e.g. variations, discrepancies)
Characteristics: Low-level (e.g. values, objects) & High-level (e.g. trends, outliers, clusters, frequency, distribution, correlation, etc.) data characteristics
Target: Attribute Relations (e.g. Temporal and Spatial relations), Structural relation (e.g. causal relations, topological relations)
Cardinality: Single (highlight detail), Multiple (putting data into context), and All Instances (getting the overview).

The classification can be used as a semantic tuple, e.g. (exploratory, search, trend, attrib(variable), all). This tuple is then used to determine suitable techniques.
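A minimal sketch of how such a five-part tuple could drive technique selection. The `Task` and `Pattern` types, the `CATALOGUE` entries, and the matching rule are all assumptions made for illustration; the paper's actual recommendation procedure is not reproduced here.

```python
from typing import NamedTuple, Optional


class Task(NamedTuple):
    """One point in the design space: the five dimensions from the note above."""
    goal: str
    means: str
    characteristics: str
    target: str
    cardinality: str


class Pattern(NamedTuple):
    """Hypothetical technique profile; None means 'any value is acceptable'."""
    goal: Optional[str] = None
    means: Optional[str] = None
    characteristics: Optional[str] = None
    target: Optional[str] = None
    cardinality: Optional[str] = None


def matches(task: Task, pattern: Pattern) -> bool:
    # A pattern matches when every constrained dimension equals the task's value.
    return all(p is None or p == t for t, p in zip(task, pattern))


def suitable_techniques(task: Task, catalogue: dict) -> list:
    return [name for name, pattern in catalogue.items() if matches(task, pattern)]


# Made-up catalogue, for illustration only.
CATALOGUE = {
    "line chart": Pattern(characteristics="trend", cardinality="all"),
    "scatter plot": Pattern(characteristics="correlation"),
    "detail table": Pattern(cardinality="single"),
}

task = Task("exploratory", "search", "trend", "attrib(variable)", "all")
print(suitable_techniques(task, CATALOGUE))
```

Leaving a dimension as `None` keeps each technique profile short; a real catalogue would likely need sets of accepted values per dimension.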

Reference

Schulz, Hans-Jörg, et al. "A design space of visualization tasks." IEEE Transactions on Visualization and Computer Graphics 19.12 (2013): 2366-2375.

Interactive dynamics for visual analysis

Note

A taxonomy of tools that support the fluent and flexible use of visualizations.

Pay more attention to the Coordinate and Organize sections.

Reference

Heer, Jeffrey, and Ben Shneiderman. "Interactive dynamics for visual analysis." Queue 10.2 (2012): 30.

Task taxonomy for graph visualization

Note

A graph-specific visualization consists of nodes, links, paths, graphs, connected components, clusters, and groups. This paper discusses the tasks that can be used to examine a tool based on these objects.

The low-level tasks include:

Retrieve value
Filter
Compute the Derived Value
Find Extremum
Sort
Determine Range
Characterize Distribution
Find Anomalies
Cluster
Correlate
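
To make a few of the low-level tasks above concrete, here is a small sketch on a toy node table. The data and the helper names are invented for illustration and do not come from the paper.

```python
from statistics import mean

# Toy node table (made-up values).
nodes = [
    {"name": "a", "degree": 3},
    {"name": "b", "degree": 1},
    {"name": "c", "degree": 7},
    {"name": "d", "degree": 2},
]


def retrieve_value(rows, name, attr):
    """Retrieve value: look up one attribute of one named item."""
    return next(r[attr] for r in rows if r["name"] == name)


def filter_rows(rows, predicate):
    """Filter: keep the items that satisfy a condition."""
    return [r for r in rows if predicate(r)]


def derived_value(rows, attr):
    """Compute derived value: here, the mean of an attribute."""
    return mean(r[attr] for r in rows)


def find_extremum(rows, attr):
    """Find extremum: the item with the largest attribute value."""
    return max(rows, key=lambda r: r[attr])


def sort_rows(rows, attr):
    """Sort: order items by an attribute."""
    return sorted(rows, key=lambda r: r[attr])


def determine_range(rows, attr):
    """Determine range: smallest and largest attribute value."""
    values = [r[attr] for r in rows]
    return min(values), max(values)


print(find_extremum(nodes, "degree")["name"], determine_range(nodes, "degree"))
```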

Tasks commonly encountered while analyzing graph data:

Topology-based Tasks: adjacency (direct connection), accessibility (direct or indirect connection), common connection, connectivity
Attribute-based Tasks: On the Nodes, On the Links
Browsing Tasks: Follow path, Revisit
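
The topology-based tasks map onto simple graph queries. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the graph and the function names are illustrative, not taken from the paper.

```python
from collections import deque

# Toy undirected graph as adjacency sets (made-up data).
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "e": set(),
}


def adjacent(g, u, v):
    """Adjacency: are u and v directly connected?"""
    return v in g[u]


def accessible(g, start):
    """Accessibility: all nodes reachable directly or indirectly (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in g[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


def common_connections(g, u, v):
    """Common connection: nodes linked to both u and v."""
    return g[u] & g[v]


def connected_components(g):
    """Connectivity: split the graph into its connected components."""
    remaining = set(g)
    components = []
    while remaining:
        node = remaining.pop()
        component = accessible(g, node) | {node}
        components.append(component)
        remaining -= component
    return components


print(adjacent(graph, "a", "d"), sorted(common_connections(graph, "a", "d")))
```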

Some more high-level tasks:

compare two web graphs to find the differences, e.g. two recipe graphs.
node duplication
some tasks need users' interpretation

Reference

Lee, Bongshin, et al. "Task taxonomy for graph visualization." Proceedings of the 2006 AVI workshop on BEyond time and errors: novel evaluation methods for information visualization. ACM, 2006.

Sunday, January 1, 2017

Design considerations for collaborative visual analytics.

Note

This paper discusses the factors that make up a collaborative visual analytics environment. Some of the theory overlaps with the operation of online communities. Successful collaboration requires an effective division of labor among participants, and the authors argue for three factors here: modularity, granularity, and cost of integration. In other words, tasks should be split, carried out, and integrated at a reasonable price. If any of the factors is too expensive, the collaboration is unlikely to succeed. For modularity, the authors provide an information visualization reference model, which helps decompose the visualization process into data acquisition and representation, visual encoding, display, and interaction. Each of these components can be a reasonable module from which to start collaborative work. For granularity, the authors discuss the sensemaking model; for instance, in cooperative scenarios a collaborator can benefit immediately from the actions of others. It is hard to facilitate cooperation when incentives are lacking.

The design considerations are listed below:

discussion models, awareness
Reference & deixis, pointing
Incentives & engagement, personal relevance, social-psychological incentives, gameplay,
Identity & trust & reputation, identity presentation
Group dynamics, management, size, diversity
Consensus and decision making, information distribution & presentation

A good reference for applying collaboration theory in different scenarios, e.g. a business intelligence system. For social data analysis, see [2] for extended reading.

Reference

[1] Heer, Jeffrey, and Maneesh Agrawala. "Design considerations for collaborative visual analytics." Information visualization 7.1 (2008): 49-62.
[2] Wattenberg, Martin, and Jesse Kriss. "Designing for social data analysis." IEEE transactions on visualization and computer graphics 12.4 (2006): 549-557.

egoSlider: Visual analysis of egocentric network evolution.

Note

This paper proposes a tool to visualize the dynamic and temporal information of ego-networks. The primary goal of the tool is to support exploratory pattern analysis across domains. For instance, how does an ego-network change over time, and how does that relate to personal health? The contribution lies in three layers: 1) macroscopic: summarizing the entire ego-network; 2) mesoscopic: giving an overview of particular individuals' ego-network evolution; 3) microscopic: displaying detailed temporal information of egos and their alters.
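
As a rough illustration of the data behind such a view, the sketch below groups an ego's alters by time step and reports who joins or leaves between steps. The edge format `(time, u, v)`, the sample edges, and the function names are assumptions; this is not the egoSlider implementation.

```python
from collections import defaultdict


def ego_snapshots(timed_edges, ego):
    """Group the ego's alters by time step.

    timed_edges: iterable of (time, u, v) undirected edges (assumed format).
    """
    alters = defaultdict(set)
    for time, u, v in timed_edges:
        if u == ego and v != ego:
            alters[time].add(v)
        elif v == ego and u != ego:
            alters[time].add(u)
    return dict(sorted(alters.items()))


def alter_changes(snapshots):
    """For consecutive time steps, list (time, alters that joined, alters that left)."""
    times = list(snapshots)
    return [
        (cur, snapshots[cur] - snapshots[prev], snapshots[prev] - snapshots[cur])
        for prev, cur in zip(times, times[1:])
    ]


# Made-up edges for illustration.
edges = [
    (2014, "ego", "ann"),
    (2014, "ego", "bob"),
    (2015, "ego", "bob"),
    (2015, "cat", "ego"),
    (2015, "ann", "bob"),  # does not involve the ego, so it is ignored
    (2016, "ego", "cat"),
]

snapshots = ego_snapshots(edges, "ego")
for time, joined, left in alter_changes(snapshots):
    print(time, sorted(joined), sorted(left))
```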

The visualization idea may come from a different discipline, e.g. sociology research may focus more on social interaction, backed by developed social theory. Designing such a tool to help those researchers better use and digest the generated data could be a great contribution.

Reference

Wu, Yanhong, et al. "egoSlider: Visual analysis of egocentric network evolution." IEEE transactions on visualization and computer graphics 22.1 (2016): 260-269.

Reducing snapshots to points: A visual analytics approach to dynamic network exploration.

Note

This paper uses dimensionality reduction to reduce a complex, multi-dimensional dynamic graph to points in a 2-dimensional plot: each snapshot of the network becomes one point. The plot reveals patterns as distinct clusters, and the user can explore the generated points further to see the details of the network.
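
A minimal sketch of the idea, assuming each snapshot is an adjacency matrix that is flattened into a vector and projected to 2D. PCA via power iteration is used here only as a dependency-free stand-in; the projection method, function names, and toy snapshots are assumptions rather than the paper's pipeline.

```python
import math


def vectorize(adjacency):
    """Flatten an adjacency matrix (list of rows) into one long vector."""
    return [float(x) for row in adjacency for x in row]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def leading_direction(rows, iterations=200):
    """Approximate the top principal direction of centered rows by power iteration."""
    dim = len(rows[0])
    v = [1.0 / (i + 1) for i in range(dim)]  # fixed, non-uniform start vector
    for _ in range(iterations):
        scores = [dot(r, v) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left along any direction
            break
        v = [x / norm for x in w]
    return v


def project_snapshots(snapshots):
    """Map each snapshot (adjacency matrix) to one 2D point."""
    vectors = [vectorize(a) for a in snapshots]
    dim = len(vectors[0])
    means = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    residual = [[v[j] - means[j] for j in range(dim)] for v in vectors]
    points = [[] for _ in vectors]
    for _ in range(2):  # two principal axes
        axis = leading_direction(residual)
        coords = [dot(r, axis) for r in residual]
        for point, c in zip(points, coords):
            point.append(c)
        # Deflate: remove the component just captured before finding the next axis.
        residual = [[x - c * a for x, a in zip(r, axis)] for r, c in zip(residual, coords)]
    return [tuple(p) for p in points]


# Toy dynamic network: two sparse snapshots followed by two dense ones.
sparse = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
dense = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
points = project_snapshots([sparse, sparse, dense, dense])
print(points)
```

Snapshots with the same structure land on the same point, and the two states end up as separate clusters, which is the kind of pattern the 2D plot is meant to expose.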

This may help users understand the feature extraction process in deep learning with neural networks. The remaining challenge is how to explain or label the projected clusters: there is no guarantee of a meaningful (or at least human-understandable) pattern in each round of exploration.

Reference

van den Elzen, Stef, et al. "Reducing snapshots to points: A visual analytics approach to dynamic network exploration." IEEE transactions on visualization and computer graphics 22.1 (2016): 1-10.

Information visualization and visual data mining

Note

A good survey paper for following the trends in data visualization and mining. It provides a clear classification of visual data mining work. The author writes: "The visual data exploration process can be seen as a hypothesis generation process". A visualization interface gives the user an overview of the dataset. Based on the insight gained, the user can explore, filter, and verify findings to answer a hypothesis; the hypothesis can be generated by the user, by statistics, or by machine learning. In addition, visual data exploration usually follows a three-step loop: overview, filter, and details-on-demand. Different insights emerge while the user explores the data through the designed interface.
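
The overview, filter, details-on-demand loop can be sketched on a toy table. The records, field names, and function names below are invented for illustration only.

```python
from collections import Counter

# Toy records (made-up values).
records = [
    {"id": 1, "region": "north", "sales": 120},
    {"id": 2, "region": "south", "sales": 80},
    {"id": 3, "region": "north", "sales": 45},
]


def overview(rows, key):
    """Overview: summarize the whole dataset, here as counts per category."""
    return Counter(r[key] for r in rows)


def filter_rows(rows, **conditions):
    """Filter: narrow the view to the rows matching every given field value."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def details(rows, record_id):
    """Details-on-demand: return the full record for one selected item, if any."""
    return next((r for r in rows if r["id"] == record_id), None)


print(overview(records, "region"))          # step 1: overview
north = filter_rows(records, region="north")  # step 2: filter
print(details(north, 3))                    # step 3: details for one item
```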

Visual data mining consists of three components: 1) the data type to be visualized: 1D, 2D, ND, text and hypertext, and algorithm data; 2) the visualization technique: standard 2D/3D, geometrically transformed, icon-based, dense pixel, and stacked displays; 3) the interaction and distortion technique: projection, filtering, zooming, interactive distortion, and linking and brushing. Each category comes with a reference paper that is worth further reading.

Reference

Keim, Daniel A. "Information visualization and visual data mining." IEEE transactions on Visualization and Computer Graphics 8.1 (2002): 1-8.

Friday, December 30, 2016

CiteRivers: visual analytics of citation patterns

Note:

This visual tool aims to help explore the citation network of given publications (conference proceedings). It shows the citation word cloud, trends, diversity, authors, and publication venues.

Points:

The scale of the stream panel and its relation to spectral clustering are not clear. The benefit of using clustering techniques to show publications in a river style is also unclear.
A cross-stream citation analysis would be useful, i.e. selecting more than one cell of the river.
Word meanings in the word cloud may vary, e.g. "network" has multiple meanings across different research areas, even within the same domain.
The use case shows the citation patterns of given IEEE publications but lacks a discussion of the patterns found. This may be the key value for the target users.
A use case that may be interesting: which years' work are a given year's publications mainly based on? This could be an influence index for past works (and for scholars).


Reference:

Heimerl, Florian, et al. "CiteRivers: visual analytics of citation patterns." IEEE transactions on visualization and computer graphics 22.1 (2016): 190-199.

A visual analytics agenda

Note

This paper points out the potential research directions for visual analytics.

let user obtain deep insight, assessment, planning and decision making.
let user see, explore and understand large amounts of information simultaneously
convert all types of conflicting and dynamic data in ways that support visualization and analysis.
communicate the information in the appropriate context to a variety of audiences.

The science of analytical reasoning, taking a crisis event as an example:

understanding historical and current situations.
identifying possible alternative future scenarios
monitoring current events to identify both expected and unexpected events.
determining indicators of the intent of an action or an individual.
support the decision maker in times of crisis.

Visual representations and interaction technologies

facilitate understanding of massive and continually growing collections of data of multiple types.
provide frameworks for analyzing spatial and temporal data
support the understanding of uncertain, incomplete, and misleading information.
provide user and task-adaptable guided representations that enable full situation awareness while supporting development of detailed actions.
support multiple levels of data and information abstraction, including integration of different types of information into a single representation.

Data representations and transformations

transforming data into new scalable representations that faithfully represent the underlying data's relevant content.
synthesize different types of information from different sources into a unified data representation, so users can focus on the data's meaning in the context of other relevant data
develop methods and principles for representing data quality, reliability and certainty, measured throughout the data transformation and analysis process.

Reference

Thomas, James J., and Kristin A. Cook. "A visual analytics agenda." IEEE computer graphics and applications 26.1 (2006): 10-13.

Effectively Communicating Numbers & Tapping the power of visual perception

Note

A good introductory white paper on how to present quantitative information for business, and good basic reading material [1]. Data visualization is used to engage the user's "visual perception": the "human visual system is a pattern seeker of enormous power and subtlety". On the other hand, if the data is presented in a different way, the user may not be able to catch the hidden patterns, which makes communication less effective.

The human eye catches light, which is translated into color and thoughts [2]. Light that falls on the fovea is seen in the most detail and receives the most attention. The other parts of the retina see less detail but are ready to catch any change, e.g. something moving or popping up. In addition, the human brain has long-term and short-term memory. Short-term memory processes and discards received information, like RAM in a computer: it is fast but has very limited capacity. Long-term memory takes more time to organize but lasts longer for later use, like a hard drive. It is crucial to design visualizations that follow the nature of the human brain.

There are two kinds of attention in visual perception: preattentive and attentive. Preattentive processing is very quick and parallel, like something popping out at you. For instance, in a series of numbers with the target number highlighted, the highlighted number jumps out of the series for the user to recognize. Preattentive attributes can accurately encode numbers only as 2D location, e.g. in 2D scatter plots. Going beyond 2D turns the display into an attentive process, which requires more time and serial effort. One exception is using colored points to distinguish categories. Hence, a 2D scatter plot with categorical color may be the best use case for helping users understand the data.

Reference

[1] Few, Stephen. "Effectively Communicating Numbers." Perceptual Edge. White Paper (2005).
[2] Few, Stephen. "Tapping the power of visual perception." Visual Business Intelligence Newsletter (2004).

Google+ ripples: A native visualization of information flow

Note

A nested-circle style presents the temporal pattern of re-sharing. The sharing actions are structured as a tree-map, and the nested circles help highlight the cluster in each branch. This paper discusses design factors including social media sharing patterns, rendering, interaction, and animation. I think it is a useful way to tell the story of temporal social network trends. The display makes it clear for the user to understand the whole picture of how a certain topic or post spreads.

For extended reading on nested circles, see [2]. That paper models exploratory search tasks as a radar plot; the user can drag items of interest into the plot to filter the results. In [1], the figure shows the social media sharing pattern as circles, whereas [2], from a different perspective, helps the user filter results. The two scenarios may be mutually relevant.

Reference

[1] Viégas, Fernanda, et al. "Google+ ripples: A native visualization of information flow." Proceedings of the 22nd international conference on World Wide Web. ACM, 2013.
[2] Kangasrääsiö, Antti, et al. "Interactive Modeling of Concept Drift and Errors in Relevance Feedback." arXiv preprint arXiv:1603.02609 (2016).

Thursday, December 29, 2016

The structure of the information visualization design space

Note:

This paper provided a framework to organize and structure the visualization plots. It considers the following features:

Data Type: Nominal, Ordinal, Quantitative, Intrinsically Spatial, Geographical, Set mapped to itself
Function for recoding data: filter, sorting, multidimensional scaling, interactive input to a function
Recoded Data Type: same as Data Type
Control Processing: tx (text)
Mark Type: point, line, surface, area, size
Retinal properties: color, size, connection, enclosure
Position in space time: position in space time, N (Nominal) O (Ordered) Q (Quantitative)
View transformation: ::=nb (hyperbolic mapping)
Widget: slider, radio buttons

For example: Multi-Dimensional Tables

Points: 1) Many of the visualizations are not web-based. Is there any particular reason to use web standards? 2) For web-based visualization, how does the framework differ? E.g. a web-based application may use more mouse gestures to click, scale, and hover. Also, with the help of libraries like D3.js, how is the implementation of data visualization influenced? 3) The design space for non-web-based applications is more open and less limited, but such applications are weaker in accessibility for sharing and collaboration.

Worth reading further: [2], [3], [4] for the web-based design space of data visualization.

Reference:

[1] Card, Stuart K., and Jock Mackinlay. "The structure of the information visualization design space." Information Visualization, 1997. Proceedings., IEEE Symposium on. IEEE, 1997.
[2] Figueiras, Ana. "A Typology for Data Visualization on the Web." IV 13 (2013): 351-358.
[3] Turetken, Ozgur, and Ramesh Sharda. "Visualization of web spaces: state of the art and future directions." ACM SIGMIS Database 38.3 (2007): 51-81.
[4] Brath, Richard, and Ebad Banissi. "Using Typography to Expand the Design Space of Data Visualization." She Ji: The Journal of Design, Economics, and Innovation 2.1 (2016): 59-87.

A Tour through the Visualization Zoo

Note

This paper introduces basic plots for data visualization. The schemes mentioned include:

Time Series Data: Index Chart

Time Series Data: Stacked Graph

Time Series Data: Small Multiples

Time Series Data: Horizon Graph

Statistical Distribution: Stem-and-Leaf Plot

Statistical Distribution: Q-Q Plots

Statistical Distribution: Scatter Plot Matrix

Statistical Distribution: Parallel Coordinates

Maps: Flow Map

Maps: Choropleth Map

Hierarchies: Node-Link Diagrams

Hierarchies: Adjacency Diagrams (Icicle Tree Layout)

Hierarchies: Enclosure Diagrams (Treemap, Nested Circles)

Networks: Force-directed Layout

Networks: Arc Diagram

Networks: Matrix View
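
As a small illustration of the first entry in the list, an index chart plots each time series as percentage change relative to a chosen index point. The sketch below only computes those values; the function name and the sample numbers are assumptions for illustration.

```python
def index_series(values, index_point=0):
    """Re-express a series as percentage change relative to values[index_point]."""
    base = values[index_point]
    if base == 0:
        raise ValueError("the value at the index point must be non-zero")
    return [(v - base) * 100.0 / base for v in values]


prices = [50, 55, 45, 60]  # made-up series
print(index_series(prices))                 # relative to the first value
print(index_series(prices, index_point=3))  # relative to the last value
```

Moving the index point rescales every series against a different reference date, which is what makes this kind of chart useful for comparing growth across series with very different absolute values.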

Reference

Heer, Jeffrey, Michael Bostock, and Vadim Ogievetsky. "A Tour through the Visualization Zoo." Communications of the ACM 53.6 (2010): 56-67.

High-dimensional data visualization

Note:

This paper introduces basic plots for displaying multi-dimensional data. The schemes mentioned include:

Mosaic Plots

This plot is good for displaying categorical data and lets the user compare the differences between features. But it requires the user to pay attention to multiple directions (top/bottom, left/right), which makes it harder to follow and to perceive. Besides, the plot provides a quick overview of categorical variables, but not of ordinal and interval variables.

Trellis Displays

Good for comparing variables, but not suitable for temporal or categorical data. Besides, many of the cells may be repeated or empty.

Parallel Coordinate Plots

Good for showing temporal data, but requires skill to solve the overplotting, scaling, and sorting problems.

Projection Pursuit and the Grand Tour

It is not easy for the human brain to process a 3D plot, but these techniques show the dynamics between dimension projections. For instance, using a scatterplot with 3 dimensions to let the user explore patterns across dimensions is one type of grand tour.

Summary

The summary covers the exploration and presentation functionality, including the interactivity, of each plot. However, I think Trellis displays may also provide interactivity, e.g. this demo.

Reference

Theus, Martin. "High-dimensional data visualization." Handbook of data visualization. Springer Berlin Heidelberg, 2008. 151-178.

Wednesday, December 28, 2016

Collaborative visual analysis with RCloud

Note

This paper discusses a collaborative visual analysis environment for teamwork. In a data science project, it is very common to design, analyze, and deliver the result to a target audience, which could be a colleague, a customer, or your boss. This is a process of exploratory data analysis (EDA). The paper argues that the work is usually done with different tools, i.e. coding in a scripting language and designing the interface with web techniques. This makes collaborative work very difficult, due to a lack of discoverability (code reuse), technology transfer (collaboration), and coexistence (with interactive visualization tools). Hence, the paper proposes a framework, RCloud, which uses R to integrate the back-end analysis and the front-end display in a RESTful API structure. The basic idea is that every application natively presents its results to users through web browsers. The framework reuses and couples existing R packages.

Points: for a small team and slowly changing project requirements, I think this framework would work well. However, if more and more projects (usually small and without mature results) go live, search and reuse may create extra workload for developers. On the other hand, R packages may not be suitable for solving all practical problems, e.g. large-scale data storage or distributed computing tasks. Besides, there are other framework options that better facilitate collaboration between developers and designers, e.g. MVC frameworks. I think a good framework should be independent of any specific language and technique, so that it can generally support dynamic real-world requirements.

I actually like this idea; it shows the value of delivering beta work to users. It would be good if we could put research findings or preliminary results on the web for better potential collaboration, public exposure, and self-advertisement. Another trend is using Scala to bundle analysis, implementation, and production.

Reference

North, Stephen, et al. "Collaborative visual analysis with RCloud." Visual Analytics Science and Technology (VAST), 2015 IEEE Conference on. IEEE, 2015.

EgoNetCloud: Event-based egocentric dynamic network visualization

Note

A quality work on network visualization. This paper proposes a visual analytics tool to display the structure and temporal dynamics of an egocentric dynamic network [1,3]. It considers three important design factors: 1) network simplification: showing all the links in the network graph is meaningless and overloads users with information. A reasonable way to "prune" nodes so that the important ones are highlighted is necessary. The authors first define a weighting function based on co-author number and ordering. Based on that weighting function, they try four different approaches to pruning the nodes in order to maximize the efficiency function, which maximizes the weighting in the sub-graph.
2) temporal network: the temporal information is presented by a horizon graph with a time axis, which makes it simple to identify the distribution over time; 3) graph layout: the layout is designed for a 2D space. Due to the temporal relationship, the chart divides into several sub-graphs that are hard to fit with a regular force-directed graph layout. The authors extend the stress model to calculate the ideal layout [2].
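
To illustrate the network simplification step, here is a minimal sketch of weight-based pruning. The scoring formula is a made-up stand-in (fewer co-authors and earlier byline positions count more); it is not the paper's weighting function, and keeping the top-k alters is only the simplest possible pruning strategy, not one of the four approaches the authors compare.

```python
from collections import defaultdict


def coauthor_weights(papers, ego):
    """Score each alter of the ego from a list of author lists in byline order.

    Hypothetical weighting, for illustration only: each shared paper contributes
    1 / (number of authors * byline position of the alter).
    """
    weights = defaultdict(float)
    for authors in papers:
        if ego not in authors:
            continue
        for position, name in enumerate(authors, start=1):
            if name != ego:
                weights[name] += 1.0 / (len(authors) * position)
    return dict(weights)


def prune(weights, keep):
    """Keep the `keep` highest-weighted alters (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:keep]]


# Made-up publication list.
papers = [
    ["ego", "ann"],
    ["bob", "ego", "cat", "dan"],
    ["ann", "ego"],
]

weights = coauthor_weights(papers, "ego")
print(prune(weights, keep=2))
```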

Points: 1) the research methodology of visual analytics: from design and implementation to case study and user study. The user study design is a useful reference for my research; 2) treating a single publication as an event to form the egocentric network. This may support multiple use cases, e.g. urban computing, conferences, news events, etc. The system is suitable for exploring the relationships in a given dataset for temporal and egocentric tasks; 3) the slider interaction on time and weighting items is useful for exploring the content. It may help a user understand the deep relationships of a given person. This idea may also link to the explanation function in recommender systems.

Citation [4] is worth reading.

Reference

[1] Liu, Qingsong, et al. "EgoNetCloud: Event-based egocentric dynamic network visualization." Visual Analytics Science and Technology (VAST), 2015 IEEE Conference on. IEEE, 2015.
[2] Gansner, Emden R., Yehuda Koren, and Stephen North. "Graph drawing by stress majorization." International Symposium on Graph Drawing. Springer Berlin Heidelberg, 2004.
[3] Shi, Lei, et al. "1.5D egocentric dynamic network visualization." IEEE transactions on visualization and computer graphics 21.5 (2015): 624-637.
[4] Zheng, Yixian, et al. "Visual Analytics in Urban Computing: An Overview." IEEE Transactions on Big Data 2.3 (2016): 276-296.