HM Thinking

Thursday, December 29, 2016

The structure of the information visualization design space

Note:

This paper provides a framework to organize and structure visualizations. It considers the following features:

Data Type: Nominal, Ordinal, Quantitative, Intrinsically Spatial, Geographical, Set mapped to itself (e.g. a graph)
Function for recoding data: filtering, sorting, multidimensional scaling, interactive input to a function
Recoded Data Type: same as Data Type
Control Processing: tx (text)
Mark Type: point, line, surface, area, volume
Retinal properties: color, size, connection, enclosure
Position in space-time: each axis encodes N (Nominal), O (Ordered) or Q (Quantitative) data
View transformation: e.g. hyperbolic mapping
Widget: slider, radio buttons

For example: Multi-Dimensional Tables
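The feature list above can be read as a record type. Each row of the paper's design-space table describes how one data variable is recoded and then mapped to a mark and a visual channel. Below is a minimal Python sketch of that reading. The class name, the field names and the scatter-plot rows are hypothetical illustrations, not the paper's own encoding of any figure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    """Data types used in the design-space notation."""
    NOMINAL = "N"
    ORDINAL = "O"
    QUANTITATIVE = "Q"


@dataclass(frozen=True)
class EncodingRow:
    """One (hypothetical) row of a design-space table: how a single data
    variable is recoded and mapped onto a mark and a visual channel."""
    variable: str
    data_type: DataType        # D: raw data type
    recoding: Optional[str]    # F: recoding function, e.g. "sort"; None if unused
    recoded_type: DataType     # D': data type after recoding
    mark: str                  # point, line, area, surface or volume
    channel: str               # X/Y position or a retinal property such as color

    def describe(self) -> str:
        f = self.recoding or "none"
        return (f"{self.variable}: {self.data_type.value} -[{f}]-> "
                f"{self.recoded_type.value}, {self.mark}, {self.channel}")


# Hypothetical encoding of a simple scatter plot of cars.
Q, N = DataType.QUANTITATIVE, DataType.NOMINAL
rows = [
    EncodingRow("horsepower", Q, None, Q, "point", "X"),
    EncodingRow("mpg", Q, None, Q, "point", "Y"),
    EncodingRow("origin", N, None, N, "point", "color"),
]

for row in rows:
    print(row.describe())
```

Writing one row per variable mirrors the table format. Comparing two visualizations then reduces to comparing their rows.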

Points: 1) Many visualizations are not web-based. Is there any particular reason to use web standards? 2) For web-based visualization, how does the framework differ? E.g. web applications may rely more on mouse gestures to click, zoom and hover. And with the help of libraries like D3.js, how does the implementation of data visualization change? 3) The design space for non-web applications is more open and less limited, but such applications are harder to share and collaborate on.

Worth reading further: [2], [3] and [4] on the web-based design space of data visualization.

Reference:

[1] Card, Stuart K., and Jock Mackinlay. "The structure of the information visualization design space." Information Visualization, 1997. Proceedings., IEEE Symposium on. IEEE, 1997.
[2] Figueiras, Ana. "A Typology for Data Visualization on the Web." IV 13 (2013): 351-358.
[3] Turetken, Ozgur, and Ramesh Sharda. "Visualization of web spaces: state of the art and future directions." ACM SIGMIS Database 38.3 (2007): 51-81.
[4] Brath, Richard, and Ebad Banissi. "Using Typography to Expand the Design Space of Data Visualization." She Ji: The Journal of Design, Economics, and Innovation 2.1 (2016): 59-87.

A Tour through the Visualization Zoo

Note

This paper introduces the basic chart types for data visualization. The schemes covered include:

Time Series Data: Index Chart

Time Series Data: Stacked Graph

Time Series Data: Small Multiples

Time Series Data: Horizon Graph

Statistical Distribution: Stem-and-Leaf Plot

Statistical Distribution: Q-Q Plot

Statistical Distribution: Scatter Plot Matrix (SPLOM)

Statistical Distribution: Parallel Coordinates

Maps: Flow Map

Maps: Choropleth Map

Hierarchies: Node-Link Diagram

Hierarchies (Adjacency Diagrams): Icicle Tree Layout

Hierarchies (Enclosure Diagrams): Treemap

Hierarchies (Enclosure Diagrams): Nested Circles

Networks: Force-directed Layout

Networks: Arc Diagram

Networks: Matrix View
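An index chart, the first scheme in the list, plots each time series as its percentage change relative to a chosen index point. Series on very different scales therefore become directly comparable. Below is a minimal Python sketch of that rescaling. The function name and the price values are hypothetical.

```python
def index_series(values, ref_index):
    """Rescale a series to percentage change relative to values[ref_index]."""
    base = values[ref_index]
    if base == 0:
        raise ValueError("the reference value must be non-zero")
    return [(v / base - 1.0) * 100.0 for v in values]


# Hypothetical prices of two assets on very different scales.
asset_a = [40.0, 50.0, 60.0, 30.0]
asset_b = [80.0, 100.0, 120.0, 60.0]

print(index_series(asset_a, 0))  # percentage change from the first time step
print(index_series(asset_b, 0))  # same shape: both moved proportionally
print(index_series(asset_a, 1))  # moving the index point re-centres the chart
```

In the interactive version, dragging the index point simply calls the rescaling again with a new `ref_index`.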

Reference

Heer, Jeffrey, Michael Bostock, and Vadim Ogievetsky. "A Tour through the Visualization Zoo." Communications of the ACM 53.6 (2010): 56-67.

High-dimensional data visualization

Note:

This chapter introduces the basic plots for displaying multidimensional data. The schemes covered include:

Mosaic Plots

This plot is good for displaying categorical data and lets the user compare differences between features. However, it requires the user to attend to multiple directions (top/bottom, left/right), which makes it harder to follow and to perceive. In addition, it gives a quick categorical overview but is less suited to ordinal and interval variables.
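The "multiple directions" issue comes from how a mosaic plot is built. The unit square is first split left-to-right by the marginal proportions of one variable. Each column is then split bottom-to-top by the conditional proportions of a second variable. The area of each tile therefore equals the cell's share of the total. Below is a minimal two-variable Python sketch. The function name and the contingency table are hypothetical, and real mosaic plots also add gaps and recurse over further variables.

```python
def mosaic_rects(table):
    """Compute tiles of a two-variable mosaic plot in the unit square.

    table: dict mapping (a_level, b_level) -> count
    Returns a dict mapping (a_level, b_level) -> (x, y, width, height).
    """
    total = sum(table.values())
    a_levels = sorted({a for a, _ in table})
    b_levels = sorted({b for _, b in table})
    rects = {}
    x = 0.0
    for a in a_levels:
        # Column width = marginal proportion of this level of variable A.
        col_total = sum(table.get((a, b), 0) for b in b_levels)
        width = col_total / total
        y = 0.0
        for b in b_levels:
            # Tile height = conditional proportion of B given A.
            count = table.get((a, b), 0)
            height = count / col_total if col_total else 0.0
            rects[(a, b)] = (x, y, width, height)
            y += height
        x += width
    return rects


# Hypothetical 2x2 contingency table (counts).
table = {("no", "no"): 30, ("no", "yes"): 10, ("yes", "no"): 20, ("yes", "yes"): 40}
for cell, rect in mosaic_rects(table).items():
    print(cell, rect)
```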

Trellis Displays

Good for comparing variables, but not suitable for temporal or categorical data. In addition, many of the panels may be redundant or empty.
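The empty-panel problem follows from the way trellis displays condition on variables. One panel is drawn for every combination of conditioning levels, whether or not any records fall into it. Below is a minimal Python sketch of that grouping step. The function name, field names and records are hypothetical.

```python
from itertools import product


def trellis_panels(rows, row_var, col_var):
    """Group records into a grid of panels keyed by (row_level, col_level).

    Every combination of levels gets a panel, so sparse data yields empty panels.
    """
    row_levels = sorted({r[row_var] for r in rows})
    col_levels = sorted({r[col_var] for r in rows})
    panels = {key: [] for key in product(row_levels, col_levels)}
    for r in rows:
        panels[(r[row_var], r[col_var])].append(r)
    return panels


# Hypothetical sales records conditioned on region and year.
records = [
    {"region": "east", "year": 2015, "sales": 10},
    {"region": "east", "year": 2016, "sales": 12},
    {"region": "west", "year": 2016, "sales": 7},
]
panels = trellis_panels(records, "region", "year")
empty = [key for key, members in panels.items() if not members]
print(empty)
```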

Parallel Coordinate Plots

Good for showing temporal data, but the overplotting, axis scaling and axis ordering problems require care to solve.
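The scaling and ordering problems are visible in how the polylines are computed. Each axis is normalized independently, here by min-max scaling, and only adjacent axes are visually connected. Below is a minimal Python sketch. Min-max scaling is one common choice among several, and the function name and records are hypothetical.

```python
def parallel_coordinates(records, axes):
    """Return one polyline per record: a list of (axis_position, y) points.

    Each axis is min-max scaled to [0, 1] independently; a constant axis
    is placed at 0.5. The order of `axes` decides which pairs are adjacent.
    """
    lo = {a: min(r[a] for r in records) for a in axes}
    hi = {a: max(r[a] for r in records) for a in axes}
    lines = []
    for r in records:
        points = []
        for i, a in enumerate(axes):
            span = hi[a] - lo[a]
            y = (r[a] - lo[a]) / span if span else 0.5
            points.append((i, y))
        lines.append(points)
    return lines


# Hypothetical records with two variables.
data = [{"a": 0, "b": 10}, {"a": 5, "b": 20}, {"a": 10, "b": 10}]
for line in parallel_coordinates(data, ["a", "b"]):
    print(line)
```

Reordering the `axes` list changes which correlations are visible, which is why axis sorting matters.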

Projection Pursuit and the Grand Tour

A 3D plot is not easy for the human brain to process, but the grand tour shows how the projection changes across dimensions. For instance, rotating a 3D scatterplot so that the user can explore patterns across dimensions is one type of grand tour.
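The rotating-scatterplot example can be sketched as a sequence of 2D projections of the same 3D points, one per animation frame. This is a deliberately simplified tour: it rotates about a single axis, whereas a full grand tour moves smoothly through arbitrary 2D planes of a p-dimensional space. The function names and the point are hypothetical.

```python
import math


def rotate_and_project(points, angle):
    """Rotate 3D points about the vertical (y) axis by `angle` radians,
    then project orthogonally onto the screen (x, y) plane."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y) for x, y, z in points]


def simple_tour(points, steps):
    """Return `steps` evenly spaced frames of one full rotation."""
    return [rotate_and_project(points, 2 * math.pi * k / steps) for k in range(steps)]


# Hypothetical single 3D point: its screen x moves from x towards z and back.
frames = simple_tour([(1.0, 2.0, 3.0)], steps=4)
for frame in frames:
    print(frame)
```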

Summary

The summary compares each plot's suitability for exploration and presentation, including its interactivity. However, I think Trellis displays can also be interactive, e.g. this demo.

Reference

Theus, Martin. "High-dimensional data visualization." Handbook of data visualization. Springer Berlin Heidelberg, 2008. 151-178.

Wednesday, December 28, 2016

Collaborative visual analysis with RCloud

Note

This paper presents a collaborative visual analysis environment for teamwork. In a data science project it is very common to design, analyze and deliver results to a target audience, which could be a colleague, a customer or your boss. This is a process of exploratory data analysis (EDA). The paper argues that this work is usually split across different tools, e.g. analysis code in a scripting language and an interface built with web technologies. This makes collaboration very difficult because of a lack of discoverability (finding and reusing code), technology transfer (handing work between collaborators) and coexistence (combining analysis code with interactive visualization tools). The paper therefore proposes RCloud, a framework that uses R to integrate back-end analysis and front-end display through a RESTful API. The basic idea is that every analysis natively presents its results to users in a web browser. The framework reuses and couples existing R packages.
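RCloud itself is built on R, and its API is not reproduced here. To make the architecture concrete, below is a minimal sketch of the same idea using Python's standard library. An analysis function runs on the server, and a browser fetches its result as JSON over HTTP. The `/summary` endpoint, the data and all function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from statistics import mean

# Hypothetical dataset standing in for a team's analysis input.
DATA = [3.0, 5.0, 7.0, 9.0]


def summarize(values):
    """The back-end analysis step: a small summary computed on the server."""
    return {"n": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


class SummaryHandler(BaseHTTPRequestHandler):
    """Front-end access: GET /summary returns the analysis result as JSON."""

    def do_GET(self):
        if self.path != "/summary":
            self.send_error(404)
            return
        body = json.dumps(summarize(DATA)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start a local server; open http://127.0.0.1:8000/summary in a browser."""
    HTTPServer(("127.0.0.1", port), SummaryHandler).serve_forever()
```

Calling `serve()` starts the server. Because the analysis is exposed as a URL, any collaborator with a browser can see the current result without installing the analysis environment.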

Points: for a small team with stable project requirements, I think this framework would work well. However, as more and more projects (usually small, with immature results) go live, searching for and reusing them may create extra work for developers. On the other hand, R packages may not be suitable for every practical problem, e.g. large-scale data storage or distributed computing tasks. There are also other architectures that better support collaboration between developers and designers, e.g. the MVC pattern. I think a good framework should be independent of any specific language or technology, so that it can support changing real-world requirements.

I actually like this idea: it shows the value of delivering beta work to users. It would be good to put research findings or preliminary results on the web for better potential collaboration, public exposure and self-promotion. Another trend is using Scala to bundle analysis, implementation and production.

Reference

North, Stephen, et al. "Collaborative visual analysis with RCloud." Visual Analytics Science and Technology (VAST), 2015 IEEE Conference on. IEEE, 2015.

EgoNetCloud: Event-based egocentric dynamic network visualization

Note

A high-quality work on network visualization: this paper proposes a visual analytics tool that displays the structure and temporal dynamics of an egocentric dynamic network [1,3]. It considers three important design factors. 1) Network simplification: showing every link in the network graph is meaningless and overloads users with information, so a reasonable way to "prune" nodes and highlight the important ones is necessary. The paper first defines a weighting function based on the number and order of co-authors. Based on this weighting function, the authors try four different pruning approaches that maximize an efficiency function, i.e. the total weight retained in the sub-graph.
2) Temporal network: the temporal information is presented by a horizon graph along a time axis, which makes it easy to identify the distribution over time. 3) Graph layout: the layout is designed in 2D space. Because of the temporal relationships, the chart is divided into several sub-graphs that are hard to fit with a regular force-directed layout, so the authors extend the stress model [2] to compute the layout.
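The stress model of [2] scores a layout by how far each pair's drawn distance is from its ideal (graph-theoretic) distance d_ij. The score is stress = sum over pairs of w_ij * (||x_i - x_j|| - d_ij)^2, commonly with w_ij = d_ij^-2. Below is a minimal Python sketch of that score only. It does not reproduce EgoNetCloud's extension or the majorization solver, and the function name and toy graph are hypothetical.

```python
import math


def stress(positions, ideal):
    """Weighted stress of a 2D layout (lower is better).

    positions: dict mapping node -> (x, y)
    ideal:     dict mapping (i, j) -> target distance d_ij > 0, each pair listed once
    Uses the common weighting w_ij = d_ij ** -2.
    """
    total = 0.0
    for (i, j), d in ideal.items():
        (xi, yi), (xj, yj) = positions[i], positions[j]
        actual = math.hypot(xi - xj, yi - yj)
        total += (actual - d) ** 2 / d ** 2
    return total


# Hypothetical path graph a-b-c: graph distances are 1, 1 and 2.
ideal = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 2.0}
straight = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0)}
bent = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (1.0, 1.0)}
print(stress(straight, ideal), stress(bent, ideal))
```

The straight layout reproduces every graph distance exactly, so its stress is zero. Bending the path shortens the a-c distance and is penalized.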

Points: 1) The research methodology of visual analytics: from design and implementation to a case study and a user study. The user study design is a useful reference for my research. 2) Each publication is treated as an event that forms the egocentric network. This could support multiple use cases, e.g. urban computing, conferences and news events. The system is suitable for exploring relationships in a given dataset for temporal, egocentric tasks. 3) The sliders for time and weighting are useful for exploring the content. They may help users understand the deeper relationships of a given person. This idea may also link to explanation functions in recommender systems.
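The event-based idea and the time slider can be illustrated together. Each publication is an event, and only events that include the ego and fall inside the slider's year range contribute co-author edges. Below is a minimal Python sketch under those assumptions. Counting shared papers as the edge weight is a simple placeholder, not the paper's weighting function, and the function name and papers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ego_network(papers, ego, start_year, end_year):
    """Build a weighted egocentric co-author network from publication events.

    papers: list of (year, [authors]) tuples; each paper is one event.
    Only papers that include `ego` and fall inside [start_year, end_year]
    (the time-slider range) contribute. Edge weight = number of shared papers.
    """
    edges = Counter()
    for year, authors in papers:
        if ego not in authors or not (start_year <= year <= end_year):
            continue
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical publication events.
papers = [(2012, ["Ego", "A", "B"]), (2014, ["Ego", "A"]), (2015, ["C", "D"])]
print(dict(ego_network(papers, "Ego", 2010, 2016)))
print(dict(ego_network(papers, "Ego", 2013, 2016)))  # slider narrowed
```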

A citation worth reading: [4].

Reference

[1] Liu, Qingsong, et al. "EgoNetCloud: Event-based egocentric dynamic network visualization." Visual Analytics Science and Technology (VAST), 2015 IEEE Conference on. IEEE, 2015.
[2] Gansner, Emden R., Yehuda Koren, and Stephen North. "Graph drawing by stress majorization." International Symposium on Graph Drawing. Springer Berlin Heidelberg, 2004.
[3] Shi, Lei, et al. "1.5D egocentric dynamic network visualization." IEEE Transactions on Visualization and Computer Graphics 21.5 (2015): 624-637.
[4] Zheng, Yixian, et al. "Visual Analytics in Urban Computing: An Overview." IEEE Transactions on Big Data 2.3 (2016): 276-296.

Tuesday, December 27, 2016

Following scholars

Network Science

Jure Leskovec (Homepage, Google Scholar)
Jon Kleinberg (Homepage, Google Scholar)

Recommendation System

Izak Benbasat (Homepage, Google Scholar)
Bo Xiao (Homepage, Google Scholar)
Dokyun Lee (Homepage, Google Scholar)

Visualization

Kwan-Liu Ma (Homepage, Google Scholar)

Thursday, September 8, 2016

A review of NSF funding on recommender system explanation.

Summary

I found that explanation in recommender systems is a promising research subject across several areas, so I reviewed the relevant projects in the full NSF award list. Here are my thoughts on each project.

CHS: Small: How recommendation and explanation affect preferences in social networks (2014-2017)

This project explores how recommendations and explanations influence user interaction on social media. Hence, its experiments largely follow the existing features of social media services such as Facebook and Twitter. The project implies that the recommendations and explanations provided by social media do change people's online behavior. Social media is often criticized for "manipulating" public opinion through its ranking algorithms. This concern makes the project meaningful, because how such information affects users' preferences is still poorly understood. In terms of the structure in my previous post, this is the first layer.

CHS: Small: Context-aware mobile systems to facilitate synergistic face-to-face interactions (2014-2017)

This project is related to our current work on people recommendation in CN3. I think a mobile application is indeed needed to study face-to-face interaction properly, and this is the current trend for conducting such studies. The project plans to run its experiment with a group of freshman students. I think this provides a more stable user-study setting for collecting long-term interaction patterns.

CHS: Medium: Towards Transparency of Personalization on the Web (2014-2018)

This project is very relevant to what I plan to do. It focuses on issues around users' sensitive data in personalized systems. First, they plan to identify the challenges for mobile users with sensitive content. Second, they will develop a system to measure the prevalence of different personalization techniques. Third, they will identify personalized political content. Fourth, they will study personalized financial and health information applications. I think the PI and co-PI have strong connections with commercial companies, which helps them obtain the necessary datasets and user-study environments. With a real-world dataset and system, their findings about privacy challenges and patterns will be more convincing. I wonder whether there is room for a study free of conflicts of interest on my side, say, a standalone small-scale experiment to revisit or further explain issues that cannot be answered with a real-world system or dataset.

III: Small: Technologies for Creating Explanatory and Exploratory Animations from Scientific Data (2015-2018)

This is a project on scientific data visualization using animation. Our goal is not closely related, but their way of transforming data into animations that users can easily understand and use could give us insight into converting recommender system results into a user-friendly format.

III: Small: Collaborative Research: Towards Interpretable Machine Learning (2015-2018)

This project focuses on decision support from machine learning: how interpretable machine learning techniques can help users make better decisions. Using a well-known classifier, KNN, as an example, they examine interpretability along three dimensions: simplicity, verifiability and accountability. The experiments focus on making the classifier intuitive, predictable and controllable for users. A similar idea could also suit recommender systems. The question is how to make it novel rather than simply repeating this project's idea.
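KNN is a natural choice for interpretability because the model can justify each prediction by pointing at the training examples that voted for it. Below is a minimal Python sketch that returns those neighbors alongside the prediction as a case-based explanation. It illustrates the general idea only, not the project's method, and the function name and training points are hypothetical.

```python
import math
from collections import Counter


def knn_explain(train, query, k=3):
    """Classify `query` by majority vote of its k nearest training points.

    train: list of (feature_vector, label) pairs.
    Returns (prediction, neighbors); the neighbors serve as the explanation.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    prediction = votes.most_common(1)[0][0]
    return prediction, neighbors


# Hypothetical 2D training data with two classes.
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((5, 5), "B"), ((6, 5), "B")]
label, why = knn_explain(train, (5.5, 5.0))
print(label, why)
```

For a recommender, the analogous explanation would be the similar users or items that drove a recommendation.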

III: Medium: Machine Learning with Humans in the Loop (2015-2019)

A new direction: integrating human behavior into machine learning algorithms. The project aims to design machine learning algorithms that better accommodate human behavior. Some recent publications have begun to address the issues and challenges of this "Humans in the Loop" area. The project also focuses on human decision processes and decision support. Furthermore, it aims to design better interfaces connecting learning algorithms and human behavior, ultimately yielding an interactive human learning system. I wonder how the human in the loop relates to the human in "user modeling". In recommender system research we model users based on their preferences and historical data, but few studies discuss how to let users participate in and understand that process.

CAREER: Analyzing Interactions in Visual Analytics for User and Data Modeling (2015-2020)

I think this project addresses my concern in "III: Medium: Machine Learning with Humans in the Loop". It intends to develop an interface that extracts users' high-level knowledge for better user or data modeling. The visual interface can help to better analyze human-machine interaction. As a long-term grant, it suggests the potential of this research direction. But I think visual analytics is only one way to engage users with the system. In some cases, text mining for example, visual tools may not be that useful. There may be further work I can pursue in recommender system research.

CHS: Medium: Collaborative Research: Beyond the Black Box: Understanding and Designing for User Expectations of Algorithmic Media (2016-2019)

This project focuses on public awareness of widely used algorithms. It is the most recent award (starting July 2016), so it reflects the current state of the art. According to my literature review, it is right on target for the most promising research topic in this area. What I want to do on recommender systems is quite similar. The main difference is that I focus on recommending items or people, whereas this project emphasizes social media. However, I agree that social media is a more suitable, or simpler, setting for addressing some interesting cross-disciplinary issues and challenges. For instance, there are the legal and ethical questions raised when post ranking on Facebook or Twitter affects users' political leanings. Even so, I think recommender systems can address this question from a more general perspective, say, how transparency can help address media bias. They also raise further issues in other areas, e.g. e-commerce or location-based people recommendation.

(*Ranked by funding start and end years.)