
If you have been working in visualization for a while, you must have met someone asking you, “How do you visualize this?”
I have been on the receiving end of this question countless times, both from experts who feel stuck with their visualizations and from students who struggle to come up with a satisfactory solution when assigned a visualization design exercise.
Reflecting on this question, I realize that I do not understand why people have such a hard time figuring out how to create visualization solutions for their data problems. Some ideas flow naturally to me, so it’s hard to introspect to understand precisely where the roadblocks are.
Setting aside situations where people are just lazy and do not even try, what makes data visualization so hard for some people sometimes? And why do some people have the ability to come up with brilliant solutions, whereas others struggle?
You may say that the answer is obvious: some people have simply spent far more time than others learning the craft, so it’s not surprising that they can reason about visualization more efficiently and effectively than less skilled individuals.
Yes. True. But what I am saying here is that if we could get a better sense of why people struggle, we could understand data visualization design better and develop better strategies to transfer these skills to others. Furthermore, a better understanding of these roadblocks could also lead us to design software that helps people overcome these limitations.
Over the years, I have developed a few intuitions about the potential sources of the problem. Anecdotally, I have found these three problems to be common:
Problem 1: Focus on data, not tasks. When people ask how to visualize something, they usually ask you how to visualize this “data.” People have this erroneous mental model that it’s possible to establish what is the best way to visualize something exclusively by looking at the data. The problem with this approach is that the same data can be visualized in a million different ways and that the only way to verify if a visualization is successful is to express the goal explicitly. My preferred way to frame this problem is to ask, “What questions do you want to answer with your visualization?” So, when somebody asks me, “How do you visualize this?” I typically reply, “What questions do you want to answer with these data?” It’s only when you pair up data with questions that you can start thinking productively about what visual representations work best and whether they solve the problem you have or not.
Problem 2: Lacking generative skills. When I work with students in class, I notice that they lack the ability to imagine how else a given data set could be visualized. They quickly zero in on a specific solution and have a hard time moving away from it. They are stuck in a “local minimum.” They lack “generative power,” the ability to imagine many alternative representations for the same problem. One way I like to think about becoming a skilled visualization designer is as developing two related but separate skills: the generative skill and the evaluative skill. Every visualization design problem boils down to being able to generate alternatives and being able to evaluate which one works best. In my experience, people have more trouble with the first skill than with the second. In fact, when I propose a better solution to them, they can typically recognize that it is more powerful immediately.
Problem 3: Neglecting data transformation. This can also be characterized as a lack of generative skills, but of a different nature. Visualization design is not only about finding the “right” graphical representation but also about shaping the data in a way that best serves the communicative or analytical intent. When we visualize data, we do not have only the “graphical lever,” but we also have the “data manipulation lever,” which is very powerful. Deciding how to aggregate, filter, and transform the values we find in the data is as important as deciding what their right graphical depiction is. And the two things go together! You can’t just first decide how to shape the data and then how to visualize it. Often, great intuitions about visualization are about how a simple transformation can make the visualization way more expressive, even if the basic graphical depiction stays the same. Two simple examples are deciding the level of granularity to use in a line chart (time granularity) or in a choropleth map (spatial granularity). These are basic cases, but there are many more I could come up with.
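To make the time-granularity example concrete, here is a minimal sketch in plain Python. The daily series is made up for illustration, and the `aggregate` helper is a hypothetical name, not part of any library. The point is only that the same line chart could be drawn from any of the three series, and that the choice is made at the level of the data rather than the graphics.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate(points, key):
    """Average the values of (date, value) points that share the same key(date)."""
    buckets = defaultdict(list)
    for day, value in points:
        buckets[key(day)].append(value)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}


# Hypothetical data: 90 daily values with a slow upward trend and a bump
# on the first two days of every 7-day cycle.
start = date(2024, 1, 1)
daily = [
    (start + timedelta(days=i), 100 + 0.5 * i + (10 if i % 7 < 2 else 0))
    for i in range(90)
]

# Same data, two coarser granularities.
by_week = aggregate(daily, lambda d: tuple(d.isocalendar()[:2]))  # (ISO year, ISO week)
by_month = aggregate(daily, lambda d: (d.year, d.month))

print(len(daily), len(by_week), len(by_month))
```

Plotted as a line, the daily series would show the recurring bump, the weekly series would smooth it away and leave the trend, and the monthly series would reduce everything to three points. Which one is “right” depends on the question being asked, not on the chart type.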
My favorite solution is the one I mentioned above. When someone asks how to visualize something, the best is to ask for clarifications about what the problem is. What are they trying to achieve, and, most importantly, what questions do they want to answer?
Sometimes it’s useful to distinguish whether the person asking is developing the visualization to generate insights for themselves (i.e., for data analysis) or they are creating a communicative piece to convey an idea or a message to others. Both are very important and not necessarily disjoint because, often, the common source of the problem is a lack of explicit formulations of intents (either communicative or analytical).
In any case, when the goal is to communicate something, a useful step is to ask the person to make explicit the messages they intend to communicate with their visualization. In principle, this step could also be accomplished by creating a list of questions the readers should be able to answer, but a more direct way is to write down what the readers should have learned after looking at the visualization.
The trick is to make these elements explicit. I found, for example, that asking my students to literally write down a set of questions and refining them together until they are clear makes designing visual solutions easier. Sometimes the problem is not with the representation but with the lack of an explicit formulation of the problem.
If we want to teach people how to become more confident with their visualization skills, it is necessary to develop, among other skills, the two I mentioned above: generation and data transformation (in addition to evaluation, of course).
To develop generative skills, I have developed a few strategies over time. The first one is to expose learners to a lot of different examples. There is really no substitute for just having a very large visual vocabulary. The second one is to disentangle the generative and evaluative skills. Instead of asking a learner to create an “effective” solution, the best is to free them from evaluation and ask them to generate as many solutions as possible. The third one, a mix of the first two, is to ask the learners to develop multiple solutions and then compare the solution they created with solutions generated by other learners and the instructor. Typically, each person generates only a small subset of what is possible, and seeing solutions generated by others for the same problem leads to improving their generative skills.
There is also a complementary, more “mechanical” way to develop generative skills, which is Bertin’s theory of graphical encoding. This is what most of the existing visualization books and courses teach. Visualization is described as a mapping between data objects and visual marks, and between data attributes and visual channels. It’s like a toolbox: you have a few types of marks and a few types of channels, and building a visualization is a matter of mapping objects to marks and attributes to channels. It is not perfect, but it’s useful when used for generative purposes. If you know what marks and channels are available, you can almost mechanically push yourself to explore many combinations, even if they do not make any sense. In fact, when I teach visualization, I do have an exercise where I ask students to use marks and channels to create “silly visualizations.” I don’t care about how good a visualization is. I want my students to be exposed to as many variations as they can experience.
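The “almost mechanical” exploration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lists of marks and channels are a small, incomplete sample, the attribute names are invented, and `candidate_designs` is a hypothetical helper rather than anything from an existing tool. It enumerates combinations without judging whether any of them is good, which is exactly the spirit of the “silly visualizations” exercise.

```python
from itertools import permutations, product

# A small, incomplete sample of marks and channels (assumed for illustration).
MARKS = ["point", "line", "bar", "area"]
CHANNELS = ["x position", "y position", "color", "size", "shape"]


def candidate_designs(attributes, marks=MARKS, channels=CHANNELS):
    """Yield every (mark, {attribute: channel}) pairing, with no quality judgment."""
    for mark, assigned in product(marks, permutations(channels, len(attributes))):
        yield mark, dict(zip(attributes, assigned))


# Hypothetical data attributes.
designs = list(candidate_designs(["country", "year", "population"]))

print(len(designs))  # 4 marks * (5 * 4 * 3) channel assignments = 240
for mark, encoding in designs[:3]:
    print(mark, encoding)
```

Most of the 240 candidates are nonsense, and that is fine. Generation and evaluation are kept separate here on purpose: the list is raw material to react to, not a recommendation.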
The data transformation aspect is trickier. Above all, most of the existing books and courses do not spend enough time elaborating on the role of data transformation in visualization. In fact, this is true even for my own courses, even though I have a good grasp of the problem. I have plenty of notes I have collected over the years, but I’ve never found the determination to publish something more structured on this topic. In any case, the best way to teach this skill is probably to create examples showing how different types of data transformations can lead to more effective visualizations. More precisely, the examples should show what the effect on the visual representation is when something is changed at the level of the data rather than at the level of the visual representation. Equally important is to have a proper taxonomy or catalog of data transformation operations with their respective visual effects. This, I believe, would substantially improve the skillset of any data visualization learner.
Another interesting direction is to build software that supports designers in exploring design solutions. Visualization research has a long tradition of developing recommender systems for data visualization. Tableau itself draws on pioneering research by Jock Mackinlay on automated systems for data visualization.
In reference to the problem outlined above, I believe that more opportunities exist in this space. For example, while many recommender systems try to recommend the “best” visualization for a given data set, these systems could support design exploration by proposing a whole set of solutions rather than trying to suggest the best. Also, most systems do not integrate tasks/goals in any meaningful way. Some software could also be built to propose different types of data transformations and to provide guidance in the formulation of meaningful and more explicit sets of data questions. There is way more to explore in this space, and this topic deserves its own post.
The initial impetus for writing this little essay was my curiosity about how to study what people actually mean when they ask you how to visualize a given data set, but I ended up describing some of my experiences teaching visualization and answering that type of question.
Even though I have some intuitions about this problem, that does not mean I understand it fully or that my intuitions are correct. For this reason, I find the intellectual exercise of studying this question empirically interesting.
But how could we study it?
One option is to collect many examples of situations where a person tried to visualize a data set and got lost. Maybe there is a way to do this through social media and create a collection of hard data visualization problems.
Another option is to run a study with a group of people in which we assign a data set and a task and ask them to generate solutions. One could then look at the results and get feedback from the participants to discover when and why they feel they don’t have a good solution. I like this option less than the first one because it’s less “ecologically valid,” but it could be a good and feasible substitute.
My colleague Melanie Tory and her students published a great paper a while back titled “How Information Visualization Novices Construct Visualizations,” in which they do something similar to what I propose. They assigned a series of open-ended data visualization tasks to a group of novices and observed their interactions to understand how and where they have difficulties in producing appropriate visualizations. We need more of this work to characterize difficulties people encounter when developing their visualizations. It’s a great paper that starts identifying these issues more empirically and proposing some solutions. I’ll try to write more about this work in the future.
-
That’s all I have for now. Thanks for reading!