From the course: Data Visualization: A Lesson and Listen Series

Listen: Petra Isenberg

(upbeat music) - And now it's time to talk to Petra Isenberg, a research scientist focusing on visualization at France's National Institute for Research in Digital Science and Technology. Thank you so much for joining me today, Petra. It's good to have you here. - Yeah, thanks for having me, Bill. - Yeah, it's my pleasure. So I'm really excited to talk to you about data visualization research for a whole bunch of different reasons. One being that I see your name on research reports all the time; you're very prolific. And the range of topics that I've seen you associated with is also pretty broad. And so I just want to start at, sort of, the base of the mountain here, at the bottom. Visualization research is an interesting blend of the studies of visual perception, cognitive science, statistical reasoning, human-computer interaction, and probably a bunch of others that I'm leaving out. So it's really fascinating to me that while it's sort of a narrow field, because it's focused just on data visualization, it's also incredibly broad because of the number of fields that it overlaps with. And so I just wanted to ask you to start off: is that what interested you in the field, or was it something else? I'd just love to hear about how you got into it. - Yeah, actually, that's pretty much exactly why I started doing research in visualization. When I was an undergrad in my first year of university, in Germany, that's where I come from, it's very common that if you look like a promising researcher, you get hired by some research groups to do work for them. So I was hired as a research assistant. My first job was in, let's say, a bioinformatics group. They had a big database and they wanted to see who was logging in, for how long they were logging in, and what kinds of stuff people were doing on this website that they were logging. So I learned to do my first visualizations that way.
And after that, I got hired by a multimedia research group and they wanted to understand how their image similarities were calculated and how the algorithms were working out. So I very quickly realized that actually if you're working in visualization, you have access to a lot of different data. You can delve into different topics, you know, all sorts of topics that you might be interested in. If it's biology or medicine or finances or whatever, the field is so open, you can actually do research on other types of things that you're interested in. Yeah, so that was one thing, that, you know, the world of data is open to you if you work in that field. The other one was that I quickly realized, too, that it's not only about the data but there are many skills that this research field needs. We need mathematicians who can do a lot of research on fundamental algorithms and what makes renderings efficient and how to deal with big data; we need design-oriented people who invent new encoding techniques, who come up with ways of depicting data nobody has shown before; we need people who study humans to understand how data analysis is even done, in order to base any work on how to support data analysis on sort of a fundamental understanding of how people do it. And there are many other, sort of, skill sets that you can apply to the domain, and that's what I find super, super fascinating. - Yeah. You know, it's funny, almost every time I do one of these interviews, I have insights about myself while the conversation is going on (laughs) and I'm realizing. So I actually studied journalism. And one of the reasons I realized years later that I went into journalism or studied it was because I'm a generalist. I just want to sort of touch a whole bunch of different things and constantly be learning and experiencing different topics, et cetera.
And data visualization and data visualization research, as you just described it, and as I introduced it, is sort of the same thing. And I hadn't really thought about it that way; this idea that you can sort of delve into a million topics because of this area you focus in on is kind of neat. One thing I wanted to ask you too, is that, you know, data visualization research is fairly new. I know that there are decades of history but compared to some fields that's pretty darn new. And we know of thousands and thousands and thousands of papers. So it seems like there's a nearly endless supply of topics still left to investigate, as you sort of just implied, and very few that are what we would call settled science, right? That we know that you should do X when visualizing data. And so, first of all, I would ask you, is that a fair statement? And secondly, what do you think is the most settled area of research that you can point to and say that as a field we're sort of getting to the point where, yeah, we know what we're doing here, maybe in this particular area? - Yeah, I mean, you're absolutely right. If you compare visualization to something like physics, biology, chemistry, it's really young. I mean, our biggest and most prestigious conference started in 1990. And back then, there were lots of people moving in from these various fields that you just asked me about, from physics, from maths, from graphics also. And now, yeah, we have about 30 years of dedicated published research on the topic, which makes a few thousand papers but not in any way the numbers that you have in biology and some of these really old sciences. I would say there have been sort of these waves of research that are happening. And right now we're reaching a phase where some of these things are maturing. So as an example, one of the things we learned a lot about is these ideas of these fundamental encodings, or some people call them visual variables or channels, that we use to encode data with.
This could be, for example, the size of objects, the length of lines, if you think about a bar chart, angles between lines. And so we have this understanding that a lot of graphics are made up of certain specific marks on these channels that encode data with them. Like if you think about a bubble chart, it's going to plot something with sizes on the circles, we know we can encode data with that. And we have a fairly good understanding, there have been some studies that have proven again and again that what channels you choose has an impact on how effective the visualization is. So we know, for example, that if you encode just quantities as the lengths in a bar chart, people will be able to perceive that more accurately compared to judging the sizes of circles. So things like that we're starting to have a relatively good understanding about because there have been now a bunch of studies. The other area I would say is most developed is how the visualization field started, that's in scientific visualization. So people started out learning a lot about how to deal with data sets from the sciences, things like medical images, right? If you have some sort of condition under your skin, right? Maybe doctors are searching for cancer or you had an aneurysm or something, they put you in a scanner and the scanner goes through and, you know, tries to measure what's happening inside your body. And this is a huge data set, two-dimensional data set that we somehow have to depict. And so there's been a lot of research on these kinds of images, on sort of understanding flows in water or flows of wind around vehicles, airplanes, things like that. So here we have a really good tool set already about how to deal with that kind of data. But that's because that's where the research started. - You know what's interesting? I love those two examples because they're at opposite ends of the spectrum.
On the one end, you're talking about preattentive processing, visual encodings, it's about these very simple things like do we use the length of a line or the size of a circle, versus these incredibly complex things that you're talking about, medical imaging and the like. And so it really, it sort of runs the gamut, which I think is pretty interesting. If we think about the current state of data visualization research, what do you think are the topics that we should be looking out for? Like the areas that are being studied now that you anticipate might bear either super interesting findings in the near future, and/or maybe that next area of what we could say is somewhat settled science a year or five years down the road. - Right. So, yeah, so every couple of years, there's a wave of certain focus topics, I would say, that lots of people get excited about, at least from my perspective looking at research in general. So right now, like in many other disciplines, that's machine learning. So in this, we have sort of two streams of research that are related to machine learning. One is using machine learning in visualization, and people do things like trying to figure out how you can automatically generate an infographic, or how you could automatically adjust a desktop-size visualization to fit on a mobile screen. Basically, using some sort of training, a trained data set, to figure out something about how we can improve visualization. That's one area. The other area is using visualization to help explain what's happening in machine learning. So how can we help people understand why their model is behaving in a certain way. So that's the viz for explainable AI kind of movement, where we're trying to see if we can use visualization techniques for helping people with their machine learning problems. So that is one big theme that I see.
The other one is, and this has been slowly going on and I think it's going to get much more important in the future, is visualizations for the, let's say, the general public. We have the term in research, we call this personal visualization or personal analytics, right? So in our world right now, people with all sorts of backgrounds and experiences have data collected about them. This could be people with conditions such as diabetes or other things that they regularly need to track. They collect lots of data about themselves that they have to be able to understand to feel good, right? To feel good about themselves. Then there's lots of less serious data such as fitness tracking data, like you want to understand how to improve your fitness, your health, or things such as privacy, for example, what kind of data am I actually sharing with the world? You can go all the way to personal finances. There's lots of data that anyone should be able to understand. And visualization is actually, I think, a very, very promising way to teach people about their data. And actually, you know, there's this whole pandemic, where people need to understand how do I read this chart with new cases? And what does it mean to have a certain infection rate and that kind of stuff. Visualization, in some sense, can hopefully help people who are less math-inclined to also understand what is happening. So yeah, this is another big area that I see people moving more into, trying to help everyone, not just experts. So actually I should mention, in research, in the past we focused a lot on tools for work scenarios. So there are some people that we call experts. They are experts in something, right? They're experts in medicine or in archeology, in biology, in finances, in something. And we build tools for them to help them understand their very, very complex and big data. But in contrast to that is like everyone, right?
Everyone who is wearing a fitness tracker, who's collecting or who has data that is collected for them, about them. So that I think is something I'm hoping to see a lot more often, that is slowly starting to get more important. - Yeah, I think it's a great point. So I guess what I wanted to ask you is, what do you think is the number one thing you might recommend to practitioners so that when they're reading about or hearing about data visualization research, that will help them do a better job in their work and sort of, you know, hearing the signal from the noise out of this research? - Yeah, so I think as a practitioner, it depends a little bit what you're looking for. So in visualization, we do have this empirical research and I'll mention that in a second. There's also the other type of research that is still coming back to the roots of the field, where people are publishing different ways of representing a certain type of data, or, you know, here's how we've worked with these kinds of experts in, let's say, one of the things I'm currently working on is Bitcoin. So we're trying to understand data about Bitcoin. So one of the research approaches we have is we're going to talk to some people who need to understand this data, right? So you can find information about, this is what other people wanted to learn and this is how we are going to represent the data. So you can get some idea about possible tasks, possible questions you might be answering with the data that you have, what other people have wanted to learn. Maybe you get some ideas about things you might also want to learn about that weren't on the top of your list yet, and how we represented it. So that is kind of the give-me-inspiration type of research. There's still a lot of that kind of work out there to look for. Now, the other thing is, if you're interested in, so a number of years ago, we had a project where we wanted to understand this question of dual axes, right?
So sometimes in Excel, you're able to do this thing where you have, let's say, two line charts. You're trying to show some correlation between, I don't know, the number of things A and the number of things B, but you can't put them on the same axis because they actually don't have the same measurements that you were taking. One is, I don't know, raindrops and the other one is number of trees somewhere or something, you know, totally different things. Sometimes you can plot them just on top of each other, and Excel lets you do that. Is that a good idea or not, right? - Classic question. - Yes, right? So you can look for papers that will have asked this question, or like you said, uncertainty visualization is a very interesting one. I have this uncertainty, how should I best represent it? Should I do a box plot or should I do a violin plot? Or should I just do some shading? Should I do a confidence interval? What kind of things should I be trying? So here you will find studies, and this is a little bit harder I think for somebody not deeply ingrained in the research world to understand, especially when there is conflicting information out there. The main thing you need to know there is that right now, as you said before, visualization is still relatively young. So we have relatively little confirmatory work where somebody did a study on A and somebody else did exactly the same study, also on A, on problem A, and now one has one result and the other has another result. This happens very rarely. Typically, we have somebody who just studies A and somebody who just studies A-plus or something close on the same topic, but they do it in a different way, they use different data, they use different tasks, different populations. And here if you see the conflicting information, you actually have to read parts of the details of the paper and see, okay, what tasks did they test?
You know, in which paper, let's say you have two papers, you have to figure out which one of the two is more closely related to the actual problem you're currently having. So that is a little bit more work. And we need to wait a few more years, I think, before we have these big concrete suggestions on, in this case, A, you do B always. There just isn't enough empirical evidence for everything yet. - Yeah, that's a great point. I think that, you know, listen, the world is complex and you can't expect a silver bullet for every question, and yeah, read the methodology, understand the nuance behind things, and maybe you'll have a better chance of applying it to your needs. (laughs) - Yeah, and actually, especially in visualization, it's tricky because we have so many influencing factors. We have the people we're studying with, right? These could be experts on finance data that are looking at the financial visualization. They will have a different background to do certain types of tasks. Plus we have the type of tasks, like sometimes people study very basic things like find the maximum, compare these two, which one is higher, which one is lower, or sometimes they say, no, look at the data, which stock would you recommend based on what you see, right? There's a multitude of tasks you can have. Plus the data that they give you, it can be huge, it can be small, it can have 10 dimensions or just two. There are so many factors that influence visualization research, empirical research, that, I mean, I don't know if we'll ever get to a place where we'll have tested everything. - Yeah, well, it's funny, this actually brings us back to the beginning of the conversation, right? So essentially we're studying everything, human-computer interaction, visual perception, so many different things. And as you said, you know, endless populations and all of these, it's infinite.
And so we have to accept that it's going to take a long, long time to get to where everything's confirmed or not so we can truly understand everything, which is, of course, really an impossible task. We're actually running out of time. You know, these interviews, we keep to a fairly specific timeframe, but I very much want to thank you for talking to me today. It's a really interesting conversation. I think our audience is going to get a lot out of it because, you know, again, particularly on the practitioner side, for a more general audience, they don't know this research is out there in some cases, and/or they know it's there but they don't know exactly what it means or how to work with it. And I think that what you've said today is going to be very helpful for them. You did mention early on, you made sort of an aside, a tangential comment about the idea of, well, gee, how do we convert a visualization from desktop to mobile, which is, you know, I'm glad you said that because it reminded me I think you said that you have a book coming out on mobile visualization, and maybe you can just tell us about it briefly before we say our final goodbyes. - Yeah, sure. So one of the really cool things in computer science is that, you know, somewhere in Germany, on the border with France, there's this castle where you can go to have research seminars funded by the German research organization, this place called (indistinct). And so we had a seminar there with about 30 to 40 people on this topic of mobile visualization, where we discussed in depth how we want to move this research field forward.
And there were practitioners there, there were researchers, and, as we mentioned a couple of times, people from these different disciplines, visualization, HCI, and yeah, we decided one of the bigger things we can do is just promote this area further, you know, discuss through the publication of the book what are all the important aspects to consider when you build visualizations for mobile devices. Because that's going to be one of the form factors where people access data, right? And there's very little research on that right now, surprisingly, even though so many people are using mobile phones every day. So yeah, that book will come out in 2021, in March, hopefully. And yeah, we'll have lots of information about how do you encode data for mobiles, how do you interact with data on mobiles, how do you evaluate it and yeah, what other things you could do. - Sounds like a very important and very helpful book. (laughs) - Thank you. So let me mention one more thing, though. If people are interested this year, because, yeah, it's the pandemic, our biggest conference, (indistinct) the conference will also be completely free. So if practitioners are interested to look at it, there's going to be a practitioners track dedicated specifically to people also in industry and yeah, they can participate for free and just see if there's useful information there. I think it starts the 24th or 26th of October. And so the Friday of that week, yeah. - Great. Well, this is perfect. Thank you so much again. I really appreciate you talking to me today. - Yeah, thanks for finally meeting you. It was really nice chatting with you.
