From the course: Data Visualization: A Lesson and Listen Series

Lesson: Truth in data storytelling and visualization

(upbeat music) - Data is truth. Data is fact. Our goal is to make beautiful, impactful, and insightful visualizations out of these fact-based things. So, the truth that is contained in that data must drive every decision we make. We need to take a fiduciary standpoint, a position of objectivity, trust, and care when presenting data. So, how do you do that? What are the most important decisions you need to make as a trusted benefactor communicating with data?
First of all, if you're taking fiduciary responsibility for data, that implies that you trust the data itself, which means you must be confident in the source of your data. There are a lot of potential issues with data sources. Are they biased and driven by a hidden or not-so-hidden agenda? If there is bias, is it methodological, such as badly written survey questions, or just in the analysis? If the raw data is okay but the analysis is suspect, you might be able to do something about that, right?
Ask yourself, is your source qualified to be collecting and/or analyzing the data they've provided to you? Are they doing valid statistical analysis? Are they willing to share their sources and methodologies and allow you to do so in your work? If you trust your source, you then have to acknowledge that for anyone seeing your visualization, you're now their source. So, have a skeptical eye, even on yourself, and share your sources and methodology with your audience. Any work that isn't accompanied by transparency is suspect by default.
Related to sourcing is the concept of sample size and composition. Let's say I did a survey and I asked the question, "Do you think ice hockey is the best sport in the history of the universe?" And I reported that 100% of the respondents said yes. Would you trust those results? If you pressed me and I said, "Well, hey, I asked 10 people at a hockey game," this guy being my first survey respondent, would that skew your opinion of my findings? Of course. Why?
Well, I only asked 10 people, that's too small a group, not to mention that these people are all likely to be hockey fans since they were at a hockey game. My sample is junk, and problems like this can lead to junk science.
Another big idea is correlation versus causation. You probably know this one already. Say, in my same hockey survey, I asked a second question, which was, "How many bowls of soup have you eaten in the past 12 weeks?" And everyone answered like 15 or 16. And say I repeated this same survey question and I asked another 10 people during a Boston Red Sox game in August, and I got a lot fewer positive responses about the love of hockey, and the soup answer was consistently in the one to two bowls range during that same time period. What if I now declared that loving soup is a clear indicator of an affinity for ice hockey, or worse, what if I said eating soup will make any nonfan into a hockey fanatic? Obviously I'm saying there's causation where there's no proof of causation.
I haven't measured if increased soup eating leads to hockey loving and, on top of that, I've conflated things without acknowledging that there are a million other variables at play, the most obvious one being that maybe people eat more soup during hockey season, winter time, than during baseball season in the summer. This is an obvious example, but as a data visualization practitioner, you need to think very critically about issues like this to be confident that the presentation you're making respects what the findings actually are in the data you're presenting and doesn't say something you don't intend, such as causation.
Finally, one of the easiest ways to earn and break your audience's trust is in your use of scale in your visualizations. The easiest way to talk about scale is to use some examples. In this example, we're looking at different company divisions' sales figures for this month. As you can see, Group A is killing it while probably Group E should just be shut down, right?
They're terrible. But, of course, I'm sure you already noticed that's not a fair conclusion. Look at the Y-axis scale. Each division has sold nearly a billion dollars this month; all of the divisions are within one tenth of 1% of Group A. No disasters anywhere. This brings up the one never ever, ever, ever, ever, ever break rule in all of data visualization, and that's you should never ever ever, and 15 more never evers, have a column chart where the scale starts at anything other than 0. That's because humans will always pre-attentively, meaning before they're even aware of it, compare the areas of those rectangles and will therefore pre-attentively conclude that Group E is awful and everyone should be fired. While better labeling and context setting can help make the truth more clear, just never do this with a column chart.
Another example is this data. Here we're seeing the resting heart rate of an Olympic athlete.
Clearly they are healthy and strong and consistent, amazing. And here we have another person's resting heart rate. Yikes! It looks like she's having a heart attack right now. And again, you probably already figured it out, but we've just zoomed in on the same data. By zooming in, you can greatly exaggerate a chart to make mild changes or differences seem extreme, whereas by zooming out, you can make extreme differences seem mild. Either way, it's potentially a lie. You need to understand what's a fair representation of the data and set your scales accordingly.
A good general rule for most charts is to set the bottom and the top of the scale at or just outside the minimum and maximum values. Pad the numbers a bit so there's room to see what's happening, and make sure you use rounded numbers since they're easier to remember. Unless there's a compelling reason to treat the scale differently, this is usually fair.
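The scale guidance so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something shown in the lesson: the function names (padded_limits, column_limits, apparent_ratio), the choice of rounding step, and all of the numbers are hypothetical, made up to match the spirit of the sales and heart rate examples.

```python
import math


def padded_limits(values, step):
    """Axis limits at or just outside min/max, snapped to round multiples of step.

    `step` is an assumed, caller-chosen round increment (for example 5 or 100).
    """
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    return low, high


def column_limits(values, step):
    """Column charts are the exception: the scale always starts at zero."""
    _, high = padded_limits(values, step)
    return 0, high


def apparent_ratio(a, b, baseline):
    """How many times taller bar `a` is drawn than bar `b` for a given baseline."""
    return (a - baseline) / (b - baseline)


# Hypothetical monthly sales in millions of dollars, groups A through E.
sales = [999.0, 998.8, 998.6, 998.3, 998.1]

# Hypothetical resting heart rate readings in beats per minute.
heart_rates = [58, 61, 59, 62, 60]

print(padded_limits(heart_rates, 5))  # (55, 65): padded, rounded limits
print(column_limits(sales, 100))      # (0, 1000): zero baseline for columns

# With the axis truncated at 998, Group A's bar is drawn roughly 10 times
# as tall as Group E's, although the real figures differ by under 0.1%.
print(apparent_ratio(sales[0], sales[-1], baseline=998.0))
print(apparent_ratio(sales[0], sales[-1], baseline=0))
```

The last two lines show why the column chart rule matters: with a zero baseline the drawn heights keep the same ratio as the underlying numbers, while a truncated baseline inflates a tiny difference into a dramatic one.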
For instance, using our sales example from before, if the sales target was 1.8 billion dollars for the month, then you could legitimately set the top end of the scale higher, maybe even drop in a dotted line and a label to show that target. That's not manipulative or lying, that's fair. A scale can have an outsized influence on what your audience thinks about your data and what they remember about it.
So, these are just a few important things to think about when you're creating data stories. Just remember to think of yourself as a data fiduciary. Think first and foremost about being honest with your data and have a skeptical eye on your own work. Try to imagine what a naysayer would say about it and be sure the decisions you make will all help you make your case without damaging your credibility. Be responsible and own your truth.
Next up we'll be talking to Alberto Cairo, who has a new book coming out on exactly this topic, so it'll be great to hear his perspective on this important subject.