From the course: The Data Science of Government and Political Science, with Barton Poulson

Open data

- [Instructor] Have you ever had your car stolen? I have. A few years ago, my car was at the dealer for some service, and someone managed to steal it from inside the shop at the dealership. Now, I wouldn't have thought that a locked garage at a dealer was a risky place for a car, but apparently you never know. But if you do want to know what places are generally the riskiest for your car, then you can use some data.
In the United Kingdom, researchers at Co-Op Insurance were able to use open data about car thefts to create a heat map of the places where cars had recently been stolen. This project was possible because the UK has excellent open data. But more places than the UK have open data, and that data includes more than cars getting stolen. In fact, there's an enormous amount of potential in the data that's readily shared by governments and other sources.
But first, let's talk a little about what open data is. To be technical, open data is data that is freely available for anyone to use and republish without restrictions from copyright, patents and other forms of control. Now, that definition actually includes two separate concepts that generally go by the name Free, and so people in the software world have different terms for distinguishing them. The first one is Gratis. This means provided without charge. Open data is generally Gratis. Also there's Libre, which means without restrictions, and the idea is that both of these, Gratis and Libre, usually apply to open data.
And open data itself is generally part of open government. Open government is government in which citizens have free access to governmental documents and proceedings to allow them to conduct effective public oversight. It gives them the information that they need to assess what's happening, and to respond appropriately. Now, open government can include two kinds of open data. The first is open data about the government.
So for instance, the US Freedom of Information Act allows you to get data from the government about things that they're doing. And there's also open data from the government, where portals like data.gov in the US or data.gov.uk provide a very wide range of publicly available data sources.
And to state something painfully obvious, I need to distinguish between three concepts that sometimes get confused. Open versus free versus online data. Now, as I mentioned, open data is Libre data, or data without restrictions, so free as in free speech. Free data, or Gratis data, is available without cost, and usually open software and open data are also Gratis, or without cost. In the case of the US Freedom of Information Act, however, which is theoretically Libre data, or unrestricted, there are some small costs associated with requests, so it's not necessarily completely Gratis.
And the thing we don't want to get this confused with is online data. Now, some people mistakenly operate under the idea that if something's online, then it is both free to use without restriction, so it's Libre, and that they can use it for free, Gratis, because it's just there and they can get it. Obviously that's not the case. An enormous amount of information on the web is copyrighted, it's restricted, and it's proprietary. And so you have to get permission or you have to pay fees for it. And you don't want to get in trouble with copyright holders because that can complicate your life as you're going through your projects.
Now, some of the benefits of open data in government are pretty obvious. There's accountability, where government is held accountable for its expenditures and for its policies. It has to explain itself. And the theory is that that can lead to improved efficiency. If people in government are being held accountable by citizens, then they should at least hopefully do their job better.
Open data in government can also support social justice, that's the ability of citizens and groups to do things like assess discrimination or biases at systemic levels and then do something about it. And then finally open data can often provide the foundation for developers to create data products. The UK car theft map is one example. Or here in the US, Zillow, which provides information about housing prices, is based on data provided by the government, and it allows people to create resources for other consumers that they can potentially sell for money. But the open data is what makes it possible.
Now in terms of the data that's available from the government through open data portals, obvious things include spending and evaluation information, political donations and potentially communications between politicians and donors, demographic information about the public, healthcare information, scientific data, educational data and criminal justice data. There's an enormous amount there, and all of these can be accessed through various means to give the fodder for particular data projects.
Now, even though open data is free, or at least generally affordable, it doesn't come without some challenges. Number one is actually finding the data. There may be a portal like data.gov, and you can go there, and the data's there, but it doesn't necessarily mean it's easy to find it or navigate to it, or make sure that it's exactly what you want. And even if you do find it, sometimes it's hard to get the data into an easily readable format. A CSV file with nicely clean data is perfect, but heaven forbid, maybe it's a scanned PDF file, so it's an image, and you have to code information by hand. That's a pain. And then also there can be challenges in terms of combining it with other data. One of the most important things in data science is the ability to combine these different sources to get new insights.
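To make the idea of combining sources a little more concrete, here is a minimal sketch in Python. It is not taken from any real portal: the two small CSV tables, the area codes, the column names and the helper functions are all made-up placeholders, and real open data usually needs much more cleaning before a join like this works.

```python
import csv
import io

# Hypothetical, made-up sample data standing in for two open data downloads.
# The area codes and counts are placeholders, not real statistics.
THEFTS_CSV = """area_code,vehicle_thefts
A1,120
B2,45
C3,80
"""

POPULATION_CSV = """area_code,population
A1,60000
B2,15000
D4,30000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def thefts_per_1000(thefts_rows, population_rows):
    """Join the two tables on area_code and compute thefts per 1,000 residents."""
    population = {row["area_code"]: int(row["population"]) for row in population_rows}
    rates = {}
    for row in thefts_rows:
        code = row["area_code"]
        if code in population:  # keep only areas that appear in both sources
            rates[code] = 1000 * int(row["vehicle_thefts"]) / population[code]
    return rates


rates = thefts_per_1000(read_rows(THEFTS_CSV), read_rows(POPULATION_CSV))
print(rates)
```

In this toy example, area B2 has fewer thefts than A1 but a higher rate once population is taken into account, which is the kind of insight that only shows up after the two sources are joined. Areas that appear in only one table (C3 and D4 here) drop out of the result, which is also a common source of trouble with real data.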
There are also very significant privacy issues related to open data and governmental data in general, and I'll talk about those in a later section in this course. And then finally, one of the paradoxes is that because it's often difficult to find the data, combine it, clean it and analyze it, and because doing so requires specialized training and software, these difficulties can actually increase the inequality that the open data is designed to reduce. Now, I'd like to think that this is a temporary situation. As software advances, as the data portals become more refined, and as more people get the skills to work with data, hopefully that will address some of these issues. But for now, simply having open data is not in and of itself a panacea for these issues.
But there are, of course, significant benefits to open data, which is why governments are going through this. More and more governments and other organizations are opening their data to the public, and it's becoming easier to access, use and interpret that data. And one of the most important benefits of this is that the potential for improvement in accountability, trust and cooperation that can come from the availability and the informed use of open data is coming closer and closer to realization.
