From the course: Data Curation Foundations

What is data curation?

- [Narrator] I'm so glad you selected this course in data curation, because I really want to spread the word about data curation. Many people have heard the term curation before, but usually it's in reference to museums with old artifacts, like this chair on the slide. That old chair looks like it is not worth much, so I don't know why a museum, or anyone else, would want it. It looks like it is falling apart, and you could not really use it and be comfortable. But let's say that when the museum does research about the chair and curates it, the researchers learn that the chair used to be owned by the President of the United States. So now, just by having that knowledge, the chair goes from being not very useful to being very valuable. And that's simply because we know its value now. Before curation, it just looked like an old, worn-out chair.

Okay. Now let's look at another example of curation, which is the topic of this course, data curation. Look at this old data on the slide. It does not look like it is worth much, does it? Those are just columns of numbers. Some numbers are big, some are small. And look at those headings. What does DSHOSPID even mean? This data does not look very valuable at all. But remember, like with the chair, looks can be deceptive.

Imagine you were new to a workplace and you found these data and asked about them. Let's say someone in the marketing department said that those were data collected from a market research survey, and the survey cost $20,000 to conduct. Well, then throwing away those data would be like throwing away $20,000, wouldn't it? Or maybe these data represent a mailing list of potential business customers that was either very difficult to put together or very expensive to purchase. I think you're getting my point.

I also just wanted to add that sometimes people use the term metadata to mean data curation files, because metadata means data about data.
But in this course, I'm going to use the term data curation. But anyway, I hope you get my point about data sets and value: data sets represent money. Just having a data set means that somebody must have spent some money to get that data set to exist. And, if you are doing some sort of market research, and that's how you got the data, you can assume it was even more expensive because there is a study design behind it.

The challenge is that it was expensive to get the data, and unless you keep details about the data written down where you can look them up later, the data will lose their value as time goes on. So data curation is a way to document the meaning of your data so your data can retain their original value as time goes on. It's basically a way to make sure you get your return on investment when you do data-intensive projects.

Data curation is a very useful skill to have regardless of whether you actually do data analysis yourself or not. If you are bad at data, no worries. You can hit the no excuses key. That's because data curation is done using Microsoft Office products, especially Word, Excel, and PowerPoint. Ironically, you rarely have to look at the actual data when doing data curation. But if you do, it is easy to view small data sets in Excel. Larger data sets can be viewed in a text or data viewer, like Notepad, or if you have skills in using software like R, SAS, or SQL, you can use those skills to look at the data. Otherwise, you'll need a colleague to help you view the data using those software packages.

I keep talking about making data curation files using Microsoft Office software, so you must be wondering, what do you actually make? First of all, you make tables. Lots of them. But the tables document different ideas. Throughout this course I'll show you how to make several different types of data curation tables. Here's a fun one. You make a lot of figures. See that figure on the slide?
That's just one of the many data curation figures I'll show you how to make in this course.

Finally, we have text-based curation files. You'll see on the slide a reference to a particular law in California. Later in the course you'll see why having the text of this law would be a relevant curation file to keep if you are analyzing data about hospitals in California. There are also text-based curation files you can make yourself, such as long-form documentation of certain data quality issues.

All right. Now that you have a better idea of what data curation is, let's move on to thinking about data curation's management function.
