From the course: SharePoint for Enterprise: Data Management

Understanding your data - SharePoint Tutorial

Understanding your data

- [Instructor] By now, I'm sure you're all really excited by the idea of using metadata. And you can see how it's going to be super helpful to you to organize and retrieve information. But some of you might be scratching your heads a little bit. A question that I'm often asked at this point is "How do I decide what metadata I need to capture?" So I think a good place to start is by trying to construct a schema of the data that's available.

Let's take a look at the sales spreadsheet again for a moment. Now this is actually pretty straightforward. We can see the types of data that are contained here: customer, region, salesperson, et cetera. And we can see values for those. Now, some of the values, like the salesperson's name or the region, are going to be used multiple times, while things like invoice number or date of sale will change with every record.

So armed with that information, let's actually get it down into a visual format. I'm going to start with the data fields that have the fewest values and try to record them all. So I think that region is probably our best place to start. It's a small number of values, they're easily definable, and most likely they don't change very often. So we'll put that information in as the first heading and we'll put the known values under that. Now customer could be the next one. Although we have to hope that the salespeople will continue to find new prospects, we know who the customers are today, right? So let's list them out and then let's add salespeople the same way.

Okay. Let's step back for a moment and look to see if there's a bigger picture here. Is any of the data that we're looking at possibly a subset of something else? Well, I think the answer is yes. Each of the customers is in only one region, right? And although in some cases multiple salespeople have sold to the same customer, they're always in the same region, at least at the first pass. So we could group this a bit like so.
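The grouping described above can also be checked with a few lines of code. What follows is only a minimal sketch in Python, not something shown in the course: the field names follow the spreadsheet columns mentioned in the transcript, but every sample value and every function name here is hypothetical. It counts the distinct values per field (so the fields with the fewest values come first), nests customers and salespeople under regions, and checks the assumption that each customer sits in only one region.

```python
from collections import defaultdict

# Hypothetical sample rows; the real spreadsheet's values are not in the transcript.
rows = [
    {"region": "East", "customer": "Acme", "salesperson": "Lee", "invoice": 1001},
    {"region": "East", "customer": "Acme", "salesperson": "Kim", "invoice": 1002},
    {"region": "West", "customer": "Globex", "salesperson": "Ray", "invoice": 1003},
    {"region": "West", "customer": "Initech", "salesperson": "Ray", "invoice": 1004},
]


def distinct_counts(rows):
    """Return the number of distinct values seen for each field."""
    values = defaultdict(set)
    for row in rows:
        for field, value in row.items():
            values[field].add(value)
    return {field: len(seen) for field, seen in values.items()}


def fields_by_cardinality(rows):
    """List fields from fewest to most distinct values (ties broken by name)."""
    counts = distinct_counts(rows)
    return sorted(counts, key=lambda field: (counts[field], field))


def build_hierarchy(rows):
    """Nest salespeople under customers, and customers under regions."""
    tree = defaultdict(lambda: defaultdict(set))
    for row in rows:
        tree[row["region"]][row["customer"]].add(row["salesperson"])
    return {
        region: {customer: sorted(people) for customer, people in customers.items()}
        for region, customers in tree.items()
    }


def customers_in_multiple_regions(rows):
    """Return customers that break the 'one region per customer' assumption."""
    regions = defaultdict(set)
    for row in rows:
        regions[row["customer"]].add(row["region"])
    return {c: sorted(r) for c, r in regions.items() if len(r) > 1}


if __name__ == "__main__":
    print(fields_by_cardinality(rows))
    print(build_hierarchy(rows))
    print(customers_in_multiple_regions(rows))
```

Fields that land at the end of the cardinality list, such as the invoice number, change with every record and are poor candidates for grouping. An empty result from `customers_in_multiple_regions` suggests the region-then-customer grouping holds for the data at hand; anything it returns is worth a closer look before settling on the schema.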
A lot of the other information on the spreadsheet is order and invoice specific. So it won't group very well, but there are a couple of other things that I want to look at: terms and discount. So if a customer always gets the same terms, then we can group those together. In this case, it's almost true. There's an outlier where one order is at 90 days. Now discounts seem to be a little bit more random, probably tied to order amount or something. These aren't going to fit nicely into our schema. And I think we'll have to consider them order-specific variables unless we can define a rule regarding those exceptions. Anyway, this is the kind of exercise that I would recommend that you go through whenever you're trying to understand your data. Yes, it can be time-consuming and yes, you'll probably wind up uncovering some weird stuff, and it will lead to some more research and questions. But if you go to the effort, you are going to have a much better understanding of the data that you're working with. Now, the next question is "How is that data being used?" We'll look at that next.
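The check on terms described above, where a customer almost always gets the same terms apart from one outlier, can be sketched the same way. Again, this is only an illustration under stated assumptions: the order rows, the term labels other than the 90-day outlier, and the function names are all hypothetical, and treating the most common value as a customer's "usual" terms is just one possible rule.

```python
from collections import Counter, defaultdict

# Hypothetical orders; only the idea of a single 90-day outlier comes from the transcript.
orders = [
    {"customer": "Acme", "terms": "30 days"},
    {"customer": "Acme", "terms": "30 days"},
    {"customer": "Acme", "terms": "90 days"},
    {"customer": "Globex", "terms": "60 days"},
    {"customer": "Globex", "terms": "60 days"},
]


def usual_terms(orders):
    """Map each customer to the terms that appear most often on their orders."""
    per_customer = defaultdict(Counter)
    for order in orders:
        per_customer[order["customer"]][order["terms"]] += 1
    return {
        customer: counts.most_common(1)[0][0]
        for customer, counts in per_customer.items()
    }


def term_outliers(orders):
    """Return (row index, customer, terms) for orders that differ from the usual terms."""
    usual = usual_terms(orders)
    return [
        (index, order["customer"], order["terms"])
        for index, order in enumerate(orders)
        if order["terms"] != usual[order["customer"]]
    ]


if __name__ == "__main__":
    print(usual_terms(orders))
    print(term_outliers(orders))
```

If the outlier list is short, terms can reasonably be grouped under the customer, with the exceptions handled by a rule or recorded per order. A field like discount, which would show many outliers under this check, is better treated as an order-specific variable.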