From the course: Designing Highly Scalable and Highly Available SQL Databases

Business requirements for database scalability

- [Instructor] Databases have been around for quite a while. They are nothing new. But scalable databases are becoming more important. Somebody once asked me, how do you know when you have a good database architecture? Well, really, there is no single ideal architecture. The best architecture for a particular application depends on the business requirements, or the organizational requirements. But in general, the parameters we look at as we try to determine what's best for our particular application are these: we need to understand how much data needs to be stored, and we need to understand the lifecycle, meaning how long we're going to keep this data and who needs to access it. Accessibility is really important. That ties into responsiveness, so oftentimes we are concerned with low latency, both when we write the data and when we read it. And of course, our databases need to be highly available and reliable. So the goal that we're focusing on is identifying how we provide storage services that meet the particular set of requirements we have before us.

So here are some key questions that we should be asking. First of all, what data are we working with? How is it structured? Is it highly structured and tabular, or is it semi-structured, like documents with varying attributes? Or is it pretty freeform, like natural language text? We also want to understand how entities relate. We'll talk a little bit more about that when we talk about domains.

Now when we talk about how much data, there are a number of different things we're looking at. Of course, there's the current volume, what you are storing already. But then we also have to consider how much is coming in, say every day or every hour. That's our rate of ingestion. Then there may also be growth. So for example, if you're creating a database and you're pulling in data from IoT sensors, well, there you have a particular volume.
Now you have a particular rate of ingestion, but you may also have planned growth. You may be rolling out more sensors, so you want to take that into account. And of course, we also need to account for how long we're going to store the data and in what ways we're going to store it throughout its lifecycle.

And then we also want to understand how it's used. So for example, are we doing things like sales transactions? Or are we monitoring manufacturing devices? Or are we saving the data for compliance reasons? In that case, we may not be all that interested in querying the data or having rapid access to it, but we need to keep it around in case it is required. Another area that is increasingly important is using data for data-driven decision making. So those are the kinds of things we need to understand with regard to how our data is used.

So when we're talking about what data we're dealing with, one of the first things we want to do is get rough boundaries around the domains that we talk about. A domain is like a subject area, and it's defined by a certain set of coherent business processes and related pieces of data. So for example, we might have a domain around sales, a domain around inventory, and a domain around logistics. There can be overlap between domains, and these aren't hard and fast rules. But when we do identify these coarse-grained domains, we also want to be able to identify the entities in them. Those are the things that we're dealing with. Now, once we identify them, we want to model how they relate. So for example, how does a particular sales transaction relate to a customer? Or how does an order relate to multiple order items? And we also need to identify the attributes of those entities. So for example, what information do we have to keep on a customer? Obvious things are name and address, perhaps phone number and email.
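To make the idea of entities, relationships, and attributes concrete, here is a minimal sketch of a small sales domain using Python's built-in sqlite3 module. The table names, column names, and sample rows are illustrative assumptions, not something taken from the course. It shows a customer with the attributes mentioned above, an order that belongs to a customer, and order items that belong to an order.

```python
import sqlite3

# Hypothetical schema for a small "sales" domain. All table and column
# names are illustrative assumptions.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT,
    phone       TEXT,
    email       TEXT
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    ordered_at  TEXT NOT NULL
);
CREATE TABLE order_item (
    order_item_id INTEGER PRIMARY KEY,
    order_id      INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_name  TEXT NOT NULL,
    quantity      INTEGER NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one customer, one order, two items."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the references
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO customer (customer_id, name, email) VALUES (?, ?, ?)",
        (1, "Ada", "ada@example.com"),
    )
    conn.execute(
        "INSERT INTO customer_order (order_id, customer_id, ordered_at) VALUES (?, ?, ?)",
        (10, 1, "2024-01-15"),
    )
    conn.executemany(
        "INSERT INTO order_item (order_id, product_name, quantity) VALUES (?, ?, ?)",
        [(10, "book", 1), (10, "course", 2)],
    )
    conn.commit()
    return conn


def items_for_customer(conn: sqlite3.Connection, customer_id: int) -> list:
    """Follow customer -> order -> order item and return (product, quantity) rows."""
    return conn.execute(
        """
        SELECT oi.product_name, oi.quantity
        FROM customer c
        JOIN customer_order o ON o.customer_id = c.customer_id
        JOIN order_item oi ON oi.order_id = o.order_id
        WHERE c.customer_id = ?
        ORDER BY oi.order_item_id
        """,
        (customer_id,),
    ).fetchall()
```

Calling `items_for_customer(build_demo_db(), 1)` walks the one-to-many relationships from a customer to its order items. A real model would likely carry more attributes and constraints than this sketch does.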
We also need to understand how data changes over time. We may be modeling things differently a year from now than we are today, so we want to try to plan for that kind of change. These things that we're describing, when we're talking about the what data question, are typically what's called data modeling.

Now let's turn to how much data. The first thing we want to know is, of course, how much exists today, because we've got to be able to cover that. And how much will we need in the future? That question is answered in part by looking at domain-driven events: things like a sale, a new shipment that's received at a warehouse and loaded into inventory, or data that comes in from IoT sensors every 30 seconds. These are particular domain events, and if we understand the frequency with which they occur and how much data is involved with each one, then we can start getting a sense of what our data generation is going to look like. And again, we also need to look at the expected rate of growth. So what are the other things going on in the business that are going to increase the size of the data that we need to deal with?

Finally, the last big question is how is the data used? There are many different ways. I'm sure we won't cover a fraction of them here, but one example is interactive transactions. So for example, I might go online and purchase a book or purchase a course. That's an interactive transaction. There are also ways to use data by aggregating it, and this is especially important for business intelligence applications, data science, and machine learning. And with streaming data, or high-volume data, we may also be interested in looking at anomalous events and time series. So again, there's a wide variety in the ways that data can be used.
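The earlier point about estimating future volume from domain events can be sketched as simple arithmetic: events per day times bytes per event gives a daily ingestion rate, and a growth rate compounds that over time. The following is a rough sketch in Python. The class and function names, the sensor count, the bytes per reading, and the choice to compound growth daily are all assumptions made for illustration. Only the 30-second sensor interval comes from the discussion above.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class DomainEvent:
    """A kind of business event and how much data it tends to produce."""
    name: str
    events_per_day: float
    bytes_per_event: int


def daily_ingest_bytes(events) -> float:
    """Rate of ingestion: total bytes arriving per day across all event types."""
    return sum(e.events_per_day * e.bytes_per_event for e in events)


def projected_volume_bytes(current_bytes, events, days, annual_growth=0.0) -> float:
    """Current volume plus `days` of ingestion.

    Assumption: the daily ingest rate grows smoothly, compounding each day
    so that it reaches (1 + annual_growth) times its starting value after
    365 days. Retention and deletion are ignored.
    """
    daily_growth = (1 + annual_growth) ** (1 / 365)
    total = current_bytes
    rate = daily_ingest_bytes(events)
    for _ in range(days):
        total += rate
        rate *= daily_growth
    return total


# Hypothetical fleet: 1,000 sensors, one 200-byte reading every 30 seconds.
sensor_readings = DomainEvent(
    name="sensor_reading",
    events_per_day=1000 * SECONDS_PER_DAY / 30,
    bytes_per_event=200,
)
```

With these assumed numbers, `daily_ingest_bytes([sensor_readings])` comes to 576,000,000 bytes per day, before any indexes, replication, or storage overhead. Passing a nonzero `annual_growth` to `projected_volume_bytes` shows how planned growth, such as rolling out more sensors, changes the picture over a year.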
