From the course: Designing Highly Scalable and Highly Available SQL Databases

Human-scale and machine-scale data

- [Instructor] Now, as we design databases, oftentimes we start by thinking about, what's the data model going to look like? And how will I represent the entities in my domain? And that's a perfectly reasonable place to start. In fact, that's a really good place to start in many cases. Now, if you're dealing with a highly scalable database, in addition to that, you also want to start thinking about data ingestion. Basically, data ingestion is the process by which we acquire data: we get data into our application and eventually into our database. So there are a couple of things to keep in mind at a very crude level. When we talk about data ingestion, we can talk about human-scale ingestion. And by that, I mean the kind of volume of data you would get if people, say, were working with a keyboard or maybe a voice-to-text application, some kind of input device where human behavior and actions control the speed at which data is generated. That is one level of data ingestion. And for many years, that was really the only type of ingestion we had to worry about when we were working with relational databases. Now, however, we also have what's known as machine-scale ingestion. That's when we're dealing with very large, or potentially very large, volumes of data. And with machine-generated data, we're talking about things like the Internet of Things, or application and infrastructure performance monitoring, or telemetry data from vehicles in a fleet that you're constantly monitoring. Now, machine-scale data volumes are often written to NoSQL databases, especially wide-column databases like Cassandra and Google Cloud Bigtable. But there may be occasions where you find that you have at least moderate-size machine-scale data ingestion that you want to capture in a relational database.
And so, of course, you want to think about how you do the ingestion, and then you also want to think about how you write the data: what does the data model look like, and how can you have very low-latency writes when you're dealing with machine-scale data? And so that's what we're going to cover in this chapter. We're going to talk about ingestion and also some design patterns that can help at the database backend with regard to having low-latency writes, so that you can keep up with the machine-scale data. Now, when we talk about human-scale ingestion, and really also machine-scale ingestion, one of the things we want to understand is, what's the growth rate? How do I estimate what my data volumes are going to be six months, a year, or two years from now? Well, when we're dealing with human-scale ingestion, we can think about data in terms of the number of people using the system: the more people using the system, the more data we potentially have coming in. There are other factors, of course, that come in as we have other data sources. But if you think about things like an insurance claim, we may be picking up data online, say, from people working with a mobile app, submitting information about a claim, maybe uploading a picture of a dent in their car or something like that. That is still human-scale ingestion. Now, there are maybe more people using the system than just, say, your employees, which would be a typical scenario if you were building a database 20 years ago. But in a sense that's still human-scale ingestion, because we have people generating the data. And so we can think in terms of how many users we have, and that'll give us some insight into the volume of data we'll be dealing with. Now, when we're talking about machine-scale ingestion, the volume that we're dealing with is a function of the number of devices that we're working with.
So for example, if we have sensors on a fleet of vehicles, then obviously the number of vehicles we have, the number of sensors on each vehicle, and the volume of data that each sensor generates are the things we need to understand to be able to estimate growth rate and scale of ingestion. Now, some typical human-scale ingestion applications are web applications and interactive apps on mobile devices. This is a little tricky, especially with mobile devices, because if you have some explosively popular app with millions of users, then you could easily start looking at volumes that would be, for example, someone else's machine-scale level of data. But if you have, say, moderately successful apps or, you know, very few users, that still falls within the realm of human-scale ingestion. And also back-office enterprise applications. So things like customer relationship management, finance, maybe inventory management, those are typically human-scale ingestion kinds of applications and databases. Now, when we're talking about machine scale, I already mentioned the Internet of Things, but there's also background data collected by apps on our mobile devices. So, for example, in addition to interacting with a human user, your app may collect data, say, every 10 seconds about geolocation, because your app offers coupons based on, you know, retail stores in the vicinity of wherever the user is. So mobile devices can fall into both human scale and machine scale, depending on how they're used. And also application and infrastructure monitoring. You know, as systems become more complex, it's more and more important to have insight into how services are performing, whether or not servers or containers are crashing, and information like that. So we often instrument our applications, and that can generate a lot of data about performance. So those are some examples of machine-scale ingestion. And there's also, of course, credit card transaction processing.
And I want to mention this because, again, this is kind of like a mobile app. It may seem like a single instance is not generating all that much data, but when you have thousands or millions of people doing credit card transactions, this is another case where, you know, the lines between human scale and machine scale are fuzzy, and credit card transaction processing could easily be in the realm of machine-scale ingestion as well.
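
The growth-rate reasoning in this lesson can be turned into a rough back-of-the-envelope calculation. The sketch below is only an illustration: the function names, the user and fleet counts, and the fixed monthly growth rate are all hypothetical assumptions, not figures from the course. It estimates human-scale volume from the number of users, and machine-scale volume from the number of devices, the sensors per device, and how often each sensor reports.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def human_scale_rows_per_day(users, rows_per_user_per_day):
    """Human-scale ingestion: volume follows the number of people using the system."""
    return users * rows_per_user_per_day


def machine_scale_rows_per_day(devices, sensors_per_device, seconds_between_readings):
    """Machine-scale ingestion: devices x sensors per device x readings per sensor."""
    readings_per_sensor = SECONDS_PER_DAY / seconds_between_readings
    return devices * sensors_per_device * readings_per_sensor


def project(rows_per_day, monthly_growth, months):
    """Project a daily volume forward, assuming a constant compound monthly growth rate."""
    return rows_per_day * (1 + monthly_growth) ** months


# Hypothetical inputs, chosen only to show the difference in scale.
claims_per_day = human_scale_rows_per_day(users=50_000, rows_per_user_per_day=2)
fleet_per_day = machine_scale_rows_per_day(
    devices=1_000, sensors_per_device=20, seconds_between_readings=10
)

print(f"human scale:   {claims_per_day:,.0f} rows/day")
print(f"machine scale: {fleet_per_day:,.0f} rows/day")

# Assumed 5% monthly growth, projected to six months, one year, and two years.
for months in (6, 12, 24):
    print(f"fleet after {months:>2} months: {project(fleet_per_day, 0.05, months):,.0f} rows/day")
```

With these made-up numbers, a sensor that reports every 10 seconds produces 8,640 readings a day, so a fairly small fleet already writes far more rows than a much larger population of human users. In a real estimate you would replace the inputs with measured values and would probably also multiply by an average row size to get bytes per day.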
