From the course: Cloud NoSQL for SQL Professionals

Use GCP BigQuery - NoSQL Tutorial

Use GCP BigQuery

- [Instructor] Now this next NoSQL service isn't really NoSQL, so it's a little bit tricky. The underlying storage is NoSQL, but the query layer is actually SQL. But I'm getting ahead of myself. It's called Google BigQuery. To use it in your Google Cloud account, go to the Marketplace and then Datasets, which is just off the menu over here. Then I'm going to search for some data, so I'm going to pick storm data. This is public data, and if you click it, it'll say View Dataset. Once you click View Dataset, it takes that dataset and makes it available in the BigQuery service. This service is so advanced, it's almost like magic. Once you click that, which I already did, you can see here's the BigQuery public data, and you can see all the different datasets here, including the new NOAA Storm Prediction Center data. So let's see what it's actually called. NOAA, so down here: NOAA Preliminary Severe Storms. So that's going to be here.
So this is all public data that you can access, and this really goes back to the beginning of NoSQL, which was sort of started by Google. This service is the commercialization of a set of services that have been used inside of Google for years and years, and the service itself is not new. It's been out since 2011. When it was released, it was so revolutionary that people didn't understand it. They didn't get it, and they basically just didn't believe it.
So what is this? Well, the underlying storage is a column store, but you don't see it or care about it because it's serverless. And really the brilliance of the service is that the column store is designed for aggregations, which is sort of the classic data warehousing of events. However, the top query layer, unlike everything else I'm going to show in this course, is actually SQL, so this is NoSQL SQL.
(laughs) Which is a little bit crazy, but I think it's the most advanced and most usable, and it's something I use with almost every customer. Why? Because people want to use SQL. They know SQL, they want it to be scalable, they want it to be partitionable, and they don't want to deal with managing servers. That's really what you get with this.
So you can see inside of here I created a query. I looked at the data itself, which has these hail reports, tornado reports, and wind reports, and I just selected some of the data. Before I clicked Run, it told me how much data would be processed. Why? Again, I think this is the front end of the NoSQL movement, and it actually sort of comes back around to SQL. The idea is you pay only for the amount of data scanned. You don't pay for servers, you don't pay for licenses, you don't pay for hardware. You pay for query as a service, if you will.
And you can see once you run this, you get the result. Now, this is going to be cached because I already ran it once, and, of course, it's JSON under the hood because we're in a column store situation. You can also look here at the execution details, which you might want to do when you're running massive queries, to see how you might optimize them. Again, this course really isn't getting into the depths of optimization, because each of these products could be its own course, but I will tell you that it is extremely important that you understand how to optimize. This is one where I've actually done quite a bit of production work. You might do something like partitioning the underlying files, compressing the underlying files, or changing them to a different format because it is more efficient in terms of columnar reads.
So the thing that is fascinating about this is that it looks like a SQL database, but it is as far away from a SQL database as can possibly be. It's serverless, massively scalable, and a column store.
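To make the pay-for-data-scanned idea concrete, here is a minimal Python sketch. It is not how BigQuery is implemented or priced. It assumes a toy table stored as one list per column, a flat 8 bytes per value, and a made-up price per byte. The names `TABLE`, `bytes_scanned`, `estimated_cost`, `count_by`, and both constants are hypothetical, and the rows are invented rather than taken from the NOAA dataset.

```python
# Toy columnar table: one list per column (invented rows, not the NOAA data).
TABLE = {
    "event_type": ["hail", "tornado", "wind", "hail"],
    "state": ["TX", "OK", "KS", "TX"],
    "magnitude": [1.5, 2.0, 60.0, 1.0],
    "comments": ["a", "b", "c", "d"],
}

BYTES_PER_VALUE = 8        # assumption: every value costs 8 bytes to read
PRICE_PER_BYTE = 0.000001  # assumption: made-up price, not a real rate


def bytes_scanned(table, columns):
    """Estimate bytes read when a query touches only the named columns."""
    return sum(len(table[name]) * BYTES_PER_VALUE for name in columns)


def estimated_cost(table, columns):
    """Cost grows with bytes scanned, not with servers or licenses."""
    return bytes_scanned(table, columns) * PRICE_PER_BYTE


def count_by(table, column):
    """A simple aggregation that reads a single column."""
    counts = {}
    for value in table[column]:
        counts[value] = counts.get(value, 0) + 1
    return counts


narrow = bytes_scanned(TABLE, ["event_type"])  # only one column is read
wide = bytes_scanned(TABLE, list(TABLE))       # every column is read
```

In this sketch, the aggregation over `event_type` reads a quarter of the bytes that reading all four columns would, which is the kind of difference the estimate shown before clicking Run reflects.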
And if you don't believe me looking at this, I highly encourage you to read this white paper, which talks about the underlying column store, which is called Dremel. I know when it first came out, I really wanted to understand how Google was able to do it. And I think in terms of what's happening with NoSQL and the possibility of NoSQL, frankly, you'll always want to look to the leadership of companies at the scale of Google. They don't do all of the work, but they've done quite a lot of it. So if you read in here, it talks about how this paradigm of distributed compute on a column store was created. There's the actual diagram that I used, and it covers the mathematics behind this and how it actually processes data. You can see it's a nested columnar store.
This is also really useful if you're going to work with this service at scale, because if you understand the underlying storage, then you can understand how to properly optimize so that the amount of data scanned is the least possible. That way you can provide the best value for your customers, or for your business if you're working directly for a company.
So, again, this is one that I would say you're probably surprised is in this course, but I think it's kind of the future, because the SQL query layer is what people want. It's not by accident that a lot of these something-QL query languages look like SQL. You might have noticed that the Cassandra query language in the previous movie looked like SQL. It's not, though. This is actual SQL, which is really kind of interesting. So it's a valuable filter when you're looking at NoSQL solutions: what is the query layer language like? How usable is it? And frankly, the closer it is to ANSI SQL, the higher the value it's probably going to provide to you, and the more quickly.
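As one illustration of optimizing so that the least possible data is scanned, here is a small Python sketch of partition pruning. It is a simplified model under stated assumptions, not BigQuery's or Dremel's actual mechanism: the storm reports are invented, they are grouped into one partition per day, and `PARTITIONS` and `scan` are hypothetical names.

```python
from datetime import date

# Invented storm reports grouped into per-day partitions (assumption).
PARTITIONS = {
    date(2020, 5, 1): [{"type": "hail", "size": 1.0}, {"type": "wind", "size": 60.0}],
    date(2020, 5, 2): [{"type": "tornado", "size": 2.0}],
    date(2020, 5, 3): [{"type": "hail", "size": 1.75}, {"type": "hail", "size": 0.75}],
}


def scan(partitions, start, end, event_type):
    """Return matching rows and the number of rows actually read.

    Only partitions whose date falls in [start, end] are opened, so rows in
    other partitions never count toward the scan.
    """
    rows_read = 0
    matches = []
    for day, rows in partitions.items():
        if not (start <= day <= end):
            continue  # pruned: skipped without reading any rows
        for row in rows:
            rows_read += 1
            if row["type"] == event_type:
                matches.append(row)
    return matches, rows_read


# Query restricted to one day versus a query over every partition.
pruned_matches, pruned_read = scan(PARTITIONS, date(2020, 5, 3), date(2020, 5, 3), "hail")
full_matches, full_read = scan(PARTITIONS, date.min, date.max, "hail")
```

Here the date-restricted query reads two rows while the unrestricted one reads all five, so a filter on the partitioning column is what keeps the scanned amount down in this model.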
