From the course: MySQL for Advanced Analytics: Tips, Tricks, & Techniques

Loading large data sets - MySQL Tutorial

From the course: MySQL for Advanced Analytics: Tips, Tricks, & Techniques

Start my 1-month free trial

Loading large data sets

- [Instructor] I am going to show you how to load large quantities of data efficiently in MySQL. In this course, the use case we will use is that of website visits. Users or customers visit a website to view the products are for for sale. They are also look at reviews and related products. They may either buy the product or abandon the web browsing session. This CSV file shows you data about those visits. The data in this use case, captures user demographics, specifically gender, age, and salary. It also contains information about when the user logged into the website, how long they spent on the website, how long they viewed reviews, if they did so, and how long they spent on looking at related products if they did so. I'm gonna now open this exercise file and start loading data. And for that I'm gonna go into the MySQL Workbench. I'm gonna now open the MySQL Workbench and then open the script that we're gonna use to create the database. I'm already logged into MySQL, and I have created a database called analytics for me. So, I'm gonna open Load Bulk Data, Open, and this is the file we have. Let's explore the file. On line number five, we have the Drop Table If Exists for website visits, so if the table already exists here, we're gonna drop it. Then we have the Create Table statement. And in that we list out all the columns that are in the table. We have an ID for the record, the customer name, customer type, the date in which the browsing started, the duration of browsing, the gender of the customer, age of the customer, salary of the customer, the total duration the customer spent in reviewing the product, whether the customer looked at related products, and if so, how long, and finally whether the customer purchased the products. We also have other columns here, Is Male, Is Female, Viewed Review, Viewed Related, Age Range, and Salary Range. Those we are going to explore and create this data. We also have other columns here, Is Male, Is Female, Viewed Review, Viewed Related, Age Range and Salary Range. We are not going to populate them as a part of the load, but we will be populating them as a part of other exercises we are going to be doing further down in the course. So let us go ahead and execute these commands. Now the tables are created for you. And on the left side, if you do a refresh, you will see under Tables, the website visits table is there. Now let us look at loading data into the table. Now in order to load data efficiently, there are certain things you might want to disable in MySQL in order to make your data loading faster. For example, disable unique checks in line number 33. Disable foreign key checks, disable log binaries, and disable autocommit. And by doing this, you are relieving MySQL of doing a lot of jobs record by record that makes data loading extremely fast. So let me go and run these commands now. The way we're gonna load data is by using the Load Data Local Infile command. Usually in MySQL this is done off by default, so we gonna turn it on by setting this global variable local infile as on. Now let's look at the loading file command. So it is a load data, it's a local file, which is a local file from within your local file system. And here is the path to the file on line number 45. When you are running the exercise, make sure this path is set to where you have downloaded and stored your CSV files. And then into the table website visits, line number 47, it tells you more details about the CSV, that the fields are dominated by a comma. And then optionally enclosed by double quotes. The lines are dominated by the backslash n corrector. And we're asking it to ignore the first line, because the first line is the header. And then in line 51 and 52, we are mapping the columns in the CSV file to the corresponding columns in the table. So let us now go and run this command. Now loading is completed already for 10,000 records. And we can now go and do a select on the table to see the contents. So as you can see, the data is loaded correctly from the CSV file and as I was mentioning before, there are certain columns that are left empty and those we will be populating later in the course. And finally, let us go and commit the transactions once we have made sure that everything has loaded correctly. Now the data is committed and we are good to go. So, just a clean up process. You have set a number of these parameters to zero usually, so let's set them all back to one, so we get MySQL back into it's original state. So that completes the loading exercise. In the future videos we will see how we can use this data for data science.

Contents