From the course: Learning Data Analytics: 1 Foundations

Understanding ETL in data

- [Instructor] I can remember the first time I heard about data warehousing. I wanted to go learn about it. My mind was blown. They were talking about normalizing data and then denormalizing it and storing it somewhere else. I found this interesting because I was already building little data warehouses, and now that process had a name. My customers needed to run reports on a routine basis, and they needed to have meaningful data without having me recreate it all the time. I wasn't using super fancy tools. I was using what was available and common sense. I say that to say don't overthink all of this. You're going to hear a million different terms and thoughts on data over your career, but at this point, your focus should be learning how to get the data you have to a point that's readable and meaningful.

The process of getting data from the source, making it meaningful, and placing it out for others to use or read is referred to as ETL. ETL is also a critical process of data warehousing. ETL stands for extract, transform, and load. I've seen other variations of this acronym, like extraction, transformation, and loading, and what I found is, just like me, there are a lot of business users who are doing ETL. They're doing ETL-like processes to achieve their reports, except maybe they're not loading data warehouses. They're just building weekly or monthly reports. If you want to learn more about ETL, simply search the library for the acronym ETL, and you'll learn a lot about it.

When I think about ETL as a more universal concept, it really is applicable to any data. Now, the transformation, that's the T in ETL, that's where the cleaning, the standardization, and the handling of data we don't have occur. Let's dive into the cleaning and modeling processes using various tools and we'll get into that T for transforming data.
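To make the three steps concrete, here is a minimal sketch in Python using only the standard library. It is not the instructor's method or any particular tool's workflow. The sample data, the column names (region, amount), the sales table, and the rule that a missing amount becomes 0 are all assumptions made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the columns and values are made up for illustration.
RAW_CSV = """region,amount
 east ,100
West,
EAST,250
west,75
"""

def extract(text):
    """Extract: read rows from the source (here, CSV text) as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean and standardize values, and address missing data."""
    cleaned = []
    for row in rows:
        region = row["region"].strip().title()  # standardize spacing and case
        amount = row["amount"].strip()
        # Assumption: a missing amount is treated as 0; other rules may fit better.
        cleaned.append((region, float(amount) if amount else 0.0))
    return cleaned

def load(rows, conn):
    """Load: place the cleaned rows where reports can read them."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database standing in for a report store
load(transform(extract(RAW_CSV)), conn)

# A routine report can now query the loaded table instead of the raw export.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('East', 350.0), ('West', 75.0)]
```

Keeping extract, transform, and load as separate functions mirrors the acronym: the source or the destination can change without touching the cleaning rules in the middle.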
