From the course: Data Engineering Foundations

Intro to databases and their types

- [Instructor] Before we get down to creating databases, let's try to understand what databases are and the differences between their types. Databases are an essential tool for the data engineer. They can be used to store information. Before we dive deep into the types of databases, let's get some definitions out of the way. A database is a large collection of data organized in efficient structures and formats, specifically designed to support rapid search and retrieval. There are a few pieces of vital information in this definition. First, the database holds data. Second, databases organize data. We'll see later that there are different levels of organization. Lastly, databases help us quickly retrieve or search for data. The database management system, or DBMS, is usually in charge of this task. The main difference between databases and simple storage systems, like file systems, is the level of organization, and the fact that databases, or database management systems, abstract away a lot of complicated data operations like search, replication, indexing, et cetera. File systems, on the other hand, host less of such functionality. They are less organized, and they offer only minimal features. Among databases, there is a big difference in the level of organization. To understand these differences, we have to make a distinction between structured, semi-structured, and unstructured data. Structured data adheres to a well-defined structure. Database schemas usually define such structure. An example of structured data is tabular data in a relational database. Semi-structured data is a form of structured data, but it does not follow the tabular structure of data models associated with relational databases or other forms of data tables. Nonetheless, it contains tags or other markers, like key-value pairs, to separate semantic elements and enforce hierarchies of records and fields within the data.
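To make the difference between structured and semi-structured data concrete, here is a minimal sketch in Python. It is not part of the course material: the `customers` table, its columns, and the sample records are hypothetical, and the in-memory SQLite database from the standard library simply stands in for any relational DBMS.

```python
import sqlite3

# Structured data: every row must fit the columns declared in the schema.
# The table name, columns, and values are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],
)

# The DBMS performs the search; we only describe which rows we want.
london_names = [
    row[0]
    for row in conn.execute("SELECT name FROM customers WHERE city = ?", ("London",))
]

# Semi-structured data: key-value markers label each element, but records
# do not have to share the same fields and may nest other records.
records = [
    {"id": 1, "name": "Ada", "address": {"city": "London"}},
    {"id": 2, "name": "Grace", "tags": ["navy", "compiler"]},
]

# Each record carries its own set of fields instead of a fixed set of columns.
field_sets = [sorted(record.keys()) for record in records]

print(london_names)
print(field_sets)
```

In the table, the schema decides up front which columns exist. In the list of records, each record describes itself through its keys, which is why the two records can have different fields.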
An example of semi-structured data is JSON data. Unstructured data, on the other hand, is schema-less. It looks a lot more like files. Unstructured data could be something like photographs or videos. So structured and unstructured data define the outer boundaries, and there is a whole lot of semi-structured data in between. Another distinction we can make is the one between SQL and NoSQL. Generally speaking, in SQL databases, the data is organized into tables. Because SQL databases are relational, the database schema defines the relationships between these tables and their properties. Typical SQL databases are MySQL and PostgreSQL. On the other hand, NoSQL databases are called non-relational, and they are often associated with unstructured, schema-less data. Now that's a misconception, as there are several types of NoSQL databases, and they are not all unstructured. Two widely used types of NoSQL databases are key-value stores, like Redis, and document databases, like MongoDB. In key-value stores, the values are simple. Typical use cases are caching or distributed configuration. Values in a document database are structured or semi-structured objects, for example, a JSON object. We have now learned about the types of data and databases. To give you a high-level picture, we'll retrieve data from different sources in different formats, and we'll use different types of databases to handle these different formats, depending on our use cases.
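The three database styles mentioned above can be sketched side by side. This is only an illustration under stated assumptions, not how Redis or MongoDB are actually accessed: SQLite from the Python standard library stands in for a SQL database, a plain dictionary stands in for a key-value store, and a JSON string stands in for a stored document. All table names, keys, and values are hypothetical.

```python
import json
import sqlite3

# Relational (SQL): tables hold the data, and the schema declares how they relate.
# SQLite is used here only as a stand-in for a SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, "
    "title TEXT NOT NULL, "
    "author_id INTEGER NOT NULL REFERENCES authors(id))"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ursula K. Le Guin')")
conn.execute("INSERT INTO books (id, title, author_id) VALUES (10, 'The Dispossessed', 1)")

# The relationship declared in the schema lets us join the two tables.
joined = conn.execute(
    "SELECT books.title, authors.name "
    "FROM books JOIN authors ON books.author_id = authors.id"
).fetchall()

# Key-value: a simple value looked up by its key, as in a cache.
# A dict is an assumption for illustration, not a real key-value store client.
cache = {}
cache["session:42"] = "user-7"
cached_user = cache.get("session:42")

# Document: the stored value is itself a semi-structured object (here, JSON).
document_text = json.dumps(
    {"author": "Ursula K. Le Guin", "books": [{"title": "The Dispossessed", "year": 1974}]}
)
document = json.loads(document_text)
first_book_title = document["books"][0]["title"]

print(joined)
print(cached_user)
print(first_book_title)
```

The same information appears in the relational and document versions. The difference is where the structure lives: in the schema shared by all rows, or inside each stored object.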
