From the course: Faster pandas

Various formats and why not CSV

- [Instructor] Under the hood, computers understand only bits and bytes. We work with higher-level types, such as strings, numbers, lists, and of course, data frames. The process of converting these types to a sequence of bytes and back is called serialization. There are many serialization formats out there, and some perform better than others. When we measure performance, we look at both size (how many bytes are stored) and speed (how fast you can serialize and deserialize data). CSV, although very popular, performs poorly on both counts. It's a textual format, which wastes space, and since it has no schema, you need to parse the text back into numbers, dates, and other types. If you can, pick a more performant format, such as a SQL database, HDF5, or others.
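To make the size and schema points concrete, here is a minimal sketch that round-trips the same data frame through CSV and through a binary format. As an assumption for portability, it uses the standard library's `pickle` as the binary stand-in (it is not one of the formats named above, but it needs no extra packages). The sample data is hypothetical, and exact byte counts will vary with your data and pandas version.

```python
import io
import pickle

import numpy as np
import pandas as pd

# Hypothetical sample data: one float column and one datetime column.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.random(n),
    "time": pd.date_range("2024-01-01", periods=n, freq="min"),
})

# CSV: every value is written out as text (about 18 characters per float).
csv_bytes = df.to_csv(index=False).encode("utf-8")

# Binary stand-in: pickle stores the raw 8-byte values plus dtype information.
pickle_bytes = pickle.dumps(df)

print(f"CSV:    {len(csv_bytes):,} bytes")
print(f"pickle: {len(pickle_bytes):,} bytes")

# Round-trip both formats.
from_csv = pd.read_csv(io.BytesIO(csv_bytes))
from_pickle = pickle.loads(pickle_bytes)

# CSV has no schema, so "time" comes back as plain strings.
print(from_csv.dtypes)
# The binary format kept the datetime dtype.
print(from_pickle.dtypes)

# With CSV, the reader must supply the schema and parse the text back.
from_csv_parsed = pd.read_csv(io.BytesIO(csv_bytes), parse_dates=["time"])
print(from_csv_parsed.dtypes)
```

Note that the final `read_csv` call only recovers the dates because we told it which column to parse; that parsing step is part of the speed cost of CSV.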