From the course: Apache Kafka Essential Training: Getting Started (2021)

Messages

[Tutor] - In this chapter, we will explore the basic concepts and entities that make up the Kafka architecture. We start off with the Kafka message. A Kafka message is the unit of data that is collected, stored, and distributed by Kafka. Let's explore more about messages in this video.

A Kafka message is also called an event. A message is a record of a real-world event at a point in time. But that definition does not constrain what a message is; it can be any piece of data. A message is equivalent to a row or record in a database. It can have attributes and values, like a map. It can also be a blob that contains an image or an audio snippet.

Kafka treats all messages as byte arrays. It does not try to attach any semantics to the content of the message. That is the job of the producer and the consumer. Producers and consumers need to agree upon the content and format of the messages, and be able to serialize and deserialize them. Kafka merely takes in binaries and distributes them. Kafka limits the maximum size of a message. The limit is configurable, and the default is 1 MB. While producing and consuming messages, producers and consumers can process them in batches for efficiency.

What are some of the key contents of a message? Kafka does have some predefined attributes. Messages in Kafka have a key. The key is defined by the producer of the message. Keys are not mandatory, and they need not be unique. Keys are used for partitioning data. We will discuss partitioning later in the course. The value attribute of the message contains the actual message. It is binary, and the semantics of the value are user defined. Kafka does not infer anything from the message contents.

Another key attribute to note is the timestamp. Every message is automatically timestamped by Kafka. Kafka supports two types of automatic timestamping. Event time is when the message producer creates the timestamp.
Ingestion time is when the Kafka broker timestamps the record as it stores it. This option is configurable.

Now let's look at some examples of messages. The first message is a map with attribute names and values. In this case, it's an employee record in JSON. The message key is set to the employee ID. The second message is a web server log stored in CSV format. It has no explicit key. When the producer does not provide a key, Kafka does not generate one; the key stays null, and the producer's partitioner decides which partition receives the message. The third message is an image. It has the customer ID as the key. The content is raw bytes. Note that all these messages are stored internally by Kafka as byte arrays. Hence the content can take any form, as long as the producers and consumers agree on the format.

Messages are stored in topics. Let's explore topics in the next video.
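
As a rough illustration of the three example messages described in this video, here is a minimal Python sketch that uses only the standard library and no Kafka client. The `Message` class, the helper functions, and all sample data (employee fields, log line, customer ID) are hypothetical and made up for illustration. The sketch only shows that the key and the value are plain bytes, and that serialization is the job of the producer and the consumer.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for a Kafka message (not a Kafka client class)."""
    value: bytes                     # payload: opaque bytes as far as Kafka is concerned
    key: Optional[bytes] = None      # optional, and need not be unique
    # Timestamp set on the producer side ("event time"), in milliseconds.
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


def serialize_json(record: dict) -> bytes:
    """Producer side: turn a map of attributes into bytes."""
    return json.dumps(record).encode("utf-8")


def deserialize_json(payload: bytes) -> dict:
    """Consumer side: recover the map, using the format both sides agreed on."""
    return json.loads(payload.decode("utf-8"))


# 1. Employee record as JSON, keyed by the employee ID (sample data is invented).
employee = {"id": 1001, "name": "Jane Doe", "dept": "Sales"}
employee_msg = Message(key=b"1001", value=serialize_json(employee))

# 2. Web server log line in CSV format, with no key.
log_line = "2021-01-01T10:00:00,GET,/index.html,200"
log_msg = Message(value=log_line.encode("utf-8"))

# 3. Image content as raw bytes, keyed by a customer ID.
#    Only the first four bytes of a PNG signature are used as placeholder content.
image_msg = Message(key=b"C-42", value=bytes([0x89, 0x50, 0x4E, 0x47]))

# Whatever the original form, every value is just a byte array.
for msg in (employee_msg, log_msg, image_msg):
    assert isinstance(msg.value, bytes)

# A consumer that knows the format can rebuild the original record.
decoded_employee = deserialize_json(employee_msg.value)
decoded_log_fields = log_msg.value.decode("utf-8").split(",")
```

A consumer that did not know the first message was JSON would see only bytes, which is why both sides have to agree on the format up front.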
