In previous examples, the ProducerRecord objects we created included a topic name,
key, and value. Kafka messages are key-value pairs and while it is possible to create a
ProducerRecord with just a topic and a value, with the key set to null by default,
most applications produce records with keys. Keys serve two goals: they are additional
information that gets stored with the message, and they are also used to decide
which one of the topic partitions the message will be written to. All messages with the
same key will go to the same partition. This means that if a process is reading only a
subset of the partitions in a topic (more on that in Chapter 4), all the records for a
single key will be read by the same process. To create a key-value record, you simply
create a ProducerRecord as follows:
ProducerRecord<String, String> record =
    new ProducerRecord<>("CustomerCountry", "Laboratory Equipment", "USA");
When creating messages with a null key, you can simply leave the key out:
ProducerRecord<String, String> record =
    new ProducerRecord<>("CustomerCountry", "USA");
Here, the key will simply be set to null, which may indicate that a customer
name was missing on a form.
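When the key is null and the default partitioner is used, the record will be sent to
one of the available partitions of the topic. A round-robin algorithm will be used to
balance the messages among the partitions. (Starting with Apache Kafka 2.4, the
default partitioner makes this rotation "sticky": it fills a batch for one partition
before moving on to the next.)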
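The round-robin rotation described above can be sketched in plain Java. This is only an illustration of the idea, not Kafka's partitioner code; the class RoundRobinSketch and its nextPartition method are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; this is not Kafka's partitioner implementation.
class RoundRobinSketch {
    private final AtomicInteger counter = new AtomicInteger(0);

    // Returns the next partition from the currently available ones, in rotation.
    int nextPartition(List<Integer> availablePartitions) {
        if (availablePartitions.isEmpty()) {
            throw new IllegalArgumentException("no available partitions");
        }
        int next = counter.getAndIncrement();
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = Math.floorMod(next, availablePartitions.size());
        return availablePartitions.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch sketch = new RoundRobinSketch();
        // Assume partition 1 is currently unavailable, so only 0, 2, and 3 are offered.
        List<Integer> available = Arrays.asList(0, 2, 3);
        for (int i = 0; i < 6; i++) {
            System.out.println("record " + i + " -> partition "
                    + sketch.nextPartition(available));
        }
    }
}
```

Because only available partitions are passed in, records with null keys keep flowing even while one partition is down, which is the main practical difference from keyed records.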
If a key exists and the default partitioner is used, Kafka will hash the key (using its
own hash algorithm, so hash values will not change when Java is upgraded), and use
the result to map the message to a specific partition. Since it is important that a key is
always mapped to the same partition, we use all the partitions in the topic to calculate
the mapping—not just the available partitions. This means that if a specific partition
is unavailable when you write data to it, you might get an error. This is fairly rare, as
you will see in Chapter 6 when we discuss Kafka’s replication and availability.
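To make the key-to-partition mapping concrete, here is a minimal sketch in plain Java. Kafka's default partitioner applies a murmur2 hash to the serialized key bytes; this sketch substitutes a simple FNV-1a hash as a stand-in, so the partition numbers it produces will not match what Kafka would choose. The class KeyHashSketch and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; FNV-1a stands in for Kafka's murmur2 hash.
class KeyHashSketch {

    // A small, self-contained hash that does not depend on Object.hashCode().
    static int fnv1a(byte[] data) {
        int hash = 0x811c9dc5;          // FNV-1a 32-bit offset basis
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;         // FNV-1a 32-bit prime
        }
        return hash;
    }

    // Maps a key to a partition using ALL partitions of the topic,
    // not only the ones that happen to be available right now.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int positiveHash = fnv1a(keyBytes) & 0x7fffffff; // clear the sign bit
        return positiveHash % numPartitions;
    }

    public static void main(String[] args) {
        int first = partitionFor("Laboratory Equipment", 50);
        int second = partitionFor("Laboratory Equipment", 50);
        // The same key always yields the same partition.
        System.out.println(first == second); // prints "true"
    }
}
```

Note that the function never consults partition availability: the result depends only on the key bytes and the total partition count, which is why a write to an unavailable partition can fail rather than being redirected.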
The mapping of keys to partitions is consistent only as long as the number of
partitions in a topic does not change. So as long as the number of partitions is constant,
you can be sure that, for example, records regarding user 045189 will always get
written to partition 34. This allows all kinds of optimization when reading data from
partitions. However, the moment you add new partitions to the topic, this is no longer
guaranteed: the old records will stay in partition 34 while new records will get
written to a different partition. When partitioning keys is important, the easiest solution
is to create topics with sufficient partitions (Chapter 2 includes suggestions for how
to determine a good number of partitions) and never add partitions.
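The effect of adding partitions follows directly from the modulo arithmetic. In the sketch below, the hash value 2034 and the partition counts 50 and 60 are made-up numbers chosen so that the first result matches the partition 34 example above; PartitionCountSketch is a hypothetical class name.

```java
// Hypothetical illustration of why the mapping changes with the partition count.
class PartitionCountSketch {

    // Assumes the hash has already been made non-negative.
    static int partitionFor(int positiveHash, int numPartitions) {
        return positiveHash % numPartitions;
    }

    public static void main(String[] args) {
        int hashOfUser = 2034; // assumed hash of the key for user 045189

        // With 50 partitions: 2034 = 40 * 50 + 34
        System.out.println(partitionFor(hashOfUser, 50)); // prints 34

        // After growing the topic to 60 partitions: 2034 = 33 * 60 + 54
        System.out.println(partitionFor(hashOfUser, 60)); // prints 54
    }
}
```

The key and its hash are unchanged, yet new records land in partition 54 while the older ones remain in partition 34, so a reader of a single partition no longer sees the full history for that user.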