With the rapid pace of technological change, two important concepts, message queues and event-streaming platforms, stand out as key components of efficient communication and data flow. These technologies play a pivotal role in seamless communication between distributed systems. To those unfamiliar with them they may look like distinct technologies, but on closer examination message queues and event-streaming platforms turn out to be remarkably similar, much like biscuits and cookies: biscuits are crunchy, while cookies are supposed to be soft. Both offer solutions to challenges such as scalability, resilience and real-time data processing, especially in event-driven architecture.
Event-driven architecture is becoming increasingly popular in software applications, and that trend is expected to continue. There are two reasons for this. First, message queues and event-streaming platforms, which are critical layers in event-driven architectures, are easier to set up in a cloud environment. Second, with the rise of data apps, applications can directly integrate real-time analytics. Streaming data cuts through many data engineering lifecycle stages. Where an RDBMS is usually a source system, streaming platforms can be used both as a data source and for passing information in an event-driven architecture. They can be used in both the ingestion and transformation stages to process data for real-time analytics. This blog gives an in-depth look into message queues and event-streaming platforms.
Message Queues
A message queue is a mechanism for asynchronously sending data (discrete, individual messages) between discrete systems using a publish and subscribe model. A publisher publishes data to the message queue, which then delivers it to one or more subscribers. Once the subscriber acknowledges the message, it is removed from the queue. Message queues allow applications and systems to be decoupled, which is why they are widely used in microservices and event-driven architectures. A message queue also buffers messages to handle sudden load spikes and makes messages durable through a distributed architecture with replication.
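To make the publish, deliver and acknowledge cycle concrete, here is a minimal in-memory sketch. It is not how any particular message queue product is implemented; the class name `SimpleQueue` and its methods are hypothetical names chosen for illustration, and the "acknowledgement" is simply the boolean returned by the subscriber's handler.

```python
from collections import deque
from typing import Callable, Deque, List


class SimpleQueue:
    """Toy in-memory queue (hypothetical): a message is removed only after it is acknowledged."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def publish(self, message: str) -> None:
        # The publisher appends to the tail of the queue.
        self._messages.append(message)

    def deliver(self, handler: Callable[[str], bool]) -> None:
        # Offer the oldest message to the subscriber without removing it yet.
        if not self._messages:
            return
        message = self._messages[0]
        acknowledged = handler(message)
        # Only an acknowledgement removes the message from the queue.
        if acknowledged:
            self._messages.popleft()

    def pending(self) -> List[str]:
        return list(self._messages)


queue = SimpleQueue()
queue.publish("order-1")
queue.publish("order-2")

queue.deliver(lambda message: False)  # subscriber fails: no ack, message stays queued
after_failure = queue.pending()

queue.deliver(lambda message: True)   # subscriber succeeds: ack, message is removed
after_ack = queue.pending()

print(after_failure)
print(after_ack)
```

Because the publisher only ever talks to the queue, it does not need to know whether the subscriber is currently up, which is the decoupling described above.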
Some critical characteristics of message queues to keep in mind are message ordering, frequency of delivery and scalability.
Message ordering and delivery
The order in which messages are created, sent and received greatly impacts the subscribers. In distributed systems, guaranteeing message order is a tricky problem. Message queues are usually designed to ingest and send messages in FIFO (first in, first out) order, but in a distributed system this does not always hold. The order in which messages are ingested and delivered can vary greatly depending on the architecture of the system and the frequency of delivery.
Delivery Frequency
Messages can be sent either exactly once or at least once. If a message is sent exactly once, it is removed from the queue as soon as the subscriber acknowledges it, and it is not delivered again. If messages are sent at least once, the same message can be consumed by multiple subscribers, or by one subscriber multiple times.
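With at-least-once delivery, a subscriber may see the same message more than once, so it is common to make the subscriber tolerate duplicates. The sketch below simulates one lost acknowledgement and a subscriber that remembers which message ids it has already processed. The message id field, the `IdempotentSubscriber` class and the `deliver_at_least_once` function are all assumptions made for this example, not part of any specific queue's API.

```python
from typing import List, Set, Tuple

# (message_id, payload); the id is an assumed field used to detect duplicates.
Message = Tuple[str, str]


class IdempotentSubscriber:
    """Hypothetical subscriber that processes each message id only once."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.processed: List[str] = []

    def handle(self, message: Message) -> None:
        message_id, payload = message
        if message_id in self._seen:
            return  # duplicate delivery: ignore it
        self._seen.add(message_id)
        self.processed.append(payload)


def deliver_at_least_once(
    messages: List[Message],
    subscriber: IdempotentSubscriber,
    lost_acks: Set[str],
) -> int:
    """Redeliver each message until its ack arrives; return the number of delivery attempts."""
    pending_losses = set(lost_acks)  # copy so the caller's set is not modified
    attempts = 0
    for message in messages:
        while True:
            attempts += 1
            subscriber.handle(message)
            if message[0] in pending_losses:
                # The ack was lost once, so the queue delivers the message again.
                pending_losses.discard(message[0])
                continue
            break
    return attempts


subscriber = IdempotentSubscriber()
attempts = deliver_at_least_once(
    [("m1", "charge card"), ("m2", "send email")],
    subscriber,
    lost_acks={"m1"},
)
print(attempts, subscriber.processed)
```

Here the message `m1` is delivered twice but processed once, which is the behaviour a subscriber usually wants when the queue only promises at-least-once delivery.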
Scalability
Most of the popular message queues on the market can scale horizontally and run across multiple servers. This allows a queue to scale up and down depending on how many messages the producers create. It also allows messages to be stored across servers for resilience against failures. One disadvantage is that the order in which messages are delivered can change.
Event-streaming platforms
In some ways an event-streaming platform is a continuation of a message queue, in that messages are passed from producers to consumers. A message queue is primarily used to send messages with some delivery guarantees. In contrast, an event-streaming platform is used to ingest and process data in order. Data is retained in an event-streaming platform for some time, so it is possible to replay messages from a past point in time. A typical event in an event-streaming platform consists of three parts: a key, a value and a timestamp. A single event can also carry multiple key-value pairs under one timestamp.
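The key, value and timestamp structure, together with retention and replay, can be sketched in a few lines. This is a toy model under stated assumptions: `Event` and `EventLog` are hypothetical names, timestamps are plain numbers in seconds, and events are assumed to be appended in timestamp order. Real platforms handle retention and replay in their own ways.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    key: str
    value: Dict[str, str]  # one event may hold several key-value pairs
    timestamp: float       # seconds, assumed for this example


class EventLog:
    """Hypothetical append-only log with time-based retention and replay."""

    def __init__(self, retention_seconds: float) -> None:
        self._retention = retention_seconds
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def expire(self, now: float) -> None:
        # Drop events older than the retention window.
        cutoff = now - self._retention
        self._events = [e for e in self._events if e.timestamp >= cutoff]

    def replay(self, since: float) -> List[Event]:
        # Unlike a queue, reading does not remove anything from the log.
        return [e for e in self._events if e.timestamp >= since]


log = EventLog(retention_seconds=60)
log.append(Event("user-1", {"action": "click", "page": "home"}, timestamp=0))
log.append(Event("user-2", {"action": "click", "page": "cart"}, timestamp=30))
log.append(Event("user-1", {"action": "purchase"}, timestamp=90))

replayed = log.replay(since=30)   # events from t=30 onwards
log.expire(now=100)               # anything older than t=40 is dropped
remaining = log.replay(since=0)

print(len(replayed), len(remaining))
```

The contrast with the message queue is that consuming an event does not delete it; only the retention policy does.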
An event stream is an ordered sequence of events representing important actions in a domain. The event can be as simple as a button click on a website or as complex as a distribution centre updating its inventory.
Some critical characteristics of an event-streaming platform to be aware of are listed below:
Topics
A topic can be thought of as a collection of closely related events. In an event-streaming platform, producers stream events to a topic, and the events are then sent to the topic's consumers. A topic can have zero to many producers and consumers. Consider, for example, a topic of plane tickets. Events from this topic can be sent both to the check-in desk at the airport, to board the passengers, and to the marketing department, to run real-time analytics or marketing campaigns.
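The plane tickets example can be sketched as a topic that several consumers read independently. In this toy model each consumer keeps its own position (offset) in the topic, so the check-in desk and the marketing department both see every event. The `Topic` class, the consumer names and the event strings are hypothetical and only meant to illustrate the idea.

```python
from typing import Dict, List


class Topic:
    """Hypothetical topic: an ordered list of events plus one read offset per consumer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._events: List[str] = []
        self._offsets: Dict[str, int] = {}

    def publish(self, event: str) -> None:
        self._events.append(event)

    def subscribe(self, consumer: str) -> None:
        # A new consumer starts reading from the beginning of the topic.
        self._offsets.setdefault(consumer, 0)

    def poll(self, consumer: str) -> List[str]:
        # Return the events this consumer has not seen yet and advance its offset.
        start = self._offsets[consumer]
        new_events = self._events[start:]
        self._offsets[consumer] = len(self._events)
        return new_events


tickets = Topic("plane-tickets")
tickets.subscribe("check-in-desk")
tickets.subscribe("marketing")

tickets.publish("ticket booked: seat 12A")
tickets.publish("ticket booked: seat 14C")
check_in_first = tickets.poll("check-in-desk")

tickets.publish("ticket cancelled: seat 14C")
check_in_second = tickets.poll("check-in-desk")
marketing_all = tickets.poll("marketing")

print(check_in_first, check_in_second, marketing_all)
```

One consumer reading ahead does not affect the other, which is what lets very different teams consume the same topic at their own pace.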
Stream partitions
Stream partitions divide the events of a stream into multiple subdivisions that run parallel to each other, like the lanes of a multilane freeway. Having multiple subdivisions allows for parallelism and higher throughput. A partition key is used to distribute the messages across the partitions, and it can be anything that logically divides the messages into subdivisions. When choosing a partition key, we must take care that events are distributed evenly across the partitions, so that no single partition receives most of the events while the others stay empty or nearly empty.
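One common way to map a partition key to a partition is to hash the key and take the result modulo the number of partitions. The sketch below uses that approach as an assumption; it is not taken from any specific platform, and the function names `partition_for` and `partition_counts` are hypothetical. It also shows how a poorly chosen key sends every event to a single partition.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index with a stable hash (assumed scheme)."""
    # hashlib is used because Python's built-in hash() of strings varies between runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_counts(keys: Iterable[str], num_partitions: int) -> Dict[int, int]:
    """Count how many events land in each partition for the given keys."""
    return dict(Counter(partition_for(key, num_partitions) for key in keys))


NUM_PARTITIONS = 4

# A key with many distinct values spreads events across the partitions.
balanced = partition_counts((f"passenger-{i}" for i in range(1000)), NUM_PARTITIONS)

# A key with a single value sends every event to the same partition.
skewed = partition_counts(["airline-A"] * 1000, NUM_PARTITIONS)

print("balanced:", balanced)
print("skewed:", skewed)
```

The same key always maps to the same partition, which keeps related events together, but it also means a key with few distinct values produces the unbalanced partitions described above.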
Fault tolerance and resilience
Event-streaming platforms are typically distributed systems, with streams stored on various nodes. If one node goes down, another node takes its place, which makes event-streaming platforms fault-tolerant and resilient. As a result, records are not lost when a node fails; they remain available until they are deleted.
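The idea of another node taking over can be illustrated with a very simplified replication sketch. It assumes that every record is copied to all live nodes and that a read is served by the first live node. Real platforms use more elaborate replication and leader election; the `ReplicatedStream` class and the node names here are hypothetical.

```python
from typing import Dict, List, Set


class ReplicatedStream:
    """Hypothetical stream whose records are copied to every live node."""

    def __init__(self, node_names: List[str]) -> None:
        self._replicas: Dict[str, List[str]] = {name: [] for name in node_names}
        self._down: Set[str] = set()

    def append(self, record: str) -> None:
        # Write the record to every node that is still up.
        for name, log in self._replicas.items():
            if name not in self._down:
                log.append(record)

    def fail(self, node_name: str) -> None:
        # Simulate a node going down (it never recovers in this sketch).
        self._down.add(node_name)

    def read(self) -> List[str]:
        # Serve the read from the first node that is still up.
        for name, log in self._replicas.items():
            if name not in self._down:
                return list(log)
        raise RuntimeError("no live replicas")


stream = ReplicatedStream(["node-1", "node-2", "node-3"])
stream.append("event-a")
stream.append("event-b")

stream.fail("node-1")            # the first node goes down
after_failure = stream.read()    # another node serves the same records

stream.append("event-c")
after_new_write = stream.read()

print(after_failure, after_new_write)
```

As long as at least one replica survives, the records written before the failure are still readable.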
As we wrap up our exploration of message queues and event-streaming platforms, it becomes evident that these two technologies, which at first appear distinct, share many characteristics. In the fast-changing world of technology, they have become the standard wherever efficient communication and seamless data flow are valued.
The ease with which these technologies can be integrated, especially in cloud environments, and their role as foundational layers in event-driven architecture underline their significance. Whether it’s decoupling applications or enabling parallelism and higher throughput through stream partitions, message queues and event-streaming platforms play a pivotal role in shaping the future of software engineering, especially data engineering.
Thank you for joining us in this exploration of message queues and event-streaming platforms. As we move forward, may your communication be as efficient and clear as a message queue and may your streams flow like a river.
Happy Learning
Ajay Mahato
This post was inspired by the book Fundamentals of Data Engineering by Joe Reis and Matt Housley.
As always, I would love to hear your thoughts and feedback on this post and how I can improve it further.