Long before serverless was born, message queues were a popular tool for software developers and architects who needed to integrate software systems. At first, when large software systems were incompatible with each other, message queues acted as the intermediary communication channels between them. Then the world moved to distributed systems, and message queues became even more popular as an integration medium thanks to inherent capabilities such as message broadcasting, guaranteed delivery and in-order delivery. Now, in an era where almost all software systems are moving into the serverless landscape, where do serverless message queues fit in?
Serverless Message Queues
Not surprisingly, they are EVERYWHERE! Why? Because a complete serverless system is naturally a combination of several smaller moving parts that process messages and integrate with other resources, and these parts need to pass messages and events between them. For example, such a system may contain multiple AWS Lambda functions, Google Cloud Functions, Microsoft Azure Functions or any combination of them. Each of these smaller execution units operates with its own latency and throughput, and even scales up and down on its own. So when you have to move messages between such volatile subparts, one of the best and easiest approaches is to use message queues.
All the major serverless providers, such as AWS, Azure and GCP, offer several queuing solutions to choose from, so the next problem is deciding which queue service is the best fit for your application. The only reliable way to select the correct service is to analyse the requirements of your application and choose a solution that meets them. This article therefore discusses the main aspects of serverless queuing services that should be taken into consideration when making that choice.
Things to consider when selecting a Queuing Solution
Maximum Message Size
Maximum message size is the largest payload that a single message on a queue can hold. Most serverless queues, such as AWS SQS, AWS SNS and Azure Queue Storage, allow at most a few hundred kilobytes per message, while some, such as AWS Kinesis, allow up to 1MB. Generally, serverless systems deal with small messages that can be processed within a limited time period, so these size limits are not an issue in most use cases. But if the messages generated by the source system approach or exceed these limits, it is important to select a queue that is capable of accepting them.
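As a rough illustration of this check, the sketch below measures a message the way a queue would see it, as serialised bytes rather than characters, and compares it with a per-service limit. The limits are the figures quoted in this article, and the names `MAX_MESSAGE_BYTES`, `payload_size` and `fits` are hypothetical helpers invented for this example, not part of any SDK.

```python
import json

# Per-message limits in bytes, taken from the figures quoted in this article.
# Treat them as assumptions and verify them against the provider's
# current documentation before relying on them.
MAX_MESSAGE_BYTES = {
    "sqs": 256 * 1024,
    "sns": 256 * 1024,
    "kinesis": 1024 * 1024,
    "azure_queue_storage": 64 * 1024,
}


def payload_size(message: dict) -> int:
    """Return the size of the message in bytes once serialised as UTF-8 JSON."""
    return len(json.dumps(message).encode("utf-8"))


def fits(service: str, message: dict) -> bool:
    """Check whether the serialised message is within the service's limit."""
    return payload_size(message) <= MAX_MESSAGE_BYTES[service]


# A hypothetical order message carrying roughly 100 KB of free text.
order = {"order_id": 42, "notes": "x" * 100_000}
print(fits("azure_queue_storage", order))  # False: about 100 KB exceeds 64 KB
print(fits("sqs", order))                  # True: well under 256 KB
```

Running such a check in the publisher, before enqueuing, turns an opaque rejection by the queue into an error you can handle deliberately.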
Maximum throughput is the maximum number of message-related operations that a queue can handle per second. Although some queuing solutions, such as AWS SQS standard queues and AWS SNS, do not impose any limit on throughput, most of the others do, and the definition of the limit varies from one solution to another. For example, an AWS SQS FIFO queue supports up to 300 API calls per second for each of the send, receive and delete actions, which amounts to 3,000 messages per second when each call carries a full batch of 10 messages. In contrast, an AWS Kinesis shard supports up to 1,000 put-record operations per second. Because of these limits, it is better to estimate the expected throughput of your application (the enqueuing rate as well as the dequeuing rate) beforehand and choose a queuing solution that can cater to that level.
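The estimate can be as simple as a few lines of arithmetic. The sketch below assumes the per-second figures quoted in this article plus the SQS batch limit of 10 messages per call; the function names are hypothetical, and a real Kinesis sizing would also have to consider the per-shard byte-rate limits, which are left out here.

```python
import math

# Figures quoted in this article; treat them as assumptions and re-check
# the provider's documentation for your region and account.
KINESIS_PUTS_PER_SHARD = 1_000  # put-record operations per second per shard
SQS_FIFO_CALLS_PER_SEC = 300    # API calls per second for each action
SQS_MAX_BATCH = 10              # messages per batched SQS call


def kinesis_shards_needed(records_per_sec: int) -> int:
    """Smallest number of shards whose combined put rate covers the load."""
    return max(1, math.ceil(records_per_sec / KINESIS_PUTS_PER_SHARD))


def fifo_queue_can_handle(messages_per_sec: int, batch_size: int = 1) -> bool:
    """Check whether the required number of API calls fits the FIFO limit."""
    batch_size = max(1, min(batch_size, SQS_MAX_BATCH))
    calls_per_sec = math.ceil(messages_per_sec / batch_size)
    return calls_per_sec <= SQS_FIFO_CALLS_PER_SEC


print(kinesis_shards_needed(4_500))                  # 5
print(fifo_queue_can_handle(2_000))                  # False: 2,000 calls/s
print(fifo_queue_can_handle(2_000, batch_size=10))   # True: 200 calls/s
```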
Retention period is the maximum length of time for which the queue will keep a message while waiting for the next processing stage to consume it. This next stage will most probably be a serverless function such as an AWS Lambda function or an Azure Function. Most, if not all, message queues have a defined retention period to prevent unconsumed and obsolete messages from piling up in queue storage. Otherwise, such a pile-up could eventually make it impossible for publishers to enqueue new messages because of a shortage of storage capacity. This aspect is especially important in the serverless landscape because most serverless systems are a combination of multiple independent processing stages.
So if the processing stage responsible for retrieving messages from the queue goes down, the queue should be able to retain the messages for a reasonable time until that stage becomes available again. On the other hand, what if your system deals with time-sensitive messages, i.e. messages that become obsolete unless they are processed within a certain time period? In such a case, the queue should be able to discard them or send them to a dead-letter queue without handing them over to the next processing stage.
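Where the queue itself cannot expire messages as early as the application needs, the same rule can be approximated in the consumer. The following is a minimal in-memory sketch of that idea, assuming each message carries the time at which it was published; `Message` and `ExpiringConsumer` are hypothetical names for this illustration and do not correspond to any provider's API.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    body: str
    enqueued_at: float  # Unix timestamp (seconds) when the message was published


@dataclass
class ExpiringConsumer:
    """Processes fresh messages and diverts stale ones to a dead-letter list."""
    max_age_seconds: float
    processed: List[Message] = field(default_factory=list)
    dead_letters: List[Message] = field(default_factory=list)

    def handle(self, message: Message, now: Optional[float] = None) -> None:
        # `now` can be injected to make the behaviour reproducible in tests.
        now = time.time() if now is None else now
        if now - message.enqueued_at > self.max_age_seconds:
            self.dead_letters.append(message)  # too old to be useful
        else:
            self.processed.append(message)


consumer = ExpiringConsumer(max_age_seconds=60)
consumer.handle(Message("fresh", enqueued_at=1000.0), now=1030.0)
consumer.handle(Message("stale", enqueued_at=1000.0), now=1100.0)
print([m.body for m in consumer.processed])     # ['fresh']
print([m.body for m in consumer.dead_letters])  # ['stale']
```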
Guaranteed ordering means that a subscriber of a queue receives messages in exactly the same order in which the publisher published them. Some applications require this capability because the message order must be preserved from one processing stage to the next. Although some queuing solutions, such as AWS SQS standard queues and Azure Queue Storage, do not guarantee this behaviour, there are many solutions, such as AWS SQS FIFO queues and AWS SNS FIFO topics, that do provide this guarantee.
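To make the cost of a missing guarantee concrete, here is a sketch of what a consumer has to do when the queue may deliver out of order. It assumes the publisher attaches an increasing sequence number to every message, which is an assumption of this example rather than a feature of any particular service; `Resequencer` is a hypothetical helper.

```python
class Resequencer:
    """Buffers out-of-order messages and releases them in sequence order."""

    def __init__(self, first_seq: int = 1):
        self.next_seq = first_seq  # sequence number we are waiting for
        self.pending = {}          # seq -> body, for messages that arrived early

    def accept(self, seq: int, body: str) -> list:
        """Store one message and return every message that is now deliverable."""
        if seq >= self.next_seq:   # ignore duplicates of already released messages
            self.pending[seq] = body
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


r = Resequencer()
print(r.accept(2, "b"))  # []: message 1 has not arrived yet
print(r.accept(1, "a"))  # ['a', 'b']
print(r.accept(3, "c"))  # ['c']
```

A queue with guaranteed ordering removes the need for this kind of buffering, along with the open question of how long to wait for a message that never arrives.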
Most serverless queuing solutions fall into one of two categories based on their message delivery method. When a queue has multiple subscribers, some solutions, such as AWS SQS, deliver a particular message to only one of those subscribers. In contrast, other solutions, such as AWS SNS, deliver a copy of the same message to each and every subscriber, which is also known as broadcasting. Therefore, based on the requirements of the serverless application, it is vital to select a queue that supports the correct delivery method.
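The difference between the two delivery methods can be shown with a toy in-memory model. This is only a sketch: real services let competing consumers pull messages rather than handing them out round-robin, and the class names below are invented for this illustration.

```python
from itertools import cycle


class PointToPointQueue:
    """Each message goes to exactly one subscriber (round-robin, for simplicity)."""

    def __init__(self, subscribers):
        self._next_subscriber = cycle(subscribers)

    def publish(self, message):
        deliver = next(self._next_subscriber)
        deliver(message)


class BroadcastTopic:
    """Every subscriber receives its own copy of each message."""

    def __init__(self, subscribers):
        self._subscribers = list(subscribers)

    def publish(self, message):
        for deliver in self._subscribers:
            deliver(message)


queue_a, queue_b = [], []
queue = PointToPointQueue([queue_a.append, queue_b.append])
topic_a, topic_b = [], []
topic = BroadcastTopic([topic_a.append, topic_b.append])

for msg in ("m1", "m2"):
    queue.publish(msg)
    topic.publish(msg)

print(queue_a, queue_b)  # ['m1'] ['m2']
print(topic_a, topic_b)  # ['m1', 'm2'] ['m1', 'm2']
```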
Summary of Operational Limits on Popular Serverless Queuing Services
|Service||Maximum Message Size||Maximum Throughput||Retention Period||Guaranteed Ordering||Every subscriber can receive each message|
|SQS (Standard)||256KB||Unlimited||Default 4 days. Can be configured to a value between 60s and 14 days||❌||❌|
|SQS (FIFO)||256KB||3,000 messages/s with batching (300/s without)||Default 4 days. Can be configured to a value between 60s and 14 days||✔||❌|
|SNS||256KB||Unlimited||Default 4 weeks. A TTL can be set up to 14 days||✔||✔|
|Kinesis||1MB||1,000/s per shard||Default 24 hours. Can be increased up to 7 days||✔||✔|
|DDB Streams||400KB||40,000/s or 10,000/s based on region||24 hours||✔||✔|
|Azure Queue Storage||64KB||2,000/s||Infinite||❌||❌|
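One way to put the table to work is to encode it as data and filter it by the requirements of the application. The sketch below does that for a subset of the rows; the figures are copied from the table above and should be treated as assumptions to re-check, DDB Streams is omitted because its limit depends on the region, and `QueueService` and `candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QueueService:
    name: str
    max_message_kb: int
    max_ops_per_sec: Optional[int]  # None means no limit is listed
    ordered: bool
    broadcast: bool


# Values copied from the summary table in this article.
SERVICES = [
    QueueService("SQS Standard", 256, None, False, False),
    QueueService("SQS FIFO", 256, 3_000, True, False),
    QueueService("SNS", 256, None, True, True),
    QueueService("Kinesis (per shard)", 1024, 1_000, True, True),
    QueueService("Azure Queue Storage", 64, 2_000, False, False),
]


def candidates(message_kb: int, ops_per_sec: int,
               need_ordering: bool = False,
               need_broadcast: bool = False) -> List[str]:
    """Return the names of the services that satisfy every stated requirement."""
    return [
        s.name for s in SERVICES
        if s.max_message_kb >= message_kb
        and (s.max_ops_per_sec is None or s.max_ops_per_sec >= ops_per_sec)
        and (s.ordered or not need_ordering)
        and (s.broadcast or not need_broadcast)
    ]


print(candidates(100, 500, need_ordering=True))
# ['SQS FIFO', 'SNS', 'Kinesis (per shard)']
print(candidates(300, 500))
# ['Kinesis (per shard)']
```

Note that the Kinesis row describes a single shard, so a stream with more shards can pass a throughput requirement that this simple filter would reject.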
In conclusion, all of these services provide strong performance and reliability guarantees at very reasonable prices; what differs is their behaviour. Therefore it is the developer's responsibility to choose the correct service or services based on the requirements of the application.