Benchmarking Amazon's SQS

While considering using Amazon SQS for a project recently, I was surprised at just how little SQS performance data was available on the Internet. In particular, while there's a bit of information available regarding throughput, there is very little information I can find regarding message latency.

Indeed, as long as SQS can scale horizontally, throughput is really not very important at all (within reason of course). But latency cannot usually be improved by scaling, so if latency is too high to be acceptable for any given project, then scaling is unlikely to ever change that.

As latency is so important for the project I had in mind, I decided to throw together a very small set of simple SQS benchmarking scripts to get an idea of what sort of latency (and throughput) we can expect from SQS.

For the impatient (like me), feel free to skip down to the results.

The Scripts

Note, the following scripts have been moved to github: https://github.com/pcolby/scripts/tree/master/benchmarking-sqs

The simple benchmarking code is split into four files:

sqs.class.php:

timer.class.php:

tx.php:

rx.php:

Prerequisites

These scripts require the AWS SDK for PHP. So if using a brand new AWS EC2 server, for example, you'll need to install the SDK. The easiest way is something like:

sudo yum install php php-pear
sudo pear channel-discover guzzlephp.org/pear
sudo pear channel-discover pear.amazonwebservices.com
sudo pear channel-discover pear.symfony.com
sudo pear install aws/sdk

Next you'll need to create a temporary queue to test with. I used the AWS Management Console to do so, but there are numerous command line tools you can use instead.

Finally, you'll see the sqs.class.php base class includes a number of FIXMEs, where you will need to set the AWS access credentials and SQS queue URL to use.

How

Once the prerequisites are taken care of, using the above scripts is pretty simple. However, the rx.php script in particular needs to be used slightly differently depending on whether you're trying to measure receive throughput or message latency.

Send Throughput

To test send throughput, simply run the tx.php script. You may wish to increase the number of messages to send (currently 1,000 - look at the very end of the script) to get a longer-running average.

At the end, and at various points in the middle, the script will output a dump showing a distribution of messages per send time, as well as the final average throughput.
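To give a rough idea of what such a timer does, here is a minimal Python sketch (not the PHP code from timer.class.php). The class name, the 10ms bucket width and the output format are all my own assumptions for illustration; the real script may bucket and print things differently.

```python
import time
from collections import Counter


class RequestTimer:
    """Collects per-request durations; reports a histogram and a mean rate.

    Hypothetical helper for illustration, not the author's timer class.
    """

    def __init__(self, bucket_ms=10):
        self.bucket_ms = bucket_ms   # histogram bucket width (assumed value)
        self.buckets = Counter()     # bucket start in ms -> number of requests
        self.total_seconds = 0.0     # time spent inside timed requests
        self.count = 0               # number of timed requests

    def record(self, seconds):
        """Add one request duration, given in seconds."""
        bucket = int(seconds * 1000) // self.bucket_ms * self.bucket_ms
        self.buckets[bucket] += 1
        self.total_seconds += seconds
        self.count += 1

    def time_call(self, func, *args):
        """Run func(*args), record how long it took, and return its result."""
        start = time.perf_counter()
        result = func(*args)
        self.record(time.perf_counter() - start)
        return result

    def throughput(self):
        """Mean requests per second of time spent inside requests."""
        return self.count / self.total_seconds if self.total_seconds else 0.0

    def dump(self):
        """Print the duration distribution followed by the mean throughput."""
        for bucket in sorted(self.buckets):
            upper = bucket + self.bucket_ms - 1
            print(f"{bucket:5d}-{upper:<5d} ms: {self.buckets[bucket]}")
        print(f"mean throughput: {self.throughput():.1f} requests/s")
```

You would wrap each send call in time_call() and call dump() every so often, and once more at the end.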

Of course you can run as many instances as you like to test horizontal scaling.

Receive Throughput

Testing receive throughput is quite similar. However, you first need to build up a decent set of messages in the test queue - naturally, you can do that quite easily by running the tx.php script a couple of times. Then, you simply run one or more instances of the rx.php script.

The thing to note here is that the rx.php script outputs three separate timers: receives, deletes, and delays. For this test (receive throughput) you should ignore that last timer - it is for testing message latency, which is not valid in this context.

Receives and deletes are timed separately because they are two separate operations, sometimes with vastly different performance (receive operations, for example, are much more affected by your local network bandwidth than deletes are). For any given test environment, your overall consumer throughput is going to be the slower of the receive and delete timers.

Message Latency

Finally, the part I'm most interested in :) To test message latency, you must first figure out how many rx.php instances you need to have running to at least keep up with your tx.php instances. For example, if tx.php is sustaining an average of 30 messages per second, while rx.php is only sustaining an average of 20 messages per second, then you will need to be running at least two rx.php scripts for every tx.php script. Basically, you just want to be sure that the test queue is in no way backing up - otherwise you'll be seeing an artifact of slow consumers, not message latency.
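The arithmetic is just a rounded-up ratio. As a quick Python sketch (the function name is mine, and the rates are whatever you measured in the throughput tests):

```python
import math


def receivers_needed(tx_instances, tx_rate, rx_rate):
    """Smallest number of receivers whose combined rate covers the senders.

    tx_rate and rx_rate are per-instance averages in messages per second.
    """
    return math.ceil(tx_instances * tx_rate / rx_rate)


print(receivers_needed(1, 30, 20))  # 2
print(receivers_needed(3, 30, 20))  # 5
```

This is the bare minimum; since the measured rates are only averages, running an extra receiver or two gives some headroom against the queue backing up.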

Once you know how many rx.php instances you need, start them up. They will just sit there waiting for messages. Then, when you're ready, start up the tx.php instance(s). Finally, watch the output of the rx.php scripts.

As mentioned above, the rx.php script outputs three separate timers for receives, deletes, and delays. In this context, the receives and deletes timers are meaningless (since the rx.php scripts are deliberately outpacing the tx.php script(s)). However, now the delay timer is meaningful. The way this works is that the messages sent by the tx.php script each begin with a microsecond-accurate timestamp for the time at which the message was sent. So the rx.php script simply computes the delay between when each message was sent and when it was received, to generate the mean message delay and distribution.
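In Python, the idea looks something like the sketch below. The message layout (a decimal timestamp, a space, then the payload) and the function names are assumptions for illustration; the actual format used by tx.php may differ.

```python
import time


def make_message(payload, now=None):
    """Prefix the payload with the send time in seconds (microsecond precision).

    The "timestamp payload" layout is an assumed format, not taken from tx.php.
    """
    now = time.time() if now is None else now
    return f"{now:.6f} {payload}"


def message_delay(body, received_at=None):
    """Return the seconds elapsed between the embedded send time and receipt."""
    received_at = time.time() if received_at is None else received_at
    sent_at = float(body.split(" ", 1)[0])
    return received_at - sent_at


body = make_message("hello", now=1000.25)
print(body)  # 1000.250000 hello
```

Note that this only gives sensible numbers if the sender's and receiver's clocks agree, so either run both scripts on the same host or make sure the hosts are time-synchronised.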

Results

Finally, let's get down to some results. Note, these results are not meant to be definitive SQS benchmarks. They are as much a benchmark of the testing environment as of SQS. For example, running these tests within EC2 gives vastly different results to running them from external networks. If you're assessing SQS for an actual project, you should run these (and other) tests in an environment matching your own production environment.

With that in mind, I've run these tests on M1 Small EC2 instances in both the US East (Northern Virginia) and Asia Pacific (Sydney) regions (in both cases, using test queues within the current region). You can see the results in the following sub-sections.

Mean Single-Threaded Throughput

As you can see from the gauges below, the send request is the bottleneck in this test setup. Note, the receive requests are actually slower (more on that below), but as they are batch operations (receiving up to 10 messages at a time), the overall receive throughput is still faster than send. You can perform batch send operations too, but as I'm less interested in using that feature at this time, I didn't bother to benchmark it (aka an exercise for the reader).

Overall single-threaded throughput is not too bad, and (from what I've read / anecdotally) scales well. It is interesting to note here that US performance is consistently lower than AU performance. This may be because the AU (Sydney) zone is less utilised currently (it is quite new) or may just indicate random variance in EC2 instances (of which there is plenty).

Single-Threaded Throughput Distribution

Next up we have the distribution of request durations. Here you can see that delete requests are considerably faster than the others. Send requests are in second place, with receive requests last. But as noted above, each receive request can return anywhere from 1 to 10 messages, so their overall message throughput is still higher than the other two request types.

Overall, the tests performed a little better in AU than the US... again, possibly due to the AU zone being significantly newer / less saturated.

Message Latency Distribution

And finally, we have the distribution of message latency for both zones tested.

Here we can see that the vast majority of messages arrive within ~200ms, and that both US and AU are pretty similar.

One interesting thing that you do not see in the above charts is that in the AU region, I rarely but consistently saw a very small number of messages (around 1 per 1,000) delayed by a fraction of a second over 30 seconds. This was not seen at all in the (limited number of) US tests I ran. I suspect, though haven't investigated further, that the 30-second component is related to the default message (visibility) timeout of the test queue I was using, which is 30 seconds. These rare, but repeatable outliers may or may not be an issue for your application.

Conclusion

Despite being very simple, the SQS benchmark scripts above seem quite good for basic SQS performance testing.

Within EC2, the first single-threaded SQS performance bottleneck appears to be send requests. Depending on your application, you can speed them up using batch-send requests, and/or scaling out.

Typical message latency (when receivers are not being overrun by faster senders) is roughly 20 to 200ms, with the vast majority of messages arriving within that range, though the Asia Pacific (Sydney) zone does occasionally see very long delays (30+ seconds).
