In this article I will describe the theory of performance testing[1] and how we conduct such testing in practice at my company TechSmart, in the context of rich web applications and web services.
By the end of this article you should understand how we define performance testing and perhaps get some ideas for implementing or customizing your own tools for performance-testing your own web service.
Performance testing of a web service typically involves verifying non-functional requirements such as whether it is:
Performance testing aims to answer the following kinds of questions:
The core of performance testing is load generation. You write a program that generates a specified amount of load on your target web service and measure what happens when various amounts of load are applied.
Many load generation tools exist:
One of the oldest load generation tools is JMeter, which uses a GUI to define simulation programs in an XML-based file format. JMeter uses a separate OS thread for each concurrent HTTP connection, which severely limits the number of concurrent HTTP connections it can sustain on a single box.
At my company TechSmart we use Gatling as our load generation tool. Compared with JMeter, Gatling is far more efficient at generating load because it uses multiplexed async I/O on a small, fixed number of OS threads, rather than one thread per connection, to handle all HTTP connections. Also, Gatling simulations are written in an actual programming language (Scala) rather than XML, so you can write your simulations with rich abstractions and do more advanced customizations.
A scenario describes a pattern of HTTP requests that a single user makes against a web service. For example, in the ViewCodePage scenario a user performs the {LoginPage.loginWithoutRedirect, CodePage.view} subscenarios, which consist of individual HTTP requests.
A simulation describes an aggregate pattern of HTTP requests that multiple users make against a web service. For example, the ViewCodePage(X, Y) simulation simulates X users arriving over Y seconds, each of whom performs the actions described in the ViewCodePage scenario.
At TechSmart we have written several simulations that exercise each of the major pages on our platform website.
Most simulations written at TechSmart vary their behavior based on parameters that are passed in as environment variables. Simulations read these environment variables upon initialization using code like:
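class LoginAsStudent extends Simulation {
  // A missing variable falls back to "__missing__", so .toInt fails fast
  // with a NumberFormatException instead of silently using a default.
  val X = sys.env.getOrElse("X", "__missing__").toInt
  val Y = sys.env.getOrElse("Y", "__missing__").toInt
  ...
}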
Thus the set of parameters that a particular simulation expects can be deduced by reading the top of the simulation’s source code.
Most simulations at TechSmart support X and Y parameters to inject X users over Y seconds during the simulation.
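To make the X and Y parameters concrete, here is a minimal Python sketch of what "X users over Y seconds" could look like as arrival times. It assumes users arrive evenly spaced across the interval, which is an assumption about the injection profile rather than something stated above. The function name arrival_offsets is hypothetical.

```python
from typing import List


def arrival_offsets(x: int, y: float) -> List[float]:
    """Return each user's start time, in seconds from the simulation start.

    Assumes x users arrive evenly spaced over y seconds (an illustrative
    assumption; a real injection profile may differ).
    """
    if x <= 0 or y < 0:
        raise ValueError("x must be positive and y must be non-negative")
    return [i * y / x for i in range(x)]


if __name__ == "__main__":
    # X=4 Y=1: four users arriving within one second.
    print(arrival_offsets(4, 1))
```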
The “gatling” management command is a low-level command we’ve implemented at TechSmart that invokes the Gatling tool, sets up various required paths automatically, and runs a Gatling simulation script.
A typical invocation of the “gatling” management command looks like:
$ X=4 Y=1 pm gatling --simulation tsplatform.LoginAsStudent
Note: The pm command above is an alias for a python3 invocation of the Django task runner.
This invocation is equivalent to the more-verbose:
$ X=4 Y=1 $GATLING_HOME/bin/ --simulation tsplatform.LoginAsStudent --simulations-folder $PERFORMANCE_HOME/simulations --data-folder $PERFORMANCE_HOME/data --bodies-folder $PERFORMANCE_HOME/bodies
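As a rough illustration of what such a wrapper does, the following Python sketch assembles the argument list and environment for the verbose invocation above. It is not the actual TechSmart implementation: the function name build_gatling_invocation and its parameters are hypothetical, and the path to the Gatling launcher is taken as an argument rather than guessed. Only the command-line flags come from the invocation shown above.

```python
import os
from typing import Dict, List, Tuple


def build_gatling_invocation(
    gatling_executable: str,  # hypothetical: path to the Gatling launcher
    performance_home: str,    # hypothetical: the value of $PERFORMANCE_HOME
    simulation: str,
    params: Dict[str, str],   # simulation parameters such as X and Y
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for running a single Gatling simulation."""
    argv = [
        gatling_executable,
        "--simulation", simulation,
        "--simulations-folder", os.path.join(performance_home, "simulations"),
        "--data-folder", os.path.join(performance_home, "data"),
        "--bodies-folder", os.path.join(performance_home, "bodies"),
    ]
    # Parameters travel as environment variables layered over the caller's
    # environment, since the simulations read them via sys.env.
    env = dict(os.environ)
    env.update(params)
    return argv, env
```

The resulting pair could then be handed to subprocess.run(argv, env=env, check=True) to launch the simulation.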
The Gatling tool emits lots of output in the console while it is running and eventually generates an HTML report with detailed statistics about what HTTP requests were made during the simulation, response times for individual and aggregated requests, and other information.
Typically the most important information in the Gatling HTML report generated by running a simulation is the maximum response time[2] for a particular type of HTTP request that was made during the simulation.
Consequently we’ve created a “perftest run” management command that behaves similarly to the “gatling” command but automatically presents the maximum response time for the most important request type[3] after running the simulation.
With the “perftest run” management command, we can easily answer the question:
Here is an example of running a simple simulation, using “perftest setup” to create test data and then using “perftest run” to get the maximum response time for a particular amount of load:
$ pm perftest setup students 8
Deleting test students...
Creating 8 test student(s)...
$ pm perftest setup calendars 1
Deleting test calendars...
Creating 1 test calendar(s)...
Associating 8 user(s) with 1 calendar(s)...
$ pm perftest run LoginAsStudent 4 1
Running simulation with 4 user(s) over 1 second(s)...
When 4 user(s) over 1 second(s), max response time for request 'submit_login' is 393 ms.
Report: /Users/me/pkgs/gatling-charts-highcharts-bundle-2.2.2/results/loginasstudent-1497049484195/index.html
(We also have a “perftest teardown” command that deletes all test data created by “perftest setup”.)
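The summary line printed by “perftest run” boils down to taking a maximum over the samples for one request type. Here is a minimal Python sketch of that reduction. It assumes the samples are already available as (request name, response time in ms) pairs. How they are extracted from Gatling's output is not covered here, and the function names and the "view_login" request name are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def max_response_times(samples: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Map each request name to its maximum response time in milliseconds."""
    worst: Dict[str, int] = {}
    for name, elapsed_ms in samples:
        worst[name] = max(worst.get(name, 0), elapsed_ms)
    return worst


def summary_line(samples: Iterable[Tuple[str, int]], profiled_request: str,
                 users: int, seconds: int) -> str:
    """Format the max response time for the request type of interest."""
    worst = max_response_times(samples)[profiled_request]
    return (f"When {users} user(s) over {seconds} second(s), max response time "
            f"for request '{profiled_request}' is {worst} ms.")


if __name__ == "__main__":
    # Hypothetical samples; only 'submit_login' appears in the output above.
    samples = [("view_login", 120), ("submit_login", 210), ("submit_login", 393)]
    print(summary_line(samples, "submit_login", users=4, seconds=1))
```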
Our simulations are written by default to target the website running on the developer’s local machine. For real testing you’ll want to run tests against a remote version of the website, such as the one on a dedicated perf environment.
For example, to run a command on our performance environment we would first set up the environment with:
$ pm on perf perftest setup students 8
$ pm on perf perftest setup calendars 1
Note: The “on” management command runs some other command in the context of a particular remote environment.
And then we’d run the performance test by typing:
$ pm on perf perftest run LoginAsStudent 4 1
The “on” command sets the GATLING_BASE_URL environment variable (among other things) and the “perftest run” subcommand passes that environment variable to the underlying Gatling simulation. The GATLING_BASE_URL variable specifies the base URL that all HTTP requests are prefixed with. For example, the above command is equivalent to:
$ # (Change environment to "perf", defaulting to its database tier, cache tier, etc)
$ X=4 Y=1 GATLING_BASE_URL= pm gatling --simulation tsplatform.LoginAsStudent
All simulations support the GATLING_BASE_URL parameter to change the base URL because they all use a common Gatling HTTP Protocol object that defines its base URL from GATLING_BASE_URL:
class LoginAsStudent extends Simulation {
  val httpProtocol = Common.httpProtocol
  ...
}

object Common {
  private val baseUrl = sys.env.getOrElse("GATLING_BASE_URL", ...)
  val httpProtocol = http
    ...
}
The “perftest maxload” management command can be used to automatically perform “perftest run” several times to determine the maximum number of users (X) over a given period of time (Y) such that the maximum response time for the HTTP request of interest is less than a particular threshold (1,200 ms by default).
With the “perftest maxload” management command, we can answer the questions:
An example invocation of “perftest maxload” looks like:
$ pm perftest maxload LoginAsStudent --preserve-calendars
Determining baseline response time...
Deleting test students...
Creating 1 test student(s)...
Associating 1 user(s) with 1 calendar(s)...
Running simulation with 1 user(s) over 5 second(s)...
When 1 user(s) over 5 second(s), max response time for request 'submit_login' is 138 ms.
Report: /Users/davidf/pkgs/gatling-charts-highcharts-bundle-2.2.2/results/loginasstudent-1497567171908/index.html
Seeking shaft of hockeystick...
Deleting test students...
Creating 2 test student(s)...
Associating 2 user(s) with 1 calendar(s)...
Running simulation with 2 user(s) over 5 second(s)...
When 2 user(s) over 5 second(s), max response time for request 'submit_login' is 124 ms.
Report: /Users/davidf/pkgs/gatling-charts-highcharts-bundle-2.2.2/results/loginasstudent-1497567182571/index.html
(... ditto for 4 users over 5 seconds ... OK ...)
(... ditto for 8 users over 5 seconds ... OK ...)
(... ditto for 16 users over 5 seconds ... OK ...)
(... ditto for 32 users over 5 seconds ... OK ...)
(... ditto for 64 users over 5 seconds ... OK ...)
(... ditto for 128 users over 5 seconds ... FAIL ...)
Seeking knee of hockeystick... About 6 step(s) or 1:41.
Deleting test students...
Creating 96 test student(s)...
Associating 96 user(s) with 1 calendar(s)...
Running simulation with 96 user(s) over 5 second(s)...
When 96 user(s) over 5 second(s), max response time for request 'submit_login' is 3754 ms.
Report: /Users/davidf/pkgs/gatling-charts-highcharts-bundle-2.2.2/results/loginasstudent-1497567312703/index.html
(... ditto several times, performing a binary search ...)
Maximum load is 65 user(s) over 5 second(s) (13.0 users/second) with a response time of 939 ms.
Notice how “maxload” used a binary search to automatically find the maximum load that the service could handle before the maximum response times went out of bounds. Neat.
Also notice that “maxload” outputs a CSV of every sample taken. This CSV is useful to graph as a scatterplot to see graphically how the response time varies depending on the number of users. We generate these scatterplot graphs often enough that we’ll probably extend “maxload” in the future to just generate a scatterplot image automatically.
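The two phases visible in the log above (doubling the load until the threshold is exceeded, then a binary search between the last passing and first failing load) can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual “maxload” implementation. The names find_max_load, measure, and max_users are hypothetical, and measure(users) stands in for one “perftest run” that returns the maximum response time in ms. The sketch also assumes response time grows roughly monotonically with load; real measurements are noisy, so a production tool may want to repeat samples.

```python
from typing import Callable, Optional


def find_max_load(
    measure: Callable[[int], float],
    threshold_ms: float = 1200,
    max_users: int = 4096,  # hypothetical safety cap so the search terminates
) -> Optional[int]:
    """Return the largest user count whose max response time is under threshold_ms.

    Returns None if even a single user exceeds the threshold.
    """
    # Baseline: one user.
    if measure(1) >= threshold_ms:
        return None

    # Phase 1 (the "shaft"): double the load until a run fails.
    good, bad = 1, 2
    while measure(bad) < threshold_ms:
        good, bad = bad, bad * 2
        if bad > max_users:
            return good  # no failure observed within the cap

    # Phase 2 (the "knee"): binary search between the last passing load
    # and the first failing load.
    while bad - good > 1:
        mid = (good + bad) // 2
        if measure(mid) < threshold_ms:
            good = mid
        else:
            bad = mid
    return good
```

With a service that stays fast up to 65 users, this search probes 1, 2, 4, ..., 128, then 96, 80, 72, 68, 66, and 65: six binary-search steps, in line with the estimate printed in the log.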
Once you’ve determined the maximum load that your service can support, you can ask yourself whether that load is good enough, based on the level of traffic that you forecast your site will receive in the near future.
Should the maximum load not be good enough, you’ll want to isolate the bottleneck in the system that is constraining the maximum load. Then you can focus on optimizing that bottleneck so that the maximum load increases to be good enough. Note that if you optimize a bottleneck in one area sufficiently, you may find that the bottleneck moves to a different location.
For a web service, the most common bottlenecks in our experience are:
To isolate the bottleneck, use “perftest run” to generate a load on the service equivalent to the maximum load it can handle (as measured previously by “perftest maxload”) for a long time interval, say 10 minutes. While that load is being generated, use monitoring tools to look at the usage of CPU, memory, IOPS, and other resources on each box in the system to look for saturation. When you find the saturated resource, that’s the bottleneck.
Once you’ve located the bottleneck, there are usually a few options to optimize it away. To give you some ideas, here are some bottlenecks that the TechSmart website has hit in its performance testing (from least to most recent):
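Once utilization figures have been collected from the monitoring tools, spotting saturation is a simple filter. The Python sketch below assumes the figures have been normalized to fractions between 0 and 1 per box and per resource. The data shape, the 0.9 cutoff, the function name, and the box names are all hypothetical.

```python
from typing import Dict, List, Tuple


def saturated_resources(
    utilization: Dict[str, Dict[str, float]],
    threshold: float = 0.9,  # assumed cutoff for "saturated"
) -> List[Tuple[str, str, float]]:
    """Return (box, resource, utilization) triples at or above threshold,
    most heavily used first."""
    hits = [
        (box, resource, value)
        for box, resources in utilization.items()
        for resource, value in resources.items()
        if value >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical readings taken while the maximum load is applied.
    readings = {
        "web-1": {"cpu": 0.35, "memory": 0.50},
        "db-1": {"cpu": 0.97, "iops": 0.60},
    }
    print(saturated_resources(readings))
```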
And here are some bottlenecks that our performance testing has identified we will hit under much higher loads than what we currently experience:
It is useful to determine what happens when your service is subjected to more than its maximum load. In particular:
You should decide what desired behavior you want your system to exhibit when receiving a temporary over-maximum load (i.e. a spike) or a sustained over-maximum load (i.e. a flood). Then verify whether the actual behavior matches the desired behavior.
If your service is generally written with infinitely-flexible buffers, it’s likely that receiving sustained over-maximum load will cause requests to queue up until memory is exhausted, and the out-of-memory condition will cause the service to crash.
On the other hand if your service is generally written with fixed-size buffers, it’s likely that receiving any over-maximum load will cause requests to be rejected or dropped.
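The difference between the two buffering styles can be seen in a small Python sketch using the standard library's queue.Queue, where a maxsize of zero means unbounded. Nothing drains the queues here, which stands in for a service that cannot keep up with the incoming load. The function name offer_requests is hypothetical.

```python
import queue
from typing import Tuple


def offer_requests(buffer: "queue.Queue[int]", count: int) -> Tuple[int, int]:
    """Offer count requests to buffer without blocking.

    Returns (accepted, rejected).
    """
    accepted = rejected = 0
    for request_id in range(count):
        try:
            buffer.put_nowait(request_id)
            accepted += 1
        except queue.Full:
            # A fixed-size buffer sheds load once it is full.
            rejected += 1
    return accepted, rejected


if __name__ == "__main__":
    bounded = queue.Queue(maxsize=4)  # fixed-size buffer
    unbounded = queue.Queue()         # grows until memory runs out
    print("bounded:", offer_requests(bounded, 10))
    print("unbounded:", offer_requests(unbounded, 10), "queued:", unbounded.qsize())
```

The bounded queue rejects everything beyond its capacity, while the unbounded queue accepts every request and simply keeps growing.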
Hopefully this article has provided some insight into the concepts behind performance testing and given you some ideas about how to implement or improve your own performance testing tooling.
If you have any improvements or other comments on the contents of this article I’d love to hear from you.
[1] For the purposes of this article I define performance testing to include load testing and stress testing.
[2] Many other performance testing tools focus not on the maximum response time but rather on other measures such as the 99th percentile response time or the mean response time, which are inappropriate to use.
[3] The most important request type for a particular simulation is determined by inspecting the source code for a simulation file and looking for a line like val PROFILED_REQUEST_NAME = "view_code".
[4] In Django, an easy way to eliminate a bunch of database queries is to make appropriate use of select_related() and prefetch_related() to pre-fetch a group of relationships all at once (with a constant number of queries) rather than one at a time (with a variable number of queries, depending on how many relationships there are).