Summary:
Burst loads of ~200 concurrent users and sustained loads of ~100 concurrent users can be served with a single EC2 t2.medium Application Server and an RDS t2.medium Database Server.
CPU on the Application server was the primary limiting factor in SCORM Engine performance, followed closely by write IOPS (I/O operations per second) on the database server. Adding additional Application Servers and increasing disk write throughput on the Database Server are the first steps to take in situations where greater loads must be served.
SCORM Engine on Windows/.NET:
Hardware:
Application Server:
- Windows Server 2012 Datacenter Edition
- AWS EC2 t2.medium instance
- 2 vCPUs (Intel Xeon E5-2676 v3 @2.40GHz)
- 4 GB RAM
- IIS 8.0
- .NET Framework 4.5
Database Server:
- Windows Server 2012 Datacenter Edition
- SQL Server Express 12.00.4422.0.v1
- AWS RDS t2.medium instance
- 2 vCPUs (Intel Xeon E5-2676 v3 @2.40GHz)
- 4 GB RAM
- Burstable to ~150 disk IOPS (IO operations per second)
User Load:
The system was able to serve a concurrent user load of 196 users at Ideal levels of performance on a short-term basis. Request latency began to climb quickly once concurrency reached 207 users. Serving 196 concurrent users at Ideal levels of performance would be sustainable for a few hours at best before the instance’s CPU credits are exhausted, after which sustained concurrent user loads of ~100 users are more realistic.
The system continued to serve load at Acceptable levels of service up to 414 concurrent users, at which point request latency began to climb very quickly to Unacceptable levels. By the time we were serving 433 concurrent users, request latency was above 6 seconds. At 734 concurrent users, latency was over 15 seconds, at which point the system became functionally unusable.
It is notable that IIS has a default connection timeout of 120 seconds - requests that could not be served promptly continued to wait up to that limit before being dropped. As such, even connections with very high latencies were still processed, albeit very slowly. Browser socket timeouts tend to be in the 60 second range - the SCORM Player will give up before a stock IIS installation does. Our load testing protocol did not limit timeouts, and latencies of over 90 seconds were observed at very high loads.
Conclusions:
CPU on the Application server was the primary limiting factor, followed closely by write IOPS (I/O operations per second) on the database server. Greater loads could be served by adding an additional Application server and by increasing disk write throughput on the Database server. The t2.medium instances we were running began to use CPU burst credits while serving loads of 100 Concurrent Users. To serve full-time, non-burst loads greater than 100 Concurrent Users, multiple Application servers or instances with greater CPU capacity will be required. Please review the data and charts below for more details.
Methodology:
For the purpose of this test, we have modeled a learner’s behavior around the following assumptions:
- That the user would spend 20 minutes on a training session,
- That during the session, runtime data would be submitted every 20 seconds,
- That a final exit event would occur at the end of the 20-minute session.
All of the sessions are based on a SCORM 1.2 course, in this case our Basic Runtime Calls Golf Example Course. This test was explicitly designed to test application server and database server performance, and specifically excludes network bandwidth testing and content delivery quality of service. Network bandwidth and disk I/O requirements for content storage and delivery are highly variable, and must be tested separately based upon the type of content that is being delivered. For example, serving high-definition video content will require both greater disk I/O and bandwidth than serving content that contains only text and still images.
Each load test user scenario runs as follows:
- We query the courses API to get a list of courses.
- We create a registration for a course.
- We generate a launch link for the registration.
- We launch the registration.
- We enter a loop where we post new results 10 times in a row, waiting for 20 seconds between posts. This simulates the SCORM Player auto-submitting results every 20 seconds.
- We post a final set of results and complete the course.
- We check to ensure that the registration has been completed.
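The steps above translate roughly into the following Python sketch. The endpoint paths, payload fields, and credentials here are placeholders for illustration only - they are not the actual SCORM Engine API - and the timing values mirror the assumptions described above.

    import time
    import uuid
    import requests

    BASE = "https://engine.example.com/api"   # placeholder base URL, not the actual Engine API
    AUTH = ("appId", "secretKey")             # placeholder credentials

    def run_learner_scenario():
        # Query the courses API and pick a course (the Golf example in our tests).
        courses = requests.get(BASE + "/courses", auth=AUTH).json()
        course_id = courses[0]["id"]

        # Create a registration for a synthetic learner.
        reg_id = str(uuid.uuid4())
        requests.post(BASE + "/registrations", auth=AUTH,
                      json={"courseId": course_id, "registrationId": reg_id})

        # Generate a launch link and launch the registration.
        launch = requests.get(BASE + "/registrations/" + reg_id + "/launchLink", auth=AUTH).json()
        requests.get(launch["launchLink"])

        # Post runtime results 10 times, 20 seconds apart, simulating the SCORM Player.
        for _ in range(10):
            time.sleep(20)
            requests.post(BASE + "/registrations/" + reg_id + "/results", auth=AUTH,
                          json={"status": "incomplete"})

        # Post a final set of results and complete the course.
        requests.post(BASE + "/registrations/" + reg_id + "/results", auth=AUTH,
                      json={"status": "completed"})

        # Check that the registration has been completed.
        progress = requests.get(BASE + "/registrations/" + reg_id + "/progress", auth=AUTH).json()
        assert progress["complete"] is True   # field name is a placeholder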
Primary Metrics:
Concurrent Users: We define Concurrent Users as the number of users that are simultaneously launching courses and recording their results back to the system. We have modeled the behavior of the test on the assumption that a user’s session will last 20 minutes, and that the SCORM player will submit results from a running course every 20 seconds. As such, a user that launches a course during minute 1 of the test still counts as a concurrent user during minute 19 of the test, as that user is still recording results to the system every 20 seconds. It is only when that user has completed their scenario after 20 minutes that they cease to be a “concurrent” user.
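As a concrete illustration of this definition, the sketch below counts concurrent users at a point in time from a list of launch timestamps, assuming the 20-minute session length described above.

    SESSION_LENGTH = 20 * 60  # seconds; the modeled session length

    def concurrent_users(launch_times, t):
        # A user is "concurrent" from launch until their session ends 20 minutes later.
        return sum(1 for start in launch_times if start <= t < start + SESSION_LENGTH)

    # A user who launched at minute 1 still counts as concurrent at minute 19:
    print(concurrent_users([60], 19 * 60))  # 1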
Request Latency Target (Apdex): We measure performance by 99 percent of requests completing at or below a certain latency (as opposed to an average or median). We refer to this metric as the Apdex.
Apdex Levels:
- Ideal: 99% of requests complete in < 1200 ms
- Acceptable: 99% of requests complete in < 2000 ms
- Frustrating, but functional: 99% of requests complete in > 2000 ms and < 5000 ms
- Unacceptable, system is unresponsive: 99% of requests complete in > 5000 ms
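The sketch below shows how a sample window of request latencies maps to these levels. The nearest-rank percentile calculation is a simplification; our load testing tooling may interpolate differently.

    def p99(latencies_ms):
        # Nearest-rank 99th percentile of a list of request latencies in milliseconds.
        ordered = sorted(latencies_ms)
        return ordered[int(0.99 * (len(ordered) - 1))]

    def apdex_level(latencies_ms):
        p = p99(latencies_ms)
        if p < 1200:
            return "Ideal"
        if p < 2000:
            return "Acceptable"
        if p < 5000:
            return "Frustrating"
        return "Unacceptable"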
Windows Test Data and Charts:
We sampled load test performance data every 10 seconds. The following data points represent timestamps at which the system crossed over a threshold to a new Apdex score. All timestamps are Zulu time.
Timestamp     | Concurrent Users | P99 Request Latency (ms) | Apdex
22:14:54.116Z | 196              | 282.3                    | Ideal
22:15:04.116Z | 207              | 407.8                    | Ideal
22:18:34.144Z | 414              | 2798.8                   | Frustrating
22:18:44.148Z | 433              | 6568.3                   | Unacceptable
22:21:14.182Z | 734              | 15228.4                  | Unacceptable
The 22:15:04 timestamp above coincides with the CPU on the application server approaching the 90% utilization mark. By 22:21 the system was pegged near 100%.
The database server was not yet fully utilized at this point, but it wasn’t far behind. The chart below shows the database server approaching its maximum of approximately 150 Write IOPS, a measure of disk write performance and a typical limiting factor for a SCORM Engine installation.
Database CPU usage was not a limiting factor, but would rapidly become an issue if additional application servers were added.
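Write IOPS on an RDS instance can be watched via CloudWatch. The sketch below assumes boto3 and uses a hypothetical DB instance identifier and region; it simply retrieves the WriteIOPS metric so you can see how close you are running to the ~150 IOPS ceiling we observed.

    from datetime import datetime, timedelta
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # region is an assumption
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="WriteIOPS",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "scorm-engine-db"}],  # hypothetical name
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=60,
        Statistics=["Average", "Maximum"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Maximum"])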
Other factors:
CPU Burst Capacity in EC2 t2 class instances
The AWS EC2 instances that we used for these tests are capable of CPU bursting, which allows them to temporarily serve loads higher than their allocated vCPUs would otherwise permit. The ability to burst is governed by CPU Credits allocated to the instances. A secondary test showed that serving loads of greater than 100 Concurrent Users on a single t2.medium instance caused us to spend CPU Credits. As such, loads greater than 100 concurrent users should be considered “burst” performance, and not sustainable without the addition of new application servers.
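As a rough illustration of the credit math - assuming AWS’s published t2.medium figures at the time of writing (24 credits earned per hour, a maximum accrued balance of 576, one credit equaling one vCPU-minute at 100%); verify these against current AWS documentation - the following sketch estimates how long a full credit balance lasts at a given utilization.

    EARN_RATE = 24   # credits a t2.medium earns per hour (assumption from AWS docs)
    VCPUS = 2

    def hours_until_credits_exhausted(starting_credits, avg_utilization):
        # avg_utilization is the average across both vCPUs, 0.0-1.0.
        spend_rate = avg_utilization * VCPUS * 60   # credits spent per hour
        net_burn = spend_rate - EARN_RATE
        if net_burn <= 0:
            return float("inf")   # at or below the baseline; credits never run out
        return starting_credits / net_burn

    # A full balance of 576 credits at ~90% utilization lasts roughly 6-7 hours:
    print(hours_until_credits_exhausted(576, 0.90))   # ~6.9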
Caveats:
These tests are designed to determine the maximum number of simultaneous users that a given SCORM Engine deployment is able to support when those users are engaged in the most common (and most resource-intensive) actions in the system: course launch and the recording of results. The load tests do not factor in many of the other operations that occur in the course of running SCORM Engine. For example, if your application uses the SCORM Engine API to generate reporting data, that will incur a significant load on the system that these tests cannot account for. Some applications, especially ones that allow end users to create new courses and registrations, generate significant administrative overhead that is not accounted for in these tests.
For comparison, SCORM Cloud, which is the most active SCORM Engine deployment for which we have data, regularly serves loads of 70-90 course launches per minute, which works out to approximately 1400-1800 concurrent users by the standards of this load test (each launch representing a 20-minute session). SCORM Cloud runs on between 3 and 9 m3.xlarge application servers and a db.r3.4xlarge database master server. Please see the SCORM Cloud Usage Statistics Overview document for more details.
Database growth:
On an active deployment that is serving primarily SCORM content to 500+ users per hour, we expect database growth of 1-2 GB per month. This number is highly variable, however, depending upon your course content. In particular, if you are using the SCORM-to-xAPI functionality or recording large quantities of xAPI data, your database growth may be much greater than this.
SCORM Cloud consumes approximately 22 GB of database storage per week. This number is obviously exceptional, but it underscores the importance of provisioning database storage with an eye on the expected use of the application.
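For capacity planning, a simple linear projection like the sketch below can help; the starting size and growth rate here are made-up examples, and the 1-2 GB/month figure above is only a starting point for a primarily-SCORM deployment.

    def projected_db_size_gb(current_gb, monthly_growth_gb, months):
        # Linear projection; real growth depends heavily on content and xAPI usage.
        return current_gb + monthly_growth_gb * months

    # Planning 24 months out at the high end of the 1-2 GB/month estimate:
    print(projected_db_size_gb(current_gb=10, monthly_growth_gb=2, months=24))  # 58 GB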
We hope this data proves useful to you as you plan your deployment. Please don’t hesitate to contact us at support@rusticisoftware.com with any questions you may have.