Performance and Scalability
In the A Word On Scalability posting I tried to write down a more precise definition of scalability than is commonly used. There were good comments on the definition at the posting itself, as well as in a discussion at The ServerSide.
To recap, in a less precise manner, I stated that:
- A service is said to be scalable if, when we increase the resources in the system, performance increases in proportion to the resources added (see the formalization after this list).
- An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance.
- A scalable service needs to be able to handle heterogeneity of resources.
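One way to state the first point a bit more formally (this formalization is my own, with performance measured as, say, sustainable throughput at a fixed latency SLA):

$$P(k \cdot R) \approx k \cdot P(R) \quad \text{for } k \ge 1$$

where $P(R)$ is the performance achieved with resource set $R$ and $k$ is the factor by which the resources are multiplied.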
There were quite a few comments about the use of performance in the definition. This is how I reason about performance in this context: I assume that each service has an SLA contract that defines what the expectations of your clients/customers are (SLA = Service Level Agreement). What exactly is in that SLA depends on the kind of service business you are in; quite a few of the services that contribute to an Amazon.com website have an SLA that is latency-driven. This latency will have a certain distribution, and you pick a number of points on that distribution as representatives for measuring your SLA. For example, at Amazon we also track the latency at the 99.9th percentile mark to make sure that nearly all of our customers are getting an experience at the SLA or better.
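As a concrete illustration of checking a latency percentile against an SLA, here is a minimal sketch; the lognormal latency samples and the 200 ms target are made-up assumptions for the example, not actual Amazon numbers:

```python
import random

def percentile(samples, pct):
    """Return the latency value at the given percentile (e.g. 99.9)."""
    ordered = sorted(samples)
    # Index of the first sample at or above the requested percentile.
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100.0))
    return ordered[idx]

# Hypothetical latency samples (ms) and a hypothetical 200 ms SLA target.
latencies_ms = [random.lognormvariate(3.0, 0.5) for _ in range(100_000)]
sla_target_ms = 200.0

p999 = percentile(latencies_ms, 99.9)
status = "within" if p999 <= sla_target_ms else "violates"
print(f"p99.9 latency: {p999:.1f} ms ({status} the {sla_target_ms:.0f} ms SLA)")
```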
This SLA needs to be maintained as you grow your business. Growing can mean increasing the number of requests, increasing the number of items you serve, increasing the amount of work you do for each request, etc. But no matter which axis you grow along, you will need to make sure you can always meet your SLA. Growth along some axes can be served by scaling up to faster CPUs and larger memories, but if you keep growing there is an end to what you can buy, and you will need to scale out. Given that scaling up is often not cost effective, you might as well start by working on scaling out, as you will have to go down that path eventually.
I have not seen many SLAs that are purely throughput-driven. It is often a combination of the amount of work that needs to be done, the distribution in which it will arrive, and when that work needs to be finished that will lead to a throughput-driven SLA. Latency does play a role here, as it is often a driver for what throughput is necessary to achieve the output distribution. If you have a request arrival distribution that is non-uniform, you can play various games with buffering and capping the throughput at lower than your peak load, as long as you are willing to accept longer latencies. Often it is the latency distribution that you try to achieve that drives your throughput requirements.
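The buffering trade-off is easy to see in a toy simulation; the arrival pattern and the 200 requests/second cap below are invented numbers, just to show how a backlog absorbs a peak at the cost of extra latency:

```python
# Minimal sketch of serving a non-uniform (bursty) arrival pattern with a
# service rate capped well below peak load. All numbers are illustrative.

arrivals_per_sec = [50, 300, 500, 300, 50, 20, 20, 20]  # hypothetical bursty load
capacity_per_sec = 200  # capped well below the 500 req/s peak

backlog = 0
for sec, arrived in enumerate(arrivals_per_sec):
    backlog += arrived
    served = min(backlog, capacity_per_sec)
    backlog -= served
    # Rough queueing delay for the last request admitted this second:
    # everything still buffered ahead of it must drain at the capped rate.
    delay = backlog / capacity_per_sec
    print(f"t={sec}s served={served:3d} buffered={backlog:4d} "
          f"extra latency ~{delay:.1f}s")
```

With these numbers the 500 req/s peak is absorbed by a system provisioned for only 200 req/s, but requests arriving during the burst see up to a few seconds of additional latency while the backlog drains.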
There were some other points made with respect to what should be part of a scalability definition, among others by Gideon Low in the ServerSide thread (I tried to link to his individual response but could not get it to work), who makes some good points:
- Operationally efficient – It takes fewer human resources to manage the system as the number of hardware resources scales up.
- Resilient – Increasing the number of resources will also increase the probability that one of those resources fails, but the impact of such a failure should be reduced as the number of resources grows (illustrated in the sketch below).
These two points, combined with a discussion about cost/capacity/efficiency, should be part of a definition of a scalable service. I’ll be thinking a bit about what the right wording should be and will post a proposal later.
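In the meantime, to make the resilience point concrete, here is a back-of-the-envelope sketch; the 0.1% daily machine failure probability is an assumed number for illustration only:

```python
# With more machines, *some* failure becomes near-certain, so the impact
# of any single failure must shrink. The 0.1%/day failure probability of
# one machine is a made-up number for illustration.

p_fail = 0.001  # assumed daily failure probability of a single machine

for n in (10, 100, 1000, 10000):
    p_any = 1 - (1 - p_fail) ** n  # chance that at least one machine fails
    impact = 100.0 / n             # capacity lost per failure, in percent
    print(f"{n:5d} machines: P(some failure)={p_any:6.1%}, "
          f"capacity lost per failure={impact:.2f}%")
```

At ten machines a failure is a rare event that takes out 10% of your capacity; at ten thousand machines a failure somewhere is a daily certainty, but a resilient design makes each one cost only a hundredth of a percent.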