Expanding The Cloud – High Performance I/O Instances for Amazon EC2
AWS customers are bringing their most demanding workloads onto the cloud. These include high performance computation, for which we introduced the Cluster Compute and Cluster GPU instance types. Customers are also bringing workloads onto AWS that require dedicated, high performance I/O. To meet their needs we are now introducing a new Amazon EC2 instance type, the High I/O Quadruple Extra Large (hi1.4xlarge).
The hi1.4xlarge has 8 cores and 60.5 GB of memory. Most importantly, it has 2 SSDs of 1 TB each and a 10 Gb/s Ethernet NIC that, when you use placement groups, can be directly connected to other High I/O instances.
The SSDs will give you very high I/O performance: for 4k random reads you can get 120,000 IOPS when using paravirtual (PV) virtualization and 90,000 when using HVM or Windows. Write performance on SSDs is more variable and depends on, among other things, the free space on the disk, the fragmentation and the type of filesystem. With PV virtualization we are seeing between 10,000 and 85,000 IOPS for 4k random writes, and with HVM between 9,000 and 75,000.
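To put the read numbers in perspective, IOPS can be converted into approximate bandwidth by multiplying by the request size. The sketch below is only illustrative: it assumes "4k" means 4 KiB (4,096 bytes), and the function name and labels are made up for this example. The IOPS values are the ones quoted above.

```python
# Rough conversion from IOPS to bandwidth for fixed-size random reads.
# Assumption: "4k" means 4 KiB (4096 bytes) per request.
BLOCK_SIZE_BYTES = 4 * 1024


def throughput_mib_per_s(iops, block_size_bytes=BLOCK_SIZE_BYTES):
    """Return the approximate throughput in MiB/s for a given IOPS rate."""
    return iops * block_size_bytes / (1024 * 1024)


# 4k random read figures quoted in the post.
QUOTED_READ_IOPS = {
    "PV": 120_000,
    "HVM or Windows": 90_000,
}

if __name__ == "__main__":
    for label, iops in QUOTED_READ_IOPS.items():
        print(f"{label}: {iops} IOPS is about {throughput_mib_per_s(iops):.1f} MiB/s")
```

Under that assumption, the quoted random read rates correspond to several hundred MiB/s of small random reads.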
With 15K RPM magnetic disks you will see a bit over a hundred IOPS at best. While disk density is still increasing, access speeds are not. Magnetic disks still provide good sequential access, but random access is not improving at all. A 3 TB disk can be read in 8 hours sequentially, but it will take 31 days to read using random I/O. Magnetic disks are rapidly starting to exhibit tape-like properties, and with modern workloads being increasingly random they are becoming less and less suitable as a storage system. Even though SSDs are still more expensive from a storage point of view, they are a much more cost effective solution from an IOPS point of view.
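The gap between sequential and random reads on a magnetic disk is easy to reproduce with back-of-the-envelope arithmetic. The sketch below is not the calculation behind the figures above; the sequential throughput (100 MB/s), the random read size (8 KiB) and the random read rate (140 IOPS) are all assumptions picked to be plausible for such a disk, and the function names are made up for this example.

```python
# Back-of-the-envelope comparison of sequential vs. random full-disk reads.
# All three drive parameters below are assumptions, not figures from the post.
CAPACITY_BYTES = 3 * 10**12            # 3 TB disk, as in the post
SEQ_THROUGHPUT_BYTES_PER_S = 100 * 10**6  # assumed: 100 MB/s sequential
RANDOM_BLOCK_BYTES = 8 * 1024          # assumed: 8 KiB per random read
RANDOM_IOPS = 140                      # assumed: "a bit over a hundred"


def sequential_read_hours(capacity_bytes, throughput_bytes_per_s):
    """Hours needed to stream the whole disk at a steady throughput."""
    return capacity_bytes / throughput_bytes_per_s / 3600


def random_read_days(capacity_bytes, block_bytes, iops):
    """Days needed to read the whole disk one random block at a time."""
    reads_needed = capacity_bytes / block_bytes
    return reads_needed / iops / 86400


if __name__ == "__main__":
    hours = sequential_read_hours(CAPACITY_BYTES, SEQ_THROUGHPUT_BYTES_PER_S)
    days = random_read_days(CAPACITY_BYTES, RANDOM_BLOCK_BYTES, RANDOM_IOPS)
    print(f"sequential: about {hours:.1f} hours")
    print(f"random:     about {days:.1f} days")
```

With these assumed parameters the results land in the same ballpark as the hours-versus-weeks contrast described above; different block sizes or IOPS rates shift the numbers but not the conclusion.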
Databases are one particular area that can benefit tremendously from high performance I/O when scaling. The I/O requirements of database engines, regardless of whether they are relational or non-relational (NoSQL) DBMSs, can be very demanding. Increasingly randomized access and burst I/O through aggregation put strains on any I/O subsystem, physical or virtual, attached or remote. One area where we have seen this come to a head is in modern NoSQL DBMSs, which are often the core of scalable modern web applications and exhibit a great deal of random access. These systems often need high replication factors just to reach the aggregate random I/O they require. Early users of these High I/O instances have been able to reduce their replication factors significantly while achieving rock solid performance and substantially reducing their cost in the process. Read the detailed account of Netflix’s use of these instances for their Cassandra deployment.
Earlier this year I attended a panel on “Scaling to Infinity” with the top engineers from Netflix, Facebook, Tumblr, etc. In unison they proclaimed that in all of their systems the scaling bottleneck had been the database. These bottlenecks can often be attributed to constraints in the I/O system and the challenges of providing consistent I/O performance in systems that have not been designed for high performance I/O. The fast growing popularity of Amazon DynamoDB, which provides consistent read/write performance through an I/O provisioning interface, demonstrates that if the database can be configured such that it is no longer a bottleneck, applications can become much simpler, and thus more reliable and scalable.
It is my expectation that with the increase of data-centric applications we will see more and more I/O hungry systems being built that require the type of rock solid high performance I/O that the hi1.4xlarge can give you. For more details on the new instance type see the EC2 detail page and the AWS developer blog.
(As a side note: as others have observed, using a log-structured filesystem, such as NILFS, can significantly improve SSD write performance.)