How RAID Storage Improves Performance: Understanding Benefits and Trade-Offs of RAID Performance
RAID storage subsystems have delivered tangible benefits in business-class server systems for many years. The critical part is understanding the trade-offs to avoid unmet expectations. For performance, this means understanding how the workload in a server environment differs from the workload on a single-user PC.
Server Environment Workload Characteristics
In a computer that is functioning as a server, there are typically many users simultaneously accessing the services provided by that server. In some businesses, there can be hundreds or even thousands of simultaneous connections, all competing for server resources (compute as well as disk bandwidth). The service requests to the server typically arrive somewhat independently of one another, and are thus asynchronous.
RAID Performance Benefits
In this kind of workload environment, with many service requests arriving simultaneously, any single provider of bandwidth can become a bottleneck, causing requests to back up in a wait queue. Comparatively speaking, disk drive access is anywhere from 1,000 to 10,000 times slower than CPU (compute) and memory access. This makes the disk drives the weakest link in the performance chain.
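The effect of spreading that wait queue over more spindles can be sketched with a simple M/M/1 queueing approximation. The IOPS figures below are illustrative assumptions, not measurements of any particular hardware:

```python
# M/M/1 queueing sketch: mean time a request spends in the system
# (waiting in queue + being serviced). Illustrative IOPS numbers only.

def mm1_time_in_system(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue, in seconds."""
    assert arrival_rate < service_rate, "offered load exceeds capacity"
    return 1.0 / (service_rate - arrival_rate)

arrivals = 150.0   # requests/second from many independent users
disk_iops = 200.0  # what one hypothetical disk can service

one_disk = mm1_time_in_system(arrivals, disk_iops)        # 0.020 s
four_disks = mm1_time_in_system(arrivals / 4, disk_iops)  # ~0.006 s per disk
print(one_disk, four_disks)
```

Splitting the same request stream over four disks cuts the per-request time in the system by more than a factor of three in this toy model, even though each individual disk is no faster.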
Workload Request Mix is Important
Generally, if the ratio of read-to-write requests is low (i.e., many more writes than reads), then RAID configurations that implement some form of parity can impose performance penalties on the applications involved. This is due to the additional work the RAID storage solution must do for each write, as will be discussed in more detail below.
Performance Considerations by RAID Configuration
There are several types - or levels - of RAID storage implementation. The ones most often used are RAID-0, RAID-1, RAID-1+0, and RAID-5. Some storage implementations also provide a RAID-6 option. In general, performance gains may occur because the data is spread among multiple drives (or spindles), preventing any one drive from becoming a performance bottleneck. This is similar to opening up more cashier lanes in a grocery store.
RAID-0 is purely striping data over multiple disk spindles and can be very fast for multiple simultaneous I/O requests. However, because it provides no redundancy at all, it is rarely used except for scratch data that can easily be re-created.
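The striping itself is simple address arithmetic. This sketch assumes a left-to-right layout with a fixed stripe-unit size; real controllers may lay data out differently:

```python
# Sketch of RAID-0 block address mapping (assumed left-to-right
# striping with a fixed stripe-unit size in blocks).

def raid0_locate(lba, n_disks, stripe_unit):
    """Map a logical block address to (disk index, block offset on that disk)."""
    stripe_no = lba // stripe_unit          # which stripe unit overall
    disk = stripe_no % n_disks              # round-robin across disks
    offset = (stripe_no // n_disks) * stripe_unit + (lba % stripe_unit)
    return disk, offset

# Consecutive stripe-sized requests land on different disks,
# so independent requests can be serviced in parallel:
print(raid0_locate(0, 4, 64))    # (0, 0)
print(raid0_locate(64, 4, 64))   # (1, 0)
print(raid0_locate(256, 4, 64))  # (0, 64)
```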
RAID-1, with only two drives, is somewhat limited in its ability to improve performance. If the read-to-write ratio is very high, most systems will allow either drive to service a read request, providing two paths to the data. With only two spindles, however, there is little statistical spread for the I/O requests arriving at the disk system.
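One simple read-balancing policy is round-robin between the two mirrors. This is a hypothetical policy for illustration; real controllers may instead pick the mirror with the shorter queue or the closer head position:

```python
# Minimal sketch of RAID-1 read balancing using a round-robin policy
# (hypothetical; real controllers may use queue depth or head position).
import itertools

class Raid1Reader:
    def __init__(self, n_mirrors=2):
        # Cycle endlessly through the mirror indices.
        self._next = itertools.cycle(range(n_mirrors))

    def pick_disk(self):
        """Choose which mirror services the next read."""
        return next(self._next)

r = Raid1Reader()
print([r.pick_disk() for _ in range(4)])  # [0, 1, 0, 1]
```

Writes, by contrast, always go to both mirrors, which is why RAID-1 only helps read-heavy workloads.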
The RAID-1+0 configuration seeks to provide the best overall solution to the performance and reliability questions. This configuration provides nearly all of the performance benefits of RAID-0, but provides the protection of RAID-1. Even though every write must occur twice, by spreading that load over many spindles in the RAID-0 stripe, the statistical probability of waiting in a queue for any I/O can be much smaller.
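The write fan-out described above can be sketched as striping across mirrored pairs. The layout below is an assumption for illustration; real implementations vary:

```python
# Sketch of RAID-1+0 write fan-out: data is striped across mirrored
# pairs, and every write goes to both members of one pair.

def raid10_write_targets(lba, n_pairs, stripe_unit):
    """Return the physical disks that must receive this write."""
    stripe_no = lba // stripe_unit
    pair = stripe_no % n_pairs
    # Assumed numbering: pair k holds physical disks 2k and 2k+1.
    return (2 * pair, 2 * pair + 1)

# Every write hits exactly two disks, but successive stripe units hit
# different pairs, spreading the doubled write load across the array:
print(raid10_write_targets(0, 3, 64))    # (0, 1)
print(raid10_write_targets(64, 3, 64))   # (2, 3)
print(raid10_write_targets(128, 3, 64))  # (4, 5)
```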
In a RAID-5 (and RAID-6) configuration, read and write requests are statistically spread among all spindles as well. Writes carry an additional performance penalty: before a small write is complete, the controller must first read the old data and old parity blocks, calculate the new parity, and then write both the data and the parity block (or two parity blocks for RAID-6).
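This "read-modify-write" sequence works because XOR parity can be updated incrementally: new parity = old parity XOR old data XOR new data, so only the target disk and the parity disk need to be touched. A minimal sketch:

```python
# Sketch of the RAID-5 read-modify-write parity update for a
# single-block overwrite (4 I/Os: 2 reads, 2 writes).

def raid5_small_write(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Compute the new parity block without rereading the whole stripe."""
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

# Check against recomputing parity from scratch over a 3-data-block stripe:
d = [b"\x0f\x0f", b"\xf0\xf0", b"\x55\x55"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*d))       # XOR of all data blocks

new_d0 = b"\xff\x00"
new_parity = raid5_small_write(d[0], new_d0, parity)
assert new_parity == bytes(a ^ b ^ c for a, b, c in zip(new_d0, d[1], d[2]))
```

The incremental update is what makes RAID-5 practical on wide stripes, but it is also exactly the extra work that penalizes write-heavy workloads.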
Using Cache Memory to Hide Bottlenecks
Many RAID storage solutions use fast memory called cache to alleviate write bottlenecks by telling the application the I/O is complete as soon as it hits the cache. If power is lost before that write is flushed from cache, however, there is a risk of data loss or corruption on the actual disk drives. A good battery backup for the cache helps prevent this issue.
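The trade-off can be illustrated with a toy write-back cache: the caller gets an acknowledgment as soon as the write lands in cache, and the flush to disk happens later. This is purely illustrative; real controllers do this in hardware:

```python
# Toy write-back cache: acknowledges writes before they reach "disk".
# Illustrates the window where a power failure would lose data.

class WriteBackCache:
    def __init__(self):
        self.dirty = {}   # block -> data: acknowledged but not yet on disk
        self.disk = {}    # simulated persistent storage

    def write(self, block, data):
        self.dirty[block] = data
        return "ack"      # caller continues immediately

    def flush(self):
        self.disk.update(self.dirty)
        self.dirty.clear()

c = WriteBackCache()
c.write(7, b"payload")
# If power were lost at this point, block 7 would never reach disk.
assert 7 not in c.disk
c.flush()
assert c.disk[7] == b"payload"
```

A battery- or flash-backed cache closes that window by preserving the dirty blocks until they can be flushed after power returns.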
RAID Performance Relevance in Single-User PC Workloads
The real-world performance benefits possible in a single-user PC situation are not a given, because those benefits rely on multiple independent, simultaneous requests. One person running typical desktop applications may not see a big payback in performance, because most of those applications are not written to issue asynchronous I/O to the disks. Understanding this can help avoid disappointment.
Author: Jeff Sue