Review for Paper: 6-RAID: High-Performance, Reliable Secondary Storage

Review 1

Enterprise applications require high-reliability storage that can recover from single-disk failures. Many systems have been proposed to store data for a combination of high throughput, low overhead from redundant storage, and high reliability. Several varieties of redundant arrays of inexpensive disks (RAID) are available, but it is difficult to choose among them because they have varying effects on many aspects of performance.

In “RAID: High-Performance, Reliable Secondary Storage,” Peter Chen and others present the trade-offs inherent in several RAID levels. They begin by discussing the motivation for RAID and its two key features: striping and redundancy. Data striping increases throughput by distributing logically nearby blocks of data across multiple disks, so that a read or write over these blocks can be executed by multiple disks working in parallel. Redundancy, whether via mirroring, error-correcting codes, or parity, allows a disk array to recover data after a single-disk failure.

The paper presents the difference between RAID levels from multiple perspectives, including the design perspective that motivates each level, and a theoretical perspective on the performance of each level (along with its error correction group size and operation type). The authors find the best write speed comes from a RAID 0 system with no redundancy, while block-interleaved distributed-parity arrays (RAID 5) have the fastest reads and writes of any redundant array type. RAID 5 is fast because every disk can service some read or write requests, and there is no single parity disk creating a bottleneck. Applications that must tolerate two-disk failures can use RAID 6 (P+Q redundancy), which uses two parity blocks instead of one for each error correction group, at slightly higher storage overhead than other parity systems such as RAID 5.

As noted in the paper, the authors' estimates of mean time to data loss (MTTDL) and risk of data loss over time are based on simplified theoretical models of the disk array, not on actual data from deployed systems. Moreover, the relative performance of reads and writes for the various RAID levels is estimated from simple closed-form expressions, such as 1/G, where G is the error-correction group size, for the relative small-read performance of RAID 3. These estimates are a central focus of the paper, so it would have been nice to have them backed up by empirical data.
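To give a flavor of those expressions, here is a hedged sketch (Python; the entries are commonly cited forms and should be checked against the paper's table rather than taken as its exact values) of throughput per dollar relative to RAID 0 as a function of the group size G:

def relative_throughput(G):
    """Rough per-dollar throughput relative to RAID 0 (assumed forms, not the paper's full table)."""
    return {
        "RAID 1 small write": 1 / 2,        # both copies must be written
        "RAID 3 small read":  1 / G,        # every access ties up the whole group
        "RAID 3 large read":  (G - 1) / G,  # one disk per group holds parity, not data
        "RAID 5 small write": 1 / 4,        # read-modify-write costs four disk I/Os
        "RAID 5 large write": (G - 1) / G,  # only the parity overhead remains
    }

for name, value in relative_throughput(G=10).items():
    print(f"{name}: {value:.2f}")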


Review 2

The authors survey the research done on RAID (originally redundant array of inexpensive disks, now commonly redundant array of independent disks). Exponential improvements in semiconductor technology demand matching improvements in secondary storage as well, and for this purpose, roughly 25 years ago, research in this field began to take shape.

To explain the different RAID organizations, the paper first outlines basic disk terminology and the read and write process, and then explains how data is transferred from a disk to a host. It notes that increases in recording density and rotational speed have raised disk transfer rates over the years, but that this progress has nearly come to a standstill due to hardware limitations.

The concepts of data striping and redundancy are in the spotlight in this paper. Not only do they provide the basis for parallelism that increases the performance of disk reads and writes, they also provide a way to recover from disk failures and data corruption. The basic RAID organizations from Level 0 to Level 6 are explained, with the following key points being made –

Level 0 – Offers best write performance. No recovery options.
Level 1 – Disk read time is better than Level 0. Has a redundant disk for every disk in the structure.
Level 2 – Provides a lower-cost recovery option than Level 1. Storage efficiency increases because it uses error-correcting codes to recover a failed disk, requiring fewer disks than Level 1.
Level 3 – Requires less space than Level 2 because it uses only one disk to store the parity information (see the parity sketch after this list).
Level 4 – Similar to Level 3, but it interleaves data in blocks rather than bits, which improves performance for small requests that can be serviced by a single disk.
Level 5 – Eliminates the parity disk bottleneck present in Level 4 by distributing the parity uniformly over all the disks.
Level 6 – Protects against up to two disk failures.
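To make the parity idea concrete, here is a small illustrative sketch (Python; not taken from the paper) of how the single-parity levels can rebuild a failed disk: parity is the XOR of the data blocks, so any one lost block is the XOR of the parity with the surviving blocks.

from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte strings column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"disk0---", b"disk1---", b"disk2---"]   # contents of three data disks
parity = xor_blocks(data)                        # contents of the parity disk

# Suppose disk 1 fails: its contents are the XOR of the survivors and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]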

All the levels are critiqued through a performance and cost comparison. Several metrics are chosen and graphs are plotted comparing the levels against these benchmarks. The authors then suggest various future research topics to extend work in this area.

As far as its strengths go, the paper gives readers the basic information on how a disk works, highlighting the various components involved in simple language. It also gives good reasoning for the choice of comparison metrics used in the performance analysis. The authors also admit that the RAID levels are confusing even for them and give the readers their honest opinions on the whole topic. However, the paper doesn’t go into depth on some concepts after introducing the terms (the data paths topic, for example, has more unfamiliar terms than necessary). Also, the metrics are somewhat debatable for all the levels. Overall, the paper provides enough insight into the world of RAID and how it can be advanced.



Review 3

What is the problem addressed?
This article gives a comprehensive overview of disk arrays and provides a framework in which to organize current and future work.

Why important?
Due to the sustained exponential improvements in semiconductor technology known as Moore's law, improvements in secondary storage have not kept pace, creating a performance gap. To relieve this bottleneck, much research has proposed different configurations of the secondary storage system to improve performance. At the same time, as data sizes grow, storage failures become a serious problem and reliability comes into play. This survey covers RAID (redundant array of independent/inexpensive disks), which tries to solve these two problems: performance and reliability.

1-2 main technical contributions? Describe.
It discusses two architectural techniques used in disk arrays: striping across multiple disks to improve performance and redundancy to improve reliability. Data striping distributes data over multiple disks, which allows disk I/Os to be performed in parallel in two ways: first, independent requests may be serviced by separate disks, which decreases queuing time; second, a single multiple-block request can be serviced by several disks acting in coordination.
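To make the striping idea concrete, here is a small illustrative sketch (Python; the round-robin mapping is an assumption for illustration, not taken from the paper):

def locate(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

num_disks = 4
request = range(8, 12)                       # a four-block logical read
print([locate(b, num_disks) for b in request])
# [(0, 2), (1, 2), (2, 2), (3, 2)] -> the request is spread over all four disks in parallel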

To improve the reliability of the disk array, on the other hand, we need to add redundancy. Redundancy schemes can be distinguished by two features: the granularity of data interleaving, and the method and pattern by which the redundant information is computed and distributed. Finer-grained interleaving yields higher transfer rates for individual requests, but every request involves all disks, which makes it hard to service multiple requests in parallel. For redundancy, error-correcting codes such as parity or Reed-Solomon codes can be used to recover data, and the redundant information can either be concentrated on a small number of disks or distributed throughout the array.

1-2 weaknesses or open questions? Describe and discuss
As the survey mentions, there are few published measurement results and experience reports, which play an important role in technology transfer and form the basis for developing new optimizations. There is still a large variety of disk array organizations to explore and study. Moreover, greater parallelism introduces further problems: physical size, connectivity, and storage control processing. I think there should be a large body of follow-on work after this paper.



Review 4

This is a fine paper.


OK


Review 5

This paper describes a technique called disk arrays, and RAID (Redundant Arrays of Inexpensive/Independent Disks). In short, there are two basic elements in disk arrays: data striping for improving performance and redundancy for improving reliability. Disk arrays organize multiple independent disks into a large logical disk and can stripe data across the disks so that parallel access yields better performance. Redundancy, in turn, was developed to deal with disk failures and ensure reliability. There are many ways to combine data striping and redundancy, and this paper uses RAID to explain 7 different schemes (levels 0-6) in detail. The paper also provides a discussion of opportunities for future research at the end.

The problem here is how to combine data striping and redundancy to achieve the best results. The environment varies, so the “best” scheme varies correspondingly. For example, if the environment emphasizes performance and capacity rather than reliability, a non-redundant disk array (RAID level 0) might provide the better result. Below are summaries of the main points for RAID levels 0-6:

RAID Level 0: supercomputing environments, performance and capacity are primary concerns
RAID Level 1: environments where availability and transaction rate are more important than storage efficiency
RAID Level 2: only one of multiple redundant disks is needed to recover
RAID Level 3: applications that require high bandwidth but no high I/O rate
RAID Level 4: read-modify-write procedure, parity disk is the bottleneck
RAID Level 5: resolve the parity bottleneck by distributing the parity uniformly across all disks
RAID Level 6: P+Q redundancy, protects against up to two disk failures

I think this paper has a very objective and comprehensive discussion comparing the different schemes on reliability, performance, and cost. There are many ways to measure each metric, such as I/Os per second, bytes per second, response time, and I/Os per second per dollar. The authors take a lot into consideration when comparing the schemes, which gives the reader a better understanding of the advantages and disadvantages of each.

An interesting observation: I think this paper is very good not only as a comprehensive summary of disk array technology, but also for the idea that experience reports can play an important role in technology transfer and even in developing new technologies or optimizations.



Review 6

This paper provides an overview of RAID, redundant arrays of inexpensive/independent disks. For decades, the speed gap between secondary storage systems on the one hand, and main memory and the CPU on the other, has been a major problem. Solving it requires a fast secondary storage system, which RAID aims to be. The basic idea behind RAID is to have an array of disks working in parallel instead of a single disk. Also, because an array of disks is less reliable than a single disk, the array should be made redundant to keep the storage system reliable enough.

The paper describes seven different RAID organizations, level 0 to level 6; different levels use different data striping and redundancy schemes. A performance comparison of the different RAID levels is then provided. Next, the paper discusses reliability and points out some factors that should be considered.

The structure of the paper was good and the material was written in a clear way. However, I think the proposed technique suffers from some weaknesses:
-Redundancy usually brings about two major issues: difficulty in consistency management and excessive space consumption.
-No discussion of the system's scalability is provided. Is RAID easy to scale when we need to increase the storage capacity?


Review 7

The purpose of this paper is to provide an overview of RAID, the motivation behind it, and some of the implementation details. Essentially, the problem comes down to the fact that disks are not improving at the same rate as other components within a computer. This is primarily because disks are mechanical, but even SSDs, with no moving parts, are slow and can be improved. RAID combines many cheap, failure-prone disks into one logical disk with better performance and reliability. After explaining the background motivation for RAID, the authors go into a discussion of the different levels, 0 through 6. They describe the different parity methods that are used, how many disk losses each scheme can tolerate, and how the parity is spread across the disks. After explaining the architectural differences, they describe the differences in performance between the levels. They then discuss some more complicated aspects - bit errors, system crashes and the resulting parity inconsistency issues, and expected time to disk failure (and potential data loss!).

The paper does a good job of providing an overview of RAID. It is a very comprehensive review of the issue - covering the motivation for RAID, the technology of the disks themselves, and the performance differences between the RAID levels. I especially appreciated the performance comparison between the levels. I've been using RAID for a while, but hadn't really understood the differences between levels 2-6.

Peter Chen wrote this paper. How could there be anything wrong with it?


Review 8

This paper introduces the popular topic of RAID, namely Redundant Arrays of Inexpensive Disks. The motivation is that interest in RAID grew greatly because of the rapid development of semiconductor technology and the resulting improvement in microprocessor performance. Since computers can process data faster, they need a larger quantity of data to work on in the same amount of time, so large, high-performance storage systems are expected. The authors therefore provide a detailed discussion of disk arrays.

A disk array organizes multiple disks to build a large logical disk with low cost and high performance. But because of the large number of components and their limited quality, a disk array has a high failure rate. Beyond the performance improvement, reliability and fault tolerance are therefore the most important parts of a RAID. This paper not only conveys the basic organization and architecture of RAID, but also compares seven basic disk array organizations to find the strengths and weaknesses of each solution.

A strength of the paper is that it takes great care with readers who have little knowledge of RAID. The background is detailed, even including terminology explanations. Since the paper is very long, having the terminology defined properly at the beginning helps readers capture the authors' ideas better.


Review 9

This paper goes over the basics of disk storage, the different RAID configurations, the benefits and tradeoffs of each, and the metrics that need to be considered when choosing a storage configuration for a server. It is a good high-level overview of what is currently available and of the different criteria that need to be considered about the storage device characteristics of a database.
The paper goes over seven basic RAID organizations:

Level 0 - nonredundant disk array: there is no redundancy, simply a performance benefit for large data.

Level 1 - mirrored organization where the data is simply duplicated to a secondary disk. Storage overhead is double, but performance is improved.

Level 2 - Memory-Style ECC, which uses Hamming codes for the redundant information; the number of extra disks required grows roughly with the log of the number of data disks (see the sketch after this list).

Level 3 - Bit-Interleaved Parity, where data is interleaved bit-wise and only a single parity disk is needed. But it comes with a performance cost, as only a single request can be processed at a time.

Level 4 - Block-Interleaved Parity is a block-interleaved version of Level 3, which allows small reads that fit within a block to be serviced by a single disk, giving better performance.

Level 5 - Block-Interleaved Distributed-Parity is similar to Level 4, but the parity is distributed across the disks to reduce the bottleneck that arises when all the parity sits on a single disk.

Level 6 - P+Q Redundancy uses Reed-Solomon codes that allow the array to handle two disk failures occurring at the same time, rather than being limited to only one.
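As a rough illustration of the log-scaling claim for Level 2 (assuming the standard single-error-correcting Hamming bound; this is a sketch, not the paper's exact accounting):

def check_disks_needed(data_disks):
    """Smallest c with 2**c >= data_disks + c + 1 (single-error-correcting Hamming bound)."""
    c = 1
    while 2 ** c < data_disks + c + 1:
        c += 1
    return c

for d in (4, 10, 25):
    print(d, "data disks ->", check_disks_needed(d), "check disks")
# 4 -> 3, 10 -> 4, 25 -> 5: the relative overhead shrinks as the array grows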

The three main metrics the paper focuses on when evaluating disk arrays are reliability, performance, and cost. We see that level 5 and 6 systems perform extremely well from a cost-performance perspective for reads and small writes and scale very well to large numbers of disks, although level 5 performs better for writes, since level 6 must update two parity blocks. With regard to reliability, level 6 is better than level 5 on most metrics, as level 5 has poorer protection against double disk failures and bit errors.

The performance and tradeoffs of RAID disks are quite interesting, and the paper does a great job presenting the material.



Review 10

This paper surveys RAID (Redundant Arrays of Inexpensive Disks), including the basics of disk arrays and promising future research directions.

RAID is proposed to exceed the performance of a single disk while providing redundancy to improve reliability. From this paper, several aspects of RAID become clear:
There are several ways to provide redundancy, categorized as RAID levels 0 to 6. Among them, level 0 provides no redundancy at all, sacrificing reliability for the desired performance. Levels 1-5 use different mirroring or parity strategies to provide redundancy, and level 6 implements P+Q redundancy, which is stricter and can protect against up to two disk failures.
Choosing among these redundancy schemes is no easy task. Three primary metrics, performance, reliability, and cost, all have to be fully considered, and the evaluation standard needs to be carefully determined. On the cost and performance side, a performance/cost metric is often a reasonable choice.
Theoretically, the basic reliability is analyzed under the assumption of independent disk failures. However, this is idealistic, and other factors have to be considered, such as correlated disk failures. System crashes can be worse than disk failures, since they happen more often and cause inconsistent parity. To avoid losing parity without degrading performance, nonvolatile RAM is needed. In addition, uncorrectable bit errors are no small issue: the error rate is low, but when reading large volumes of data it becomes likely that an uncorrectable bit error occurs, so this is a significant factor in the design of large, highly reliable disk arrays.
Regarding implementation, there is more to consider. To avoid stale data, metadata tracking the validity of each sector of data and parity must be maintained in the disk array. After a system crash, inconsistent parity needs to be regenerated; to do this efficiently, a buffer pool can be used to track a fixed number of potentially inconsistent parity sectors. For workloads with a large number of transactions, incorporating a group commit mechanism is a good way to improve throughput. Last but not least, considering the connection to the host, orthogonal RAID can be implemented so that, under a string failure, all disk arrays remain accessible.
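A minimal sketch (Python, hypothetical names, not the paper's design) of the idea of tracking potentially inconsistent parity in a fixed-size nonvolatile pool, so that only the logged stripes need parity regeneration after a crash:

class ParityLog:
    def __init__(self, capacity=64):
        self.capacity = capacity   # fixed-size pool of entries (assumed to live in NVRAM)
        self.pending = set()       # stripe ids with in-flight writes

    def begin_write(self, stripe_id):
        # Record the stripe as "parity possibly inconsistent" before touching
        # data or parity blocks; block or retry if the pool is full.
        if len(self.pending) >= self.capacity:
            raise RuntimeError("parity log full; wait for completions")
        self.pending.add(stripe_id)

    def end_write(self, stripe_id):
        # Data and parity are both on disk; the stripe is consistent again.
        self.pending.discard(stripe_id)

    def recover(self):
        # After a crash, only the logged stripes need their parity recomputed
        # from the surviving data blocks.
        return sorted(self.pending)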

Beyond the technical part, the paper also mentions directions for future research. Experience reports will play an important role in technology transfer. Interaction among new organizations may become an essential driving force in storage system development. Parallel computers are a trend, and the networking of storage systems is still a big issue. In addition, latency is also a hot direction, and data prefetching may be a good approach.


Review 11

Problem:
The increased performance of microprocessors has enabled applications to process data faster and in higher volumes. However, hard disk performance has not improved as much as performance of microprocessors. This means that the hard disk is often the bottleneck preventing systems with powerful microprocessors from reaching their full potential.

Contributions of this paper:
This paper outlines the main ideas behind RAID (Redundant Arrays of Inexpensive Disks). RAID involves connecting hard disks together to create a single logical disk. Data striping (distributing sequential data across multiple disks) can improve throughput, as small requests can be serviced simultaneously by different disks and large requests can be served by several disks at a time. Redundancy, via mirroring or parity, is used to prevent data loss in the event of disk failures. There is a trade-off between performance and availability: more redundancy makes the data more tolerant of faults but increases the amount of work needed for write operations (because the redundant information must be updated on writes).
The paper also spends time considering three different failure modes (disk failures, system crashes, and uncorrectable bit errors) and estimating the likelihood that these failures could cause data loss on different versions of RAID. The authors show that even though the time-to-failure estimates can seem astronomically large (millions of years) if based only on time-to-failure for individual disks and if enough duplication is used, the reliability of these systems is actually much worse when considering the possible effects of system crashes (which are much more frequent than disk failures) and uncorrectable bit errors on data loss.
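For context, a rough sketch (Python) of the kind of independent-failure estimate the authors start from; the formula is the standard two-failure approximation for a single-parity array, and the numbers below are illustrative assumptions rather than the paper's:

def mttdl_single_parity(n_disks, group_size, mttf_disk_h, mttr_h):
    """Approximate mean time to data loss (hours) for a single-parity array:
    data is lost if a second disk in the same parity group fails before the
    first failed disk has been repaired."""
    return mttf_disk_h ** 2 / (n_disks * (group_size - 1) * mttr_h)

hours_per_year = 24 * 365
mttdl_h = mttdl_single_parity(n_disks=100, group_size=10,
                              mttf_disk_h=200_000, mttr_h=1)
print(f"{mttdl_h / hours_per_year:,.0f} years")  # thousands of years, versus ~23 years for one disk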

Weaknesses:
One weakness of this paper is its generality, especially in the reliability estimates. As the paper itself mentions, the reliability of a system depends heavily on specific implementation details, and not only on its high-level organization, which is what the paper considers. This weakens the power of the paper’s reliability estimates. However, the paper is successful at showing that disk failure rate should not be the only factor considered in evaluating reliability, and that more work needs to be done to address other failure modes.



Review 12

The paper provides a comprehensive overview of disk arrays, especially the disk array structures called RAID (Redundant Arrays of Inexpensive Disks). It first explains the context of disk arrays and the factors that made them popular. The paper then describes seven disk array architectures ranging from RAID-0 to RAID-6 and compares their cost, reliability, and performance. In addition, the paper discusses implementation considerations and advanced topics on disk arrays. From a database researcher’s point of view, it is necessary to know how techniques such as RAID are applied and used in practice, as databases are bound to rely on these hardware implementations. Databases cannot guarantee consistency and durability at all if disks are not reliable enough to support them. It is also clear that the performance of a database is upper-bounded by the performance of its underlying hardware, and disk arrays improve both disk bandwidth and throughput while enhancing reliability with added redundancy.

There were two major factors that increased interest in RAID. First, faster microprocessors and larger memory systems created the need for larger and higher-performance storage systems. The performance improvement of the related storage technologies (e.g., magnetic media and mechanical systems) had been much slower than that of microprocessors, which in turn accelerated research on disk arrays to close the performance gap. Second, faster microprocessors also enabled new applications, such as video and multimedia at the time, which only increased the need for faster and more reliable secondary storage.

Disk arrays organize multiple independent disks into a large, high-performance logical disk. There are seven configurations, RAID-0 to RAID-6, for organizing disks in RAID. Using multiple disks to distribute read/write workloads gives a performance boost, but it simultaneously makes the logical disk more vulnerable to disk or system failures, which reduces reliability. To overcome this, some RAID configurations introduce data redundancy, devoting a certain proportion of the disks to parity data for recovery in case of failure. This is necessary to ensure the reliability of the storage, but comes with a sacrifice in performance and a higher cost for acquiring more disks. The paper discusses the different RAID configurations in detail; essentially, they make different trade-offs between cost, performance, and reliability.

In conclusion, it is interesting to see how data redundancy is implemented at the hardware level with disk arrays, and how the RAID configurations resemble the logical data layouts used in distributed file systems such as GFS/HDFS. It is important for database administrators and even researchers to be aware of hardware technologies like RAID, because without such awareness one cannot come up with methodologies that maximize the performance of databases. It is too bad that the paper is too old to address many of the emerging technologies in the secondary storage sector (i.e., SSD, NVRAM), but it is definitely worth reading, as the disk array is a concept that is widely employed even today.



Review 13

This paper is a survey of disk arrays and the techniques currently used to improve their performance and reliability. It also summarizes some possible areas of research for improving disk arrays. The paper begins by introducing the reader to some disk array terminology and then covers data striping. Data striping distributes data across multiple disks while giving the illusion that all of the data is present on a single disk; using it, parallelism can be achieved when performing I/O operations, providing improved performance. The paper also discusses data redundancy as a means of improving reliability and introduces the 7 RAID levels from 0 to 6. Each level, except level 0 which is used as a baseline, is initially analyzed using the performance measure of throughput per dollar. The RAID levels are also analyzed for their reliability in the face of failure sources other than disk crashes, such as system crashes or uncorrectable bit errors. Finally, the author lists a few areas that are either not well understood or still have open topics for research.


The paper makes an interesting observation about the trade-off between performance and reliability, which seems to persist in many areas of computer systems. It observes that although more disks bring performance improvements, they lower the reliability of the system unless redundancy measures are implemented. However, sometimes it is acceptable to give up reliability in order to improve performance, hence the use of RAID 0 in supercomputing environments.

I find it curious that the author simplified the paper so non-technical readers could understand it. He included an introduction section for technical terminology as well as labels for the RAID levels giving an English description of the redundancy method each employs. The simplifications seem to contrast with the overall purpose of the paper, which is a survey of disk arrays and the techniques currently used. The final section, especially, conflicts with the initial simplifications: adding a section that proposes possible research topics indicates to me an expectation that the readers are mostly fellow colleagues in the field looking for research ideas, not random people who may not understand the terminology related to disk arrays. If this was the expectation, then the paper could have done without the dumbing down and would have been a much stronger and more useful summary of disk arrays for new researchers in the field.


Review 14

Disk arrays, proposed in the 1980s as a way to use parallelism, are multiple independent disks which are organized into a large, high-performance logical disk. Performance and reliability are the driving forces that have popularized disk arrays. There are two benefits of disk arrays:
1. Stripe data across multiple disks and access them in parallel to improve performance.
2. Employ redundancy to provide reliability.

This paper introduces RAID (Redundant Arrays of Inexpensive Disks) in detail. In addition to discussing levels 1 to 5, the authors also present levels 0 and 6. A summary of the seven levels:

Level 0 (Non-Redundant):
- Best write performance but not the best read performance.
- Frequently used in scenarios where reliability is not a major concern.

Level 1 (Mirroring):
- Uses twice as many disks, as redundant storage, so that there are always two copies of any data.
- Frequently used in database applications.

Level 2 (Memory-Style ECC):
- Uses Hamming codes to reduce the amount of redundant data.
- The number of redundant disks is proportional to the log of the total number of disks.
- Multiple redundant disks are needed to identify the failed disk, but only one is needed to recover the lost information.

Level 3 (Bit-Interleaved Parity):
- Uses a single parity disk rather than a set of parity disks to recover lost information, but the parity disk cannot participate in reads as a result.
- Data is conceptually interleaved bitwise over the data disks so only one request can be serviced at a time.
- Frequently used in applications that require high bandwidth but not high I/O rates.

Level 4 (Block-Interleaved Parity):
- Data is interleaved across disks in blocks of arbitrary size, called the striping unit, rather than in bits.
- Small write requests require four disk I/Os (read-modify-write; see the parity-update sketch after this list).
- The parity disk can easily become a bottleneck.

Level 5 (Block-Interleaved Distributed-Parity):
- Distribute the parity uniformly over all the disks.
- Best small read, large read, and large write performance.

Level 6 (P + Q Redundancy):
- Uses Reed-Solomon codes to protect against up to two disk failures using the bare minimum of two redundant disks.
- Also performs small write operations using a read-modify-write procedure, but requires six disk accesses due to the need to update both the “P” and “Q” information.
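A minimal sketch (Python, illustrative rather than the paper's code) of the small-write read-modify-write parity update behind the four-I/O and six-I/O counts above: the new parity is the old parity XOR the old data XOR the new data.

def xor_blocks(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(old_data, old_parity, new_data):
    """Return (new_data, new_parity) for a single-block update:
    read old data, read old parity, write new data, write new parity."""
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    return new_data, new_parity

# Example: four data blocks plus parity; update block 2 and check consistency.
blocks = [bytes([i] * 8) for i in (1, 2, 3, 4)]
parity = blocks[0]
for blk in blocks[1:]:
    parity = xor_blocks(parity, blk)

blocks[2], parity = small_write(blocks[2], parity, bytes([9] * 8))

check = blocks[0]
for blk in blocks[1:]:
    check = xor_blocks(check, blk)
assert check == parity  # parity still equals the XOR of all data blocks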


The major contribution of this paper is that it goes over the seven RAID architectures and discusses the design of each level. It also compares their reliability, cost, and performance differences. Reliability is further examined in three situations: system crashes, uncorrectable bit errors, and correlated disk failures. The RAID level 6 architecture (P+Q redundant disk arrays) is shown to be very reliable except against system crashes.

In the evaluation, read and write requests are categorized as "small" and "large". It would be better if the definitions of "small" and "large" reads and writes were more specific.


Review 15

This paper gives a survey of disk arrays, which are an important way of improving I/O performance by enabling parallelism across multiple disks. The article covers disk technology and its performance and reliability, striping across multiple disks to improve performance and redundancy to improve reliability, the seven disk array architectures (RAID), and six disk array prototypes.

Disk recording density has increased greatly because of smaller distances between the magnetic read/write head and the disk surface, more accurate positioning electronics, and more advanced magnetic media. Through data striping, multiple disks handle data evenly and appear as a single fast, large disk. With striping, multiple I/Os can be serviced in parallel on separate disks, shortening queuing time, and multiple disks acting in coordination can service single multiple-block requests. More disks means larger potential performance benefits, but also a decrease in the overall reliability of the disk array. Redundancy organizations are distinguished by the granularity of data interleaving:
1. Fine-grained arrays interleave data in small units, so that I/O requests of all sizes access all of the disks in the array, resulting in high data transfer rates. However, only one logical I/O request can be serviced at a time, and every disk must reposition for every request, wasting time.
2. Coarse-grained disk arrays interleave data in larger units, allowing large requests to access all disks and small requests to access only a few disks. Multiple small requests can thus be serviced in parallel, while large requests still see higher transfer rates from multiple disks.

The basic RAID organizations are:
Level 0: Nonredundant – lowest cost because no redundancy. Best write performance, but not best read performance. Mostly used in supercomputer environments, where performance and capacity are valued over reliability
Level 1: Mirrored – uses twice as many disks as a nonredundant disk array. Reads are fast, writes are slower, and a failed disk is backed up by its copy. Used when availability and transaction rate are valued over storage efficiency.
Level 2: Memory-Style ECC – lower cost than mirroring; uses Hamming codes, which contain parity for distinct overlapping subsets of components. The number of redundant disks is proportional to the log of the total number of disks in the system, so efficiency increases as the number of disks increases.
Level 3: Bit-Interleaved Parity – uses a single parity disk to recover lost information. Only one request is serviced at a time, because each read request accesses all data disks and each write request accesses the parity disk in addition to all data disks. The parity disk cannot participate in reads, so read performance is slightly lower. Used in applications that require high bandwidth but not high I/O rates.
Level 4: Block-Interleaved Parity – data is interleaved in blocks of arbitrary size rather than in bits. The parity disk is a bottleneck because it must be updated on all write operations.
Level 5: Block-Interleaved Distributed-Parity – distributes the parity uniformly over all disks. Disk conflicts while servicing large requests are reduced because the layout accesses every disk once before accessing any disk twice. A disadvantage is that small write requests are inefficient because of the read-modify-write operation.
Level 6: P + Q Redundancy – protects against up to two disk failures using the bare minimum of two redundant disks, via Reed-Solomon codes. Small writes again use a read-modify-write procedure, but require six disk accesses because both the P and Q information must be updated.

Overall, RAID Level 5 has the best performance. I enjoyed this paper because it was very clear on the motivation for disk arrays and the different types of RAID organizations.



Review 16

This paper talks about disk arrays, which are ways to use multiple disks to improve performance and reliability. The article describes the different levels of redundant arrays of independent disks (RAID) as well as advanced research topics related to disk array performance and algorithms for consistency. The article addresses the important problem of summarizing this research and describing the direction in which it is heading. Survey papers are important because they can effectively introduce people to a field of research and its current problems.

This paper describes the motivation for research on disk arrays, which includes the fact that disk access times are not improving as quickly as microprocessor speeds. Faster processors mean people can tackle bigger and newer sets of problems that will require more advanced storage systems. Redundancy and performance involve a trade-off in the design of disk arrays, and the paper discusses the advantages and disadvantages in terms of performance, reliability, and cost.

This paper is strong in its coverage of past material, serving as an effective survey paper with good explanations of RAID levels, and a comparison of the levels. The tables that describe the reliability and performance of the various options for secondary storage are insightful. Overall this is a well written and effective survey paper. There isn't much to criticize. It describes the past and briefly the future research of secondary storage. I would have liked to hear more about the future research. Section 6 could have been longer, but this seems like a pretty subjective drawback.


Review 17

Part 1: Overview

Disk arrays are important for improving the performance and reliability of a database. This paper introduces the level 0-6 disk array architectures, called Redundant Arrays of Inexpensive Disks (RAID). The performance measures, advantages, and disadvantages of these architectures are discussed, and some future opportunities are also mentioned.

Thanks to the success of the semiconductor industry, disks are becoming cheap and high-capacity, and the transfer rate and capacity of a single disk keep snowballing. Disk arrays build on two key concepts: data striping for improved performance and redundancy for improved reliability.

To analyze the seven classified RAID architectures, the authors define three metrics: reliability, performance, and cost. Cost is proportional to the total number of disks in an array, and mean time between failures is used to help analyze reliability.

Part 2: Contributions

This paper divides RAID into seven levels through zero to six: Nonredundant, Mirrored, Memory-Style ECC, Bit-Interleaved Parity, Block-Interleaved Parity, Block-Interleaved Distributed-Parity, and P+Q Redundancy.

Reliability is thoroughly discussed, with all the main types of failure included. System crashes, uncorrectable bit errors, and correlated disk failures are taken into account, and the paper finally shows that P+Q redundant disk arrays can effectively protect data against double disk failures as well as uncorrectable bit errors.

Implementation concerns are also introduced alongside the performance comparison. Disk arrays need extra metadata to record failure information. Also, because multiple disks are connected to the host computer through strings or buses, some disks may become inaccessible when a string fails.

Part 3: Possible ways to improve

More details could be provided in the correlated disk failures section. There is no visualization to help readers understand the increase in failure rate caused by correlation between disk failures.

A fixed equivalent file capacity is assumed when comparing the seven architectures. However, one may ask what happens if file capacity is not held fixed and files come in different sizes. Would the comparison results still hold in those situations?

In the reliability analysis, the data sets used for simulation-based evaluation and the data sources for the comparison are not explicitly stated. For example, for the RAID level 5 failure-characteristics table, one may want to know the experimental settings.



Review 18

The paper provides an overview of Redundant Arrays of Inexpensive Disks (RAID). RAID is a family of ways to organize multiple disks in order to improve the performance and reliability of a storage system. The paper motivates RAID as a solution for keeping up with the large improvement in microprocessor speed and for meeting the need of image-intensive applications for higher-performance secondary storage. The paper describes seven levels of RAID, which employ different data-striping and redundancy schemes. Data striping distributes data over multiple disks, allowing multiple I/Os to be serviced in parallel, and redundancy allows the array to tolerate the disk failures that become more likely as the number of disks increases. The levels differ in how these schemes are implemented. For example, RAID level 0 is used purely to improve performance by striping data without any redundancy, whereas RAID level 1 uses twice as many disks, fully duplicating data across disks to improve reliability. RAID level 2 uses an error-correcting code such as a Hamming code to correct a single-bit error on the fly, whereas the other RAID levels employ some kind of parity, either bit-interleaved or block-interleaved, stored either on a single parity disk or distributed among the data disks. RAID level 3 has a bit-interleaved single parity disk; RAID level 4 has a block-interleaved single parity disk; and RAID level 5 has block-interleaved parity distributed across the disks. RAID level 6 is similar to RAID level 5 except that it uses two disks instead of one for parity, using Reed-Solomon codes to protect against up to two disk failures.

The paper highlights a comparison of the various RAID levels in terms of performance, cost, and reliability. For example, assuming equivalent file capacity, RAID 1 can sustain half the number of small writes per second that a RAID level-0 system can sustain, which is expected since RAID level 0 provides the highest write performance with no redundancy. The reliability of disk arrays can be significantly reduced by factors including system crashes and unrecoverable bit errors. The paper discusses, for example, that RAID level 6 is more reliable than the others across various failure characteristics, including double disk failures, for the same file capacity.

The strength of the paper lies in its comprehensive coverage of RAID, including comparisons along important metrics such as performance, reliability, and cost. It also clearly presents a strong motivation for RAID by pointing to the widening gap between microprocessor speed and disk access speed.

The paper was written when the semiconductor industry was still young, and it is striking that most of the motivation behind RAID at that time doesn't hold now. It is important to recognize that we are at the end of Moore's law rather than the beginning, and as a result microprocessor speeds are stagnating rather than increasing exponentially, so the main motivation of keeping up with ever-faster microprocessors no longer holds. While it is still important to increase disk read/write performance, the improvement this brings to the overall execution time of an application may be modest. It is worth investigating other design approaches such as near-data processing, in which computation is taken to where the data is rather than moving data to the processor faster; this essentially involves embedding microprocessors in the data storage system. Also, since the paper was written at a time when semiconductor memory was expensive and small, it would be interesting to evaluate the importance of RAID in the face of cheaper and larger flash-based memory, which enables in-memory databases. The striping and redundancy schemes applicable to RAID are quite different from the schemes required for semiconductor-based memory.


Review 19

This paper provides a comprehensive overview of disk arrays and gives a framework for current and future work. First, the disk technology that determines performance and reliability is introduced. Second, RAID architectures that improve throughput are introduced, along with algorithms to ensure consistency.

It starts with basic disk terminology and disk components as background, along with disk properties such as recording density and transfer rate, and then dives into the discussion of RAID. RAID introduces data redundancy to improve availability and failure tolerance. The paper contains a detailed evaluation in terms of cost versus performance, reliability, and the ability of block-interleaved redundant disk arrays to correct errors and handle disk failures. RAID significantly reduces the probability of data loss.

As we may have noticed, RAID technology has been widely used commercially since the late 1980s. It is truly a compelling technology for protecting data and improving reliability. On the other hand, as larger and faster storage technologies like solid-state disks arise, RAID will have to be adapted to those new media. The key ideas remain the same, though, which is why this paper is so remarkable.


Review 20

This paper examines all the levels of RAID technology and compares them using performance, reliability, and cost as criteria.
The levels discussed are: Non-Redundant, Mirrored, Memory-Style ECC, Bit-Interleaved Parity, Block-Interleaved Parity, Block-Interleaved Distributed Parity, and P+Q Redundancy.
For performance and cost, the paper concludes that equivalent file capacity and throughput per dollar form a good basis for comparing these technologies. Different applications may perform best at different levels depending on read/write sizes.
For reliability, the MTTF of RAID is discussed first. MTTF can increase dramatically, from about 2,000 hours for a nonredundant array to 38 million years with the P+Q redundancy level, when only independent single-disk failures are considered. But this number decreases when other factors are taken into account, such as correlation between disk failures, system crashes, and uncorrectable bit errors.
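A hedged sketch (Python; generic reliability arithmetic rather than the paper's exact model, with made-up numbers) of why the headline figure shrinks once other failure modes are counted: for roughly independent failure modes, rates add, so the combined MTTDL is dominated by the weakest mode.

def combined_mttdl(mttdls):
    """Combine independent failure modes by summing their rates."""
    return 1.0 / sum(1.0 / m for m in mttdls)

double_disk_failure  = 38_000_000   # years (the headline figure)
crash_then_failure   = 150          # years (hypothetical value)
failure_then_bit_err = 500          # years (hypothetical value)

print(combined_mttdl([double_disk_failure, crash_then_failure,
                      failure_then_bit_err]))   # ~115 years, set by the weakest mode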

Contributions:
It provides a good discussion of the reliability problems RAID faces beyond single disk failures. By presenting three reliability problems, namely 1) double disk failures, 2) a system crash followed by a disk failure, and 3) a disk failure followed by an uncorrectable bit error during reconstruction, the paper shows that we cannot use single-disk MTTF alone to measure the reliability of a RAID and that RAID reliability is not perfect. Instead we need to solve those problems to improve reliability.
Another contribution is that it gives a clear picture of what to research next. Parallel computing and large servers have indeed become the new trend, and solving problems in RAID helped those visions become reality.

Weakness:
In the implementation part, the paper only discusses two controller/disk structures. Maybe there could be more controllers; combined with DMA, more controllers might change the performance of each RAID level discussed in the paper, but the paper doesn't mention this.



Review 21

This paper puts forth the idea of RAID or Redundant Array of inexpensive disks. RAID was introduced in order to improve performance using striping across multiple disks and redundancy to improve availability.

The paper introduces a total of seven RAID organizations, varying in the number of redundant disks and the redundancy mechanisms used, and compares them with respect to cost, availability, and the ability to recover from failures.

One of the major issues with RAID is its ability to recover from a system crash. Even though RAID seems highly reliable (the mean time to failure for RAID 6 with P+Q redundancy is quoted as 38 million years), a system crash during a write can leave stripes with incorrect parity. The authors do suggest a solution in which information sufficient to restore parity is written to nonvolatile storage until the write is complete. The paper gives a lot of theoretical estimates with regard to reliability, and I don't think that lets you understand how a real-world implementation handles the different sources of unreliability. Graphs or numbers depicting those would have made the paper more understandable in terms of implementation requirements.

Uncorrectable bit errors are described as errors that arise because data was written incorrectly or because the magnetic media on which it is stored degrades over time. Even though the paper explains that the effect of these errors depends on how they are interpreted, a few experiments injecting bit errors might have made this easier to visualize.

I really like the idea of orthogonal RAID, where error-correction groups are organized orthogonally to strings so that even if a string fails, all data remains accessible. This paper introduced something new and path-breaking for its time with the idea of highly available storage systems, and I believe it gave way to far more complex storage systems, as the authors intended. It was interesting to find out that many operating systems even today give the user the option to set up RAID [1].


[1] https://en.wikipedia.org/wiki/RAID



Review 22

This paper provides an overall description of disk array technology as high-performance, reliable secondary storage. It identifies the fundamental reason for the creation of RAID as the increasing performance gap between the processor and the magnetic disk. Striping data across the disk array and using redundant disks are the solutions for assuring high performance and reliability.

In total, the basic RAID organizations are categorized from level 0 to level 6: nonredundant (0), mirrored (1), memory-style ECC (2), bit-interleaved parity (3), block-interleaved parity (4), block-interleaved distributed parity (5), and P+Q redundancy (6). Among these designs, only the mirrored design uses double the number of disks to keep replicas, which provides better small-transaction rates and availability. Levels 2 through 5 all use parity for integrity checking and failure recovery, to different degrees. The level 6 design mainly considers recovery from two overlapping disk failures and uses Reed-Solomon codes to ensure it.

Because of the different redundancy methods used to achieve reliability, RAID levels 0-6 show a wide range of tradeoffs among reliability, performance, and cost. It turns out that, along with better reliability, the block-interleaved distributed-parity and P+Q redundancy designs achieve high throughput, though they can still be limited by the parity group size.

However, in real-world implementations, system crashes, uncorrectable bit errors, and correlated disk failures can still bring down the mean time to data loss, even for level 5 and 6 RAID systems.

One of the major predictions of this paper has become today's trend of embedding disk arrays into distributed parallel computers and using their interconnection networks to manage data distribution and redundancy maintenance. Industry giants like Google and Amazon are building such systems to provide universal infrastructure for their services, and they even sell it as a fundamental resource to many startups.





Review 23

As semiconductor technology improves rapidly, producing faster microprocessors and larger memories, larger and higher-performance storage is required, and research on Redundant Arrays of Inexpensive Disks has become popular. Disk arrays stripe data across multiple disks to provide higher performance and need redundancy to tolerate disk failures, so different disk array organizations have been developed. In this paper the authors describe seven basic disk array organizations and analyze their performance and advantages/disadvantages.

First, the authors introduce data striping, which improves performance because multiple independent requests can be serviced at the same time and a single request can be serviced by multiple disks. But this raises the risk of disk failure, so redundancy is necessary.
(1) Nonredundant: no redundant data; worst reliability but best write performance; widely used in supercomputing environments.
(2) Mirrored: keeps two copies of all data; good availability and transaction rate.
(3) Memory-Style ECC: uses Hamming codes to provide recovery at lower cost than mirroring.
(4) Bit-Interleaved Parity: data is interleaved bit-wise, and a single parity disk is added to tolerate any single disk failure.
(5) Block-Interleaved Parity: similar to bit-interleaved parity, but the interleaving unit is a block.
(6) Block-Interleaved Distributed-Parity: distributes the parity uniformly over all the disks.
(7) P+Q Redundancy: protects against up to two disk failures.
Then the authors discuss how to compare RAID organizations based on performance, cost, and reliability. They also introduce some implementation issues, such as using metastate information to deal with system crashes and disk failures.

The authors concentrate on comparing the different RAID organizations and the trade-offs among cost, performance, and reliability. Since each of these strategies has its own advantages, I would have liked the authors to give some concrete examples of which companies or organizations use each RAID level.



Review 24

The authors present an overview of the state of using disk arrays for storing data. The background details the contemporary developments in computation and storage technology. The authors describe RAID, a system that uses a disk array to store data and has several modes of operation. These modes offer varying degrees of redundancy using different mechanisms, ranging from no redundancy, to block-level replication, to Hamming codes, to Reed-Solomon codes, and they make different tradeoffs with regard to performance and storage overhead.

The authors discuss the performance and failure characteristics of the different RAID organizations at equal size. They then discuss failures due to both system crashes and bit errors before moving on to reliability in the face of correlated failures. The authors assert that the existence of correlated failures substantially decreases the mean time to failure; system administrators should be aware of this when deciding which RAID organization to use. The authors conclude with a discussion of future research, focusing on the interaction of RAID with other technologies layered on top of it, improving parallelism through the use of many small disks, and decreasing disk latency.

This paper presents an overview of the contemporary state of technologies for storing data on disk arrays, with a focus on RAID. The authors present relevant background material and detail the various RAID organizations while examining their performance and reliability characteristics.

This paper does not discuss the decisions behind including the various RAID organizations that exist; in fact, it acknowledges the overlap and confusion among the various modes. While it does discuss performance differences among the organizations, it doesn't justify the existence of all of the modes (one could create new modes with different characteristics). Use cases should exist for all of the organizations to prevent the creation of solutions to non-existent problems and the complexity that comes with them.


Review 25

This paper introduces a technology called redundant array of inexpensive disks, or RAID, which uses multiple disks and data storage virtualization to improve performance and reliability. With processors improving at a faster rate than data storage devices, the bottleneck in computational systems is the physical storage disks; the industry therefore needed a solution to keep up with ever-increasing processing speeds. RAID helps solve this problem and also allows for redundancy to decrease the chance of losing data during any sort of failure.

In comparing the performance of the different types of RAID, the paper does a good job of considering the variables that might bias the comparison. The assumptions made about how RAID will be used are taken into account, and the final metric of comparison is I/Os per second per dollar of cost.

Even though the RAID levels are explained thoroughly and the comparisons among the different types of RAID are done well, there are still parts of the paper that I felt could have been expanded and explained further.

1. The paper mentioned hardware and software implementations in the context of disk failures and recovery. It did not give insight into how they are implemented or what the performance differences are between software and hardware implementations. I would have liked to see charts comparing the performance and cost of the same type of RAID under software versus hardware implementations.

2. When a disk failure occurs with each type of RAID, certain read and write operations need to take place to make sure that the data is correct and consistent. Also, when the RAID recovers from a crash or failure of some sort, data needs to be revalidated. The paper does not experiment with or compare the time needed for either of these tasks.

3. The concept of RAID could also be applied to the partitions of a single hard drive. Such a proposal would not need overarching logic for the whole RAID, just logic for each disk. What would be the performance and implementation benefits and weaknesses of doing so?



Review 26

This paper discusses the merits of Redundant Arrays of Inexpensive Disks (RAID). At the time this paper was written, microprocessor performance was increasing by roughly 50% per year, while disk access times and disk transfer rates were increasing by only 10-20% per year. Much of the motivation for RAID was increasing the transfer rate of data from secondary storage in order to help bridge the growing gap between processor performance and disk performance. RAID also helped to increase the reliability of data stored on disks.

RAID accomplishes these goals by using arrays of disks and distributing data, along with redundant information, among them. Parity information is stored on dedicated disks or on blocks spread across all disks so that data can be recovered after a disk failure. RAID increases data transfer rate by reading from and writing to multiple disks simultaneously to fulfill I/O requests more quickly.
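As an illustration of how that parity is kept up to date, here is a small sketch of my own (not from the paper) of the read-modify-write update used when a single data block is overwritten in a parity-protected array: the new parity is the old parity XOR-ed with the old and new versions of the data block, so only two disks are read and two are written.

    def xor(a, b):
        """Byte-wise XOR of two equal-length blocks."""
        return bytes(x ^ y for x, y in zip(a, b))

    def small_write(old_data, new_data, old_parity):
        """Return the new parity block after overwriting one data block."""
        return xor(xor(old_parity, old_data), new_data)

    # Hypothetical example: old block, its replacement, and the group's old parity.
    new_parity = small_write(b"AAAA", b"ZZZZ", b"\x40\x40\x40\x40")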

The paper does an excellent job of detailing the differences in the various RAID configurations and the benefits and tradeoffs of each. There are always tradeoffs between reliability and performance in RAID setups. For example, RAID 0 gives the highest performance for reads and writes of all sizes; however, it retains no parity information about its data, so there is no way to recover information lost when disks fail. Other RAID levels allow administrators to set stripe size, number of disks in a string, size of parity groups, etc. in order to obtain the best performance balance based on their expected workload (small reads, large writes, etc.).

I would have liked the paper to give a bit more information about the consequences of data loss. The authors devoted several pages to breaking down numbers related to disk failure, uncorrectable bit failures, and reliability statistics, but they never really explain what they define as data loss. A few bytes of lost or corrupted data is significantly different than a few disks of lost data, so it would have been nice for the authors to clarify exactly what they meant by data loss.


Review 27

The paper gives a comprehensive overview of disk arrays and provides a framework in which to organize current work. The paper introduces seven disk array architectures called RAID (Redundant Arrays of Inexpensive Disks) and compares their performance.

First, the paper covers some basic ideas about disk arrays and then describes the seven disk array structures, levels 0 through 6. It provides several approaches and many results for comparing their performance.

Second, the paper discusses some advanced topics, including improving small write performance for RAID Level 5, declustered parity, and exploiting on-line spare disks.



Review 28

This paper summarizes research into RAID technology and the idea of using an array of hard disks rather than a single disk. The primary motivation is faster disk access when large amounts of data must be stored, achieved by enabling parallelism in reads and writes because the data is not all on the same disk. The paper does a good job of describing the seven levels of RAID and summarizing their respective upsides and downsides, and of providing proper background on disks and their issues before delving into RAID technology and how it solves them.

One thing that stuck out to me about this paper was the final section, “Opportunities for future research”. This is something I think is a phenomenal idea to include in papers, as it provides a call to action of sorts for people who might be interested but do not know how or where to get involved. It is the authors’ way of saying “these were ideas we thought of but didn’t have time to explore, but feel free to help us out!” I think it is a great way to promote future growth in the field.

I also think the paper did a good job of explaining the different levels of RAID from least complex to most complex and comparing them as it went. While reading, I found myself wondering “why would anyone ever choose level x over level y,” and almost as soon as I thought it, the paper addressed what the downsides of that level were and how it compares to the others.

The last main positive I will touch on is that the paper focused on the proper metrics for evaluating performance. I think cost, performance, and reliability are the three major things that should be evaluated, and the authors touched on all three.

One small complaint I have is with the explanations of RAID levels 0-6: it would have been nice if the graphic for each level had been shown next to its explanation. I know this would make the paper longer, but at the beginning of each new level's explanation I had to scroll back up to the image of all the RAID types side by side. It would have been nice to eliminate that scrolling, though maybe something like that is frowned upon in the paper-writing community.

Another small complaint I have is with “throughput per dollar” being used as the metric for the performance evaluation. I found this confusing because the graphs made it appear that RAID level 0 provided the highest throughput per dollar, and thus made it look the best. However, that is the most simplistic and basic version of RAID, and it provides no redundancy to protect the data. To me it felt like the weakest version of RAID, yet the graphs made it look like the best, which felt misleading.

All in all I think it was a solid paper and I’m proud to go to the same school as P Chen and have taken his class :-).



Review 29

The authors of this paper introduce a new structure for disk storage technology (redundant arrays of independent disks, a.k.a. RAID). The motivation behind the paper was to provide a storage system that is both reliable and efficient while still offering large amounts of storage space. To keep pace with the ever-growing power of processors and memory in new computers, a fast storage system is needed as well.

The cleverness behind RAID begins with data striping, a method of splitting sequential data across multiple disks so that the data (e.g. block requests) can be retrieved in parallel. However, each additional disk carries its own independent risk of failure, requiring a second measure to compensate: redundancy. Interleaving data across multiple disks has different strengths and weaknesses depending on how finely the data is split (a toy address-mapping sketch follows this list):

(1) Fine-grained interleaving: Small pieces of data are spread across many disks, so each request is serviced by many disks in parallel and sees a high transfer rate, but only one logical request can be serviced at a time.

(2) Coarse-grained interleaving: Larger pieces of data are stored on each disk, so small requests touch only a few disks and several requests can be serviced concurrently, though an individual small request does not benefit from the full parallel transfer rate.
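To make the granularity trade-off concrete, here is a toy sketch of my own (the paper gives no code) that maps a logical block number to a (disk, offset) pair under round-robin striping; the stripe unit size is the knob that moves a design between fine-grained and coarse-grained interleaving.

    def locate(logical_block, num_disks, stripe_unit_blocks=1):
        """Map a logical block to (disk, offset) under round-robin striping."""
        unit = logical_block // stripe_unit_blocks       # which stripe unit overall
        within = logical_block % stripe_unit_blocks      # position inside that unit
        disk = unit % num_disks                          # units rotate across disks
        offset = (unit // num_disks) * stripe_unit_blocks + within
        return disk, offset

    # With a 1-block unit (fine-grained), consecutive blocks land on different disks;
    # with a 64-block unit (coarse-grained), a small request stays on one disk.
    print(locate(5, num_disks=4, stripe_unit_blocks=1))    # -> (1, 1)
    print(locate(5, num_disks=4, stripe_unit_blocks=64))   # -> (0, 5)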

The complexity of the RAID system’s many possibilities appears to be one of its main drawbacks; the choice of data redundancy and data striping schemes is highly application-dependent and requires thorough analysis. The paper does a good job of describing the varieties of the RAID scheme, even from a (somewhat) layman’s perspective, but is lacking in some components. For example, the “reliability” analysis seems purely theoretical; it would be more convincing if there were some measurement (e.g. actually inducing and counting failures) shown for reliability, comparing RAID and non-RAID schemes. Further, there is little discussion of other competing technologies, which could have contributed to an interesting comparison of different disk designs.


Review 30

The requirements of storage systems vary from application to application; some require high throughput while others require high reliability guarantees. Through a comprehensive review of the various RAID levels and a general analysis of reliable storage systems, this paper helps identify which storage techniques are best suited for different applications. The authors primarily focus on three metrics when assessing the different RAID levels: reliability, performance, and cost.

This paper does a good job of explaining the rationale behind its comparison techniques. It explains that since most secondary storage systems are throughput oriented, it makes sense to normalize throughput as I/Os per second per dollar, and it makes a point of comparing systems with the same file capacity so that performance results are more easily interpreted. Another strength of the paper is its willingness to temper theoretical guarantees: when discussing reliability, the authors introduce the notion of uncorrectable bit errors, such as incorrect writes due to damaged magnetic media, and such bit errors undermine some of the theoretical guarantees on mean time between failures for the RAID levels.
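As a toy illustration of the I/Os-per-second-per-dollar normalization mentioned above (the numbers are invented by me, not taken from the paper), the metric is just measured throughput divided by the cost of the disks, which keeps an array from looking better simply because it contains more, or more expensive, hardware:

    # Invented, illustrative numbers purely to show the arithmetic of the metric.
    small_read_ios_per_second = 1600     # modelled throughput of a hypothetical array
    total_cost_dollars = 8000            # cost of the disks in that array
    ios_per_second_per_dollar = small_read_ios_per_second / total_cost_dollars
    print(f"{ios_per_second_per_dollar:.2f} IO/s per dollar")   # 0.20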

A major drawback of this paper is its lack of explanation regarding testing methods. Practically no information is provided about the hardware, test data, or actual tests that were run to gather the data presented. Without adequate information about the testing environment, the validity of the results remains in question.


Review 31

The paper discusses the implementation of disk arrays – specifically RAID (Redundant Arrays of Inexpensive Disks) – as an alternative way to design and use the secondary storage system. Two techniques used in disk arrays are data striping and redundancy. Data striping distributes data across multiple disks and accesses them in parallel to achieve both higher data transfer rates on large accesses and higher I/O rates on small accesses. Since it relies on large disk arrays, it is vulnerable to disk failure, and the obvious solution is to employ redundancy to tolerate failures and allow continuous operation without data loss.

However, redundancy raises two problems: (1) selecting a method for computing the redundant information, and (2) selecting a method for distributing the redundant information across the disk array. The paper explains seven RAID architectures: Non-redundant (level 0), Mirrored (level 1), Memory-Style ECC (level 2), Bit-Interleaved Parity (level 3), Block-Interleaved Parity (level 4), Block-Interleaved Distributed-Parity (level 5), and P+Q Redundancy (level 6). It then discusses cost and performance comparisons between the architectures, as well as reliability. Focusing on level-5 and level-6 RAID, the paper compares their reliability against the data-loss risks usually caused by double disk failures, a system crash followed by a disk failure, or a disk failure followed by an uncorrectable bit error. From this comparison, the paper seems to favor level-6 RAID (P+Q redundancy) as the better implementation. Lastly, several implementation considerations are discussed (avoiding stale data, regenerating parity after a system crash, operating with a failed disk, and orthogonal RAID).

When the paper was written, disk arrays were implemented in many commercial products, but there were still many unresolved practical issues. One apparent contribution of this paper is that it gives a comprehensive overview of disk arrays and provides a framework in which to organize current and future work. The comparison between levels makes it easier to understand the pros and cons of implementing each RAID level, and which configuration (data striping unit, parity group size) best suits each one.

However, I feel that while the paper says a lot about parity as “back-up” data and how to detect, prevent, and resolve parity inconsistencies, it does not say much about how to verify the integrity of the parity itself, especially in level-5 and level-6 RAID, where parity is distributed across the same disks as the data. Since the parity is physically placed on the same disks, if one of those disks fails, is there a way to know whether the parity was affected as well?



Review 32

The purpose of this paper is to provide readers with an overview of the different kinds of systems that are used in RAID (Redundant Arrays of Inexpensive Disks). This is another survey paper that is geared towards presenting just enough information that a user could make a decision about which of the different RAID levels might be right for their applications and needs.

The technical contributions of this paper come mostly in the empirical evaluations they conduct. However, I do believe there are several big weaknesses in this area of the paper as well (see below). They also introduce RAID-0 and RAID-6, which they state have not really been examined in previous survey works.

The weaknesses of this paper are many. First, I wish the authors had presented example applications of all of the different RAID levels as they worked through their descriptions. Though they do give examples of specific uses of RAID levels 0 and 1 (the latter being database systems), I was disappointed that the later sections did not follow a similar format and end with possible real-world applications.

However, I think the main weakness of the paper is the strength of the empirical results. The authors themselves state that it is difficult to evaluate the different RAID levels empirically, because a RAID level is typically used to specify a certain configuration of the system but can also be used to specify a particular implementation; specific RAID levels (especially level 3 versus level 5) can become almost the same given particular implementations. Additionally, the authors present their metrics normalized by cost. While I understand the motivation behind this decision for comparing the different levels, I find that the normalization makes the interpretation of the measurements very counter-intuitive. Throughout the paper I also thought there was some abuse of notation (the letter G stands for different quantities in different formulas) and insufficient discussion of the results.

At the end of the paper, the authors do attempt to account for some of the shortcomings, but I think that there are too many to overcome. The ambiguity in the specification isn’t doing anyone any favors, and I have a hard time taking the results seriously when they could vary with vastly different implementations of the different RAID level configurations.


Review 33

Title: RAID: High-Performance, Reliable Secondary Storage
This paper presents a tutorial on and survey of disk arrays and the idea of striping. Though it lacks novelty, it provides a systematic manual for studying the large number of disk array options. In general, the many available configurations of disk arrays are, in essence, the results of tradeoffs among performance, reliability, and cost, so the contribution of this paper is to help users and designers make the optimal choice of disk array for their individual needs.

While the paper provides detailed information about the design choices and options available at the time, it would be more interesting and illustrative if real-world examples had been provided. For instance, in section 3.5.3 the authors mention two different options for operating with a failed disk and compare the trade-off between the two. While the description is detailed and clear, it would be more helpful to explain which real-world situations make each of the two options preferable; in other words, in what situation is it better to avoid using a stand-by spare disk even at the cost of requiring additional metastate information?

An interesting point to consider is that, since this is a paper from more than 20 years ago, many of the considerations and limitations it discusses have already been overcome by the development of technologies such as SSDs, faster memories, and larger-capacity memories. With these modern advantages available, which of the old limitations can now be set aside would be an interesting topic to discuss.

It is also interesting to see, as pointed out in section 6.1 of the paper, that when a research field is so commercially driven, the amount of published work is not proportional to the research resources devoted to the field, as it is in fields that are more theory-oriented.



Review 34

This paper introduces RAID, Redundant Arrays of Inexpensive Disks, which is used to fill the gap between the fast development of microprocessors and the slow increase in the data transfer rate of secondary storage. RAID splits data across an array of inexpensive disks to parallelize data transfer and uses redundant data to cope with the more frequent failures that come with more disks. So far RAID has several different models, including:
1. Non-redundant RAID (level 0): provides the best data transfer rate but the worst fault tolerance.
2. Mirrored RAID (level 1): uses a second array of disks to store a complete copy of the data, providing more reliability at the cost of the most redundancy (a whole extra copy).
3. Memory-style ECC RAID (level 2): uses Hamming codes to provide fault tolerance with less storage than mirroring.
4. Bit-interleaved parity RAID (level 3): uses bit-interleaved parity to provide fault tolerance; data is interleaved bit-wise over the data disks.
5. Block-interleaved parity RAID (level 4): uses block-interleaved parity to provide fault tolerance.
6. Block-interleaved distributed-parity RAID (level 5): like level 4, but stores the parity uniformly over all the disks, which removes the bottleneck of a single parity disk.
7. P+Q redundancy RAID (level 6): uses Reed-Solomon codes to provide better fault tolerance than the previous levels using the bare minimum of two redundant disks (a small sketch of the P and Q computation follows this list).

The paper also presents a comprehensive comparison between RAID levels, providing performance and cost information for these models. Moreover, it discusses the problems that can be solved by RAID, as well as the tradeoffs involved and the future development of RAID.

Strength:
1. This paper summarizes the RAID models in detail and provides the background and applications of RAID, which helps its readers understand the importance of RAID.
2. The paper has a good structure: it first gives its readers a good understanding of the background, then summarizes the models, then goes deeper into advanced topics, and finally introduces future developments.

Weakness:
1. Though the paper gives a comprehensive introduction to the different RAID models, it would be more interesting to also see how these models can be combined and the pros and cons of different combinations.



Review 35

This paper discusses Redundant Arrays of Inexpensive Disks (RAID), a high-performance and reliable disk storage technique.

The authors first introduce the problems with single-disk storage systems, mainly limited performance and reliability, then compare the seven different ways of organizing disk arrays and evaluate their cost, performance, and reliability.

There are seven RAID levels, namely RAID 0 through 6. RAID 0 is the non-redundant organization, the simplest way of linking disks into an array. Since it uses no redundant disks, its performance per cost is the highest, but it provides the poorest reliability. RAID 1 mirrors disks, creating a shadow copy of each disk. Read requests are fully parallelized, but every write request needs to go to both disks. Reliability is achieved through the backup disk, but storage efficiency is quite low (1/2). RAID 2 uses memory-style error-correcting codes for detecting failures and recovering data, with the check bits stored on extra disks. RAID 3 through 5 also use parity, but rather than using a set of check disks to recover lost information as RAID 2 does, they use only a single parity disk per group, since disk controllers can easily identify which disk failed. RAID 3 through 5 differ from each other in subtle ways: RAID 3 is bit-interleaved, while RAID 4 and 5 are block-interleaved; RAID 4 stores all parity blocks on a single disk, while RAID 5 distributes them among all disks. The last level, RAID 6, is a more general way of using redundancy. Due to the limitations of single parity, RAID 2 through 5 can only correct a single self-identifying failure, and recovery requires successfully reading all other disks, which is not always achievable. RAID 6 uses Reed-Solomon codes and protects against up to two disk failures by introducing more overhead.

Reliability-wise, the paper mostly compares RAID 5 and RAID 6. Double disk failures, system crashes, and uncorrectable bit errors can all compromise the reliability of disk arrays, and the authors show that each of these has a significant probability of occurring on RAID 5 over a ten-year period. RAID 6 is much more effective at protecting against double disk failures and unrecoverable bits, but is still susceptible to system crashes. A rough back-of-envelope sketch of the double-failure risk follows.
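This is my own back-of-envelope sketch using the standard independent-failure approximation and numbers I invented for illustration; it is not the paper's model or its figures.

    # Assumed, illustrative inputs -- not values from the paper.
    mttf_disk_hours = 200_000     # mean time to failure of one disk
    mttr_hours = 24               # time to replace and rebuild a failed disk
    num_disks = 16                # disks in the array
    group_size = 16               # disks per parity group

    # Mean time until some disk fails AND a second disk in its group fails
    # before the rebuild completes (single-parity array, independent failures).
    mttdl_hours = mttf_disk_hours ** 2 / (num_disks * (group_size - 1) * mttr_hours)
    print(f"approximate MTTDL: {mttdl_hours:,.0f} hours "
          f"(about {mttdl_hours / (24 * 365):,.0f} years)")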

Lastly, the paper talks about a few advanced techniques and implementation details, as well as future work.