scc: Informed Provisioning of Storage for Cluster Applications
Harsha V. Madhyastha, John C. McCullough, George Porter, Rishi Kapoor, Stefan Savage, Alex C. Snoeren, Amin Vahdat
Identifying an appropriate cluster architecture to host a large-scale service is often not straightforward. Given a set of resources to choose from (e.g., as shown in the adjacent table), an application provider has to answer several questions. What storage technologies should be employed, and how should data be partitioned across them? Where should caching be employed? What types of servers should be chosen to house the selected storage units? In addition, even if the application's implementation is efficient and there is coarse-grained parallelism in the underlying workload, how will algorithmic shifts in the application or variations in workload affect the appropriate cluster architecture? Our goal is to automate the process of answering these questions, rather than relying solely on human judgment.
In developing scc, we show how to systematically exploit storage diversity, i.e., how to select among different physical media, local and remote storage, and various caching strategies. As shown in the adjacent figure, scc takes three inputs: i) a model of application behavior, specified in part by the application's developer and in part by the administrator deploying the application, ii) characteristics of the available hardware building blocks, specified by the infrastructure provider, and iii) application performance metrics, i.e., a parameterized service level agreement (SLA) (e.g., a web service SLA might specify a peak query rate per second). Given these inputs, scc computes how cluster cost varies as a function of the SLA and outputs a low-cost cluster configuration that meets the SLA at each point in the space. scc's output, the distribution of cost versus SLA value, helps administrators decide what performance can be supported cost-effectively.
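To make the input/output relationship concrete, the following is a minimal sketch, not scc's actual algorithm. It assumes a drastically simplified model in which the application is described only by a dataset size and a number of I/Os per query, each hardware option only by capacity, IOPS, and unit cost, and the SLA by a query rate. All names (`StorageOption`, `cheapest_config`, `cost_vs_sla`) and all numbers are hypothetical and chosen purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    """Hypothetical hardware building block (illustrative fields only)."""
    name: str
    capacity_gb: int
    iops: int
    unit_cost: int


def units_needed(option, dataset_gb, required_iops):
    # Enough units to hold the data AND to serve the required I/O rate.
    by_capacity = math.ceil(dataset_gb / option.capacity_gb)
    by_iops = math.ceil(required_iops / option.iops)
    return max(1, by_capacity, by_iops)


def cheapest_config(options, dataset_gb, ios_per_query, queries_per_sec):
    # Exhaustively compare single-technology configurations for one SLA point.
    required_iops = ios_per_query * queries_per_sec
    best = None
    for opt in options:
        n = units_needed(opt, dataset_gb, required_iops)
        cost = n * opt.unit_cost
        if best is None or cost < best[2]:
            best = (opt.name, n, cost)
    return best  # (technology, number of units, total cost)


def cost_vs_sla(options, dataset_gb, ios_per_query, sla_points):
    # Map each SLA value (queries/sec) to its lowest-cost configuration.
    return {
        qps: cheapest_config(options, dataset_gb, ios_per_query, qps)
        for qps in sla_points
    }


# Assumed, made-up hardware characteristics; not taken from the source.
HYPOTHETICAL_OPTIONS = [
    StorageOption("disk", capacity_gb=1000, iops=100, unit_cost=100),
    StorageOption("ssd", capacity_gb=100, iops=10000, unit_cost=300),
]

if __name__ == "__main__":
    curve = cost_vs_sla(HYPOTHETICAL_OPTIONS, dataset_gb=500,
                        ios_per_query=2, sla_points=[10, 5000])
    for qps, (name, n, cost) in curve.items():
        print(f"{qps} queries/s -> {n} x {name}, cost {cost}")
```

Under these assumed numbers, the cheapest technology changes as the SLA tightens, which is the kind of cost-versus-SLA trade-off the output is meant to expose. The sketch ignores caching, data partitioning across technologies, and server selection, all of which the text above lists as questions scc addresses.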