The entire enterprise IT community is storming with tremendous energy into the world of Cloud Computing and Virtualization, and for good reason.  These technologies address head-on the pressing business needs to provide more computing power and increased flexibility for less money.  However, those business benefits are balanced by the risk that comes from having a data center that’s heavily influenced by unpredictable factors outside of the normal controls and patterns that IT professionals are used to experiencing.

In a 2010 study by ISACA, 45% of IT professionals said that the risks of Cloud Computing outweigh the benefits, and only 10% were willing to place mission-critical applications within the Cloud.  Risk concerns include the location and security of sensitive data and software, but concern is quickly growing over issues of availability, reliability, and serviceability.

Availability vulnerabilities were underscored earlier this year by the well-publicized outage Amazon experienced on June 29th, when its data center outside of Washington DC crashed for hours.  The impact was felt by hundreds of Amazon Cloud customers, from small e-businesses to Netflix, resulting in lost revenue and lost confidence.

The growing size of Cloud data centers is also a risk factor.  In most cases, a Cloud data center is the product of data center consolidation combined with requirements for growth in computing power.  Many Cloud data centers are becoming mega data centers with thousands of racks.  This increases the operational demands of tasks such as locating a failing system.  In fact, through the Digital Realty Trust Survey in 2011, we know that only 22% of large data centers have the process and controls to find a specific system within several minutes.  72% of large data centers typically require more than 4 hours to locate a system, with a full 20% requiring multiple days.  Responding quickly to maintenance and repair needs requires the IT infrastructure to perform real-time inventory and asset location management.

Dependability within the Cloud is an obvious requirement, yet by the very nature of Cloud Computing, there are significant risks to the soundness of the computing environment.  The Committee of Sponsoring Organizations of the Treadway Commission (COSO) lists “Reliability and Performance Issues” as one of the top five risk factors of Cloud computing in its July 2012 white paper titled “Enterprise Risk Management in Cloud Computing.”  COSO identifies Cloud as posing a unique challenge because a user or tenant can place an unexpected resource demand on the cloud infrastructure at any time.

So, where does this risk come from?  There are two principal drivers: randomness of resource requests, and the relative growth in size of cloud data centers.

The risk resulting from the randomness of resource requests stems from a significant shift in the operational processes of a data center.  For years, IT professionals have operated their data centers with a clear understanding of usage patterns, knowing when peaks and lulls occurred, and planning operations around the predictable pattern.  In a true cloud environment, there is no pattern.  The users of the cloud make resource requests as they need them, and to a data center with many Cloud-based applications, those requests appear random.  As new resources are enabled on demand, they place variable demands on physical resources like power supply and cooling.  Without appropriate knowledge of and control over power and cooling, the reliability and availability of the cloud are at risk.  An effective cloud data center senses the demand on these physical resources and adjusts in real time.  But to do so, it needs the infrastructure to properly monitor and analyze the environment.
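
As a rough illustration of what "sense and adjust" might look like at the simplest level, the sketch below keeps a short rolling window of power readings for a single rack and flags the rack when its average draw approaches its rated capacity.  Everything here is an assumption made for illustration: the RackMonitor class, the rated_watts and alert_fraction parameters, and the 80% threshold are hypothetical and do not come from any particular monitoring product.

```python
from collections import deque
from statistics import mean


class RackMonitor:
    """Rolling view of one rack's power draw (hypothetical model)."""

    def __init__(self, rated_watts, window=5, alert_fraction=0.8):
        self.rated_watts = rated_watts        # assumed rated capacity of the rack
        self.alert_fraction = alert_fraction  # assumed alert threshold (80%)
        self.readings = deque(maxlen=window)  # keeps only the newest readings

    def record(self, watts):
        """Store one power reading; the oldest drops out when the window is full."""
        self.readings.append(watts)

    def utilization(self):
        """Average recent draw as a fraction of rated capacity."""
        if not self.readings:
            return 0.0
        return mean(self.readings) / self.rated_watts

    def needs_attention(self):
        """True when average draw reaches the alert threshold."""
        return self.utilization() >= self.alert_fraction


monitor = RackMonitor(rated_watts=1000, window=3)
for watts in (500, 600, 700):
    monitor.record(watts)
calm = monitor.needs_attention()      # average 600 W of 1000 W

for watts in (900, 1000):             # on-demand workloads arrive
    monitor.record(watts)
stressed = monitor.needs_attention()  # window is now 700, 900, 1000
```

A real deployment would feed readings from power and temperature sensors across thousands of racks and trigger cooling or workload changes rather than a simple flag, but the principle is the same: unpredictable demand is only manageable if it is measured continuously.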

