IT Starts with Information, Part 1: Data Center Monitoring Ensures Availability, Lowers OpEx

Posted by RF Code

Downtime avoidance and disaster prevention are critical aspects of data center efficiency. Most operators use a multi-pronged approach to address these issues, incorporating redundant systems in their design, applying best practices for operational and maintenance procedures, and using innovative and integrative technologies such as data center infrastructure management (DCIM) systems to improve reliability.

Our recent white paper focuses on these concepts and explains how real-time monitoring, capacity planning and predictive analysis technologies help businesses improve data center agility and efficiency to ensure higher performance at a lower cost. In this first post of a multi-part blog series, we look at the role real-time monitoring plays in the data center management chain.

Data centers are the engines of commerce. They enable every aspect of modern civilization, from social connections to global markets, yet for something so ubiquitous, data centers are strangely invisible and poorly understood.

Most enterprise data centers spend millions of dollars on electricity each year. That’s real money in any economy, and all the more so in today’s highly competitive global landscape. Add in rising energy costs throughout much of the industrialized world and the increased potential for climate change-related regulation, and one would think most companies would consider improved energy efficiency their highest priority.

A recent survey by the Data Center Users’ Group found differently. Respondents ranked energy efficiency fourth in priority. The top concerns? Adequate monitoring and data center management capabilities, availability, and change management.

The common denominator of these concerns is ensuring availability. A seemingly minor change or miscalculation can have massive implications, and the prospect of equipment failure is the type of concern that keeps executives up at night.

The data center cannot go offline, not least because of the expense. A 2013 Ponemon Institute survey reported an average cost of $690,204 per incident, and that figure includes only readily quantifiable impacts – business disruption, lost revenue, decreased productivity, equipment repair and the like. The cost to a business’s reputation is harder to measure, but it lasts longer and affects the bottom line far more than the expense of the actual event.

Understanding the Concepts at Hand

There are many techniques and technologies data center operators can employ to save energy in their facilities. Recent guidance from ASHRAE shows that data centers, which used to operate at 55-65°F, can now run at 80 or even 90°F, and with less stringent humidity limits. The impact of these changes can be significant: for every 1°F increase in server inlet air temperature, 2-5% in energy costs can be saved.
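To make the arithmetic behind the 2-5% figure concrete, here is a minimal sketch that estimates a savings range for a given temperature increase. The baseline cost, the 10°F increase, and the choice to compound the per-degree saving (rather than simply add it up) are assumptions made for illustration; they are not figures from ASHRAE or from the white paper, and the function name is hypothetical.

```python
def estimate_savings(baseline_cost, delta_f, rate_per_degree):
    """Estimate energy cost saved by raising inlet temperature.

    Assumption: each 1°F increase cuts the remaining cost by
    `rate_per_degree`, so the effect compounds across degrees.
    """
    remaining_fraction = (1.0 - rate_per_degree) ** delta_f
    return baseline_cost * (1.0 - remaining_fraction)


# Hypothetical inputs: a $1,000,000 annual energy bill and a 10°F increase.
baseline = 1_000_000.0
increase_f = 10

low = estimate_savings(baseline, increase_f, 0.02)   # 2% per degree
high = estimate_savings(baseline, increase_f, 0.05)  # 5% per degree

print(f"Estimated annual savings: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, a 10°F increase would trim the bill by roughly 18% to 40%. Actual results depend on which costs the per-degree figure applies to and on the facility's cooling design.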

Given the substantial savings on offer, one might expect every business to make these seemingly straightforward adjustments. Yet many do not; over three-quarters of the respondents to the 2013 Uptime Institute survey reported that their average server supply air temperature was 65-75°F – far cooler than ASHRAE recommends.

Power distribution and backup equipment also contributes to energy waste in the data center, due to conversion losses, poorly designed power chains and inefficient power supplies and cables. However, as with cooling, there are many strategies that can help improve power efficiency, and the most obvious of these addresses the compute load itself.

Because most data centers provision for peak load – loads that may occur only a few days per year – low server utilization is the status quo in the industry. Experts estimate that server utilization in most data centers is only 12-18%. These “comatose” servers still draw almost the same amount of power when idle as they do when processing at full capacity. Additionally, every watt of electricity wasted at the device level has a cascade effect, as still more energy is needed to power the physical infrastructure that supports the device.

Increasing the density of the IT load per rack through consolidation and virtualization can offer substantial savings, not only in equipment but also electricity and space. So why are data center operators leaving these potentially game-changing savings on the table?

Risk. Without real-time monitoring and management, raising inlet air temperature increases the risk of equipment failure. Without a detailed understanding of the relationship between compute demand and power dynamics in the data center, power capping increases the risk that the processing capacity won’t be available when required.

Business-Critical Monitoring Infrastructure

In an intelligent data center, thousands of sensors throughout the facility collect information on temperature, humidity, air pressure, power use, fan speeds, CPU utilization, and much more – all in real time. This information is aggregated, normalized and reported in ways that allow the operator to understand and adjust controls in response to current conditions.
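The aggregate, normalize and report steps can be sketched in a few lines. This is only an illustration under stated assumptions: the `Reading` record, the metric names, the rack labels and the 80°F alert limit are all hypothetical, and a real monitoring platform would read from live sensors rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One sensor sample (hypothetical schema for illustration)."""
    rack: str
    metric: str
    value: float
    unit: str


def to_fahrenheit(reading):
    # Normalize: convert Celsius samples so all temperatures share one unit.
    if reading.unit == "C":
        return reading.value * 9 / 5 + 32
    return reading.value


def max_inlet_temp_by_rack(readings):
    # Aggregate: keep the highest inlet temperature seen for each rack.
    peaks = {}
    for r in readings:
        if r.metric != "inlet_temp":
            continue
        temp_f = to_fahrenheit(r)
        peaks[r.rack] = max(peaks.get(r.rack, temp_f), temp_f)
    return peaks


def racks_over_limit(readings, limit_f=80.0):
    # Report: list racks whose peak inlet temperature exceeds the limit.
    peaks = max_inlet_temp_by_rack(readings)
    return sorted(rack for rack, temp in peaks.items() if temp > limit_f)


readings = [
    Reading("A1", "inlet_temp", 72.5, "F"),
    Reading("A1", "inlet_temp", 25.0, "C"),
    Reading("B4", "inlet_temp", 28.0, "C"),
    Reading("B4", "humidity", 45.0, "%"),
]

print(racks_over_limit(readings))
```

Normalizing units before aggregating matters here because sensors from different vendors may report in different scales; comparing raw values would hide the rack that is actually running hot.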

Monitoring also offers benefits beyond disaster avoidance. Cloud, co-location and hosting providers can use the data collected to document their compliance with service level agreements (SLAs). Monitoring data can be integrated into the facility’s building management system (BMS), allowing operators to further automate and optimize control of the physical environment.

Visibility at a macro and micro level improves client confidence, streamlines decision making and increases data center availability, productivity and energy efficiency.

These are all powerful outcomes, but they are achievable only when monitoring data is united with other aspects of data center management. In our next post, we look at one specific application – capacity planning – and the role predictive analysis plays in correlating capacity-related data.