Last week I had the good fortune to attend a keynote presentation by Nate Silver. Silver’s presentation, entitled “The Signal and the Noise: Why So Many Predictions Fail, and Some Don’t” (not coincidentally this is also the title of his new book), addressed the availability of big data and how it affects decision-making.  Being a big time data wonk myself, I was pretty excited.

Yes, yes: I probably need to get out more.

Anyhow, one of Silver’s main points was that when we talk about “big data” we need to think of it as a mass of independent data points, and that without accurate correlation between multiple data points it can be worthless, or even destructive.  In his opinion it’s better to think about the value of all this information by focusing not on big data, but on rich data. What’s the difference?  While big data refers to the massive, undifferentiated deluge of data points, rich data instead refers to data that has been processed and refined, eliminating the extraneous and leaving only the meaningful information that can then be used for predicting, planning, and decision making.

This nicely parallels the need for accurate, reliable, historic data to ensure efficient asset management and environmental monitoring in the data center.  In many cases the professionals who are responsible for monitoring and managing the data center are doing it with almost no data at all, let alone rich data.  Often, the location of servers is only known based on a single data point gathered during the occasional inventory audit process, or the level of cooling required for the entire data center is set using a few sparsely distributed thermostats.  It’s very unusual for a data center manager to have the truly reliable rich data at hand that they need to ensure operational efficiency.

So where would big data center data be refined into rich data center data? Clearly, this would occur in a sophisticated back-end system (asset management database, a DCIM platform, building/facilities management system, etc.). Without reliable systems in place that enable users to distill the vast quantities of undifferentiated big data flowing in from their various tags and sensors into rich data, the data center professional is left with the Herculean task of sifting through mountains of data points manually, searching for the signal in the noise.  Given the complexity and dynamism of the data center environment, undertaking this effort without reliable automated assistance seems unlikely to yield much in the way of results.
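To make the idea of "refining" a little more concrete, here is a minimal Python sketch of the kind of reduction a back-end system might perform: it takes a pile of raw temperature readings and boils them down to the handful of racks that look like hot spots. The function name, the rack IDs, and the 27 °C threshold are all assumptions invented for this illustration, not features of any particular DCIM platform.

```python
from collections import defaultdict
from statistics import mean


def find_hot_spots(readings, threshold_c=27.0):
    """Reduce raw (rack_id, temperature_c) pairs to the racks running hot.

    threshold_c is an illustrative assumption, not a recommended limit.
    """
    by_rack = defaultdict(list)
    for rack_id, temperature_c in readings:
        by_rack[rack_id].append(temperature_c)

    # Average each rack's readings, then keep only the racks above the threshold
    averages = {rack: mean(temps) for rack, temps in by_rack.items()}
    return {rack: avg for rack, avg in averages.items() if avg > threshold_c}


# Hypothetical raw readings: many data points in, a short list of problems out
raw = [("rack-A", 22.5), ("rack-A", 23.1), ("rack-B", 29.4), ("rack-B", 30.2)]
print(find_hot_spots(raw))
```

A real system would of course do far more than average and filter, but the shape is the same: lots of undifferentiated points go in, and a small amount of actionable information comes out.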

But no matter how good the system, it all starts with data. As Silver explained, to obtain rich data you must have quantity, quality and variety.  In other words: 

  • Quantity: you have to have enough data at hand to be able to discern patterns and trends

  • Quality: the data must be as accurate as possible

  • Variety: the data should be gathered from multiple sources in order to eliminate bias
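As a rough illustration, the three criteria above could be checked against a batch of sensor readings along the following lines. This is only a sketch: the `Reading` fields, the thresholds, and the "plausible" temperature range are assumptions made up for the example, and a simple range check is at best a crude stand-in for real accuracy.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    source: str           # hypothetical sensor or tag ID
    temperature_c: float  # reported temperature in degrees Celsius


def assess(readings, min_count=100, min_sources=3, valid_range=(5.0, 60.0)):
    """Score a batch of readings against quantity, quality, and variety.

    All thresholds are illustrative assumptions.
    """
    low, high = valid_range
    plausible = [r for r in readings if low <= r.temperature_c <= high]
    return {
        # Quantity: enough data points to discern patterns and trends
        "quantity_ok": len(readings) >= min_count,
        # Quality (proxy): share of readings in a physically plausible range
        "plausible_share": len(plausible) / len(readings) if readings else 0.0,
        # Variety: readings come from several independent sources
        "variety_ok": len({r.source for r in readings}) >= min_sources,
    }
```

A batch that fails any one of these checks is a hint that whatever is built on top of it deserves some skepticism.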

So what would a data center need to generate truly rich data?  First and foremost, it's obvious that this data must be generated automatically: the effort required to manually collect enough data about asset locations and environmental conditions, and to ensure that it is both accurate and reasonably up to date, would be overwhelming for all but the smallest of data centers.  So manually driven data collection processes (clip boards, bar codes, passive RFID tags, and so on) just won't cut it.

Clearly, what's needed is a combination of hardware that will automatically generate and deliver the needed data (active RFID anyone?) and software that helps the user to correlate the vast amount of received “big data” into rich data that can be used to perceive patterns and trends … and ultimately to make informed decisions.

  • For environmental monitoring applications such as managing power and cooling costs, identifying hot spots, ensuring appropriate air flow, generating thermal maps and so forth this requires an array of sensors that generate a continuous flow of accurate environmental readings -- temperature, humidity, air flow/pressure, power usage, leak detection, and so on. 

  • For asset management applications such as physical capacity planning, inventory management, asset security, lifecycle management and so on, this requires tags that automatically report the physical location of individual assets as well as any change in these locations without manual interaction.

The data generated by these devices would then be collected and refined in the back-end system, resulting in meaningful, actionable information.  But ultimately, it’s the data itself that is the key component here. Without a continuous flow of accurate, reliable data about your data center assets and the environment that surrounds them – data that meets all of Silver’s requirements of quantity, quality, and variety – even the best DCIM platform will be of only limited value.