The very existence of this supplement is proof that the storage industry has embraced the concept of Information Lifecycle Management (ILM). Fundamental to any ILM strategy is the idea that categories and copies of data should be stored on different types of storage media in such a way as to match the accessibility needs for any sort of data with the cost of the media that holds it.
This ILM storage model calls for the deployment of a range of different media and storage types that deliver capacity, resiliency, portability, and performance at different price points, with the goal of migrating data onto the most suitable storage medium at each point during its life cycle. Unfortunately, ILM today is easier to conceptualise than to implement, with most organisations unable to ensure that data is copied and positioned in the right place and right format in consideration of where it is in its life cycle and its relative needs in terms of availability and integrity.
With massive volumes of data, automation is the only viable answer. Critically, the technologies that help to create copies of data on different media are not generally available with analytical and rules-based engines to selectively identify the ideal location and format for data and then automatically copy or move it there. Almost as an aside, it’s widely accepted that up to 80 per cent of data held online will never, in fact, be needed. Why isn’t this data simply removed? Again, it’s mostly because there are no adequate tools to automatically analyse the data and accurately predict which 20 per cent will be required.
Presently, HSM utilities are often presented as the ILM storage tool but, while extremely effective at transparently relocating data to less expensive media in some environments, they tend to incur additional costs by making the primary storage they work on extremely expensive to manage. For instance, millions of HSM stub files cause file system degradation and make routine disk maintenance inefficient. The knock-on effect is that backups, defragmentation, and virus scans may take up to 10 times longer to complete. Nor are most HSM products integrated with other services such as backup or archiving, and few offer distributed architectures that scale to service enterprise environments.
Another barrier to ILM is inaccurate perception of media costs. Many organisations have come to view inexpensive forms of disk as cheaper than alternatives such as tape and optical; the falling cost of network connectivity has also fostered the view that moving copies of data over the network to remote disk storage obviates the need for removability, another attribute that made tape and optical so attractive. The prevailing view is that the real cost of any storage technology is determined not by the simple cost per gigabyte alone but by the total cost of the infrastructure and management processes around it. Yet it is disk’s relative ease of use when working with unmanaged media that has led IT departments to believe disk is the only solution. In truth, disk is only a cheap solution in the absence of technology that makes managing alternative media, locations, and formats efficient and inexpensive.
The reality is that a range of media is needed to marry the goals of ILM with the need to perform fundamental storage management activities that combine disk with removable media for backup, replication, and archiving purposes.
Intelligent Storage Management (ISM): inclusion vs integration
The ILM model is, therefore, most likely to come to fruition on a platform capable of using automated policies to manage data with all the fundamental services of replication, backup, archiving, and HSM. To be truly effective, these services themselves should be integrated and the platform should be capable of allowing users to select and integrate any combination of media types into a sharable, consolidated, virtualised resource. This resource houses primary copies of data based on their life cycle stage and ensures managed copies of data to meet all availability and integrity requirements.
Such a platform – an Intelligent Storage Management platform – is the key to allowing IT organisations to create a comprehensive and cost-effective storage infrastructure to meet availability and integrity service levels and comply with external standards. The Intelligent Storage Management policy-based, automated approach integrates all the key strategic storage management services – backup, replication, archiving, and HSM – into a coherent system at minimum expense:
• Replication is used to prepare alternative copies of raw data on media capable of functioning as primary storage, to be brought online immediately in a disaster
• Backup creates manageable copies of multiple generations of data on lower cost media, with the ability to perform partial or full restores rapidly for resuming operations in various failure scenarios. Its ability to work across a network and/or support removable media ensures recovery can be performed remotely from a secure location in case of disaster
• Archiving ensures that multiple additional copies of data are available to applications and users, both for internal availability needs and for compliance purposes. Archive copies should be created on multiple media types to meet varying availability requirements when they come to replace the original online version once it no longer warrants primary disk space: the same file can be stored on inexpensive disk for rapid retrieval to the online environment, on removable media for offsite storage, or on WORM devices for compliance
• HSM is used when data must appear to be online when, in fact, it is stored elsewhere to reduce the physical need for expensive online media.
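To make the division of labour between these four services concrete, their roles can be sketched as a simple policy function. This is a minimal illustration only – the stage names, parameters, and rules are assumptions for the sake of example, not any vendor’s actual API:

```python
# Illustrative sketch: which services should hold a copy of a data item,
# given its life-cycle stage and requirements. All names are hypothetical.

def select_services(stage, compliance=False, must_appear_online=False):
    """Return the set of storage management services that should copy the data."""
    services = {"backup"}                 # every generation gets backed up
    if stage == "active":
        services.add("replication")       # primary data needs a hot standby copy
    if stage in ("reference", "retired") or compliance:
        services.add("archive")           # long-term copies on multiple media
    if stage == "reference" and must_appear_online:
        services.add("hsm")               # appear online, store elsewhere
    return services
```

A call such as `select_services("reference", compliance=True, must_appear_online=True)` would yield backup, archive, and HSM copies, while active data would be both backed up and replicated.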
The key aspect of ISM is not just that it includes these services, but that it integrates them. It consolidates the management, allocation, and automation of the various secondary storage devices and media so that each of the storage management services can safely and efficiently share the same storage investments, whether large libraries or simply recycled disk and tape volumes. It also consolidates the execution of services through a common set of policies and rules-based automation engines. The ISM platform therefore contains several levels of storage management infrastructure that wrap around the basic replication, backup, archiving, and HSM services to make them available to any application environment throughout the enterprise. These include:
• Application and file system-based policy engines, for selecting data to be handled through base storage management services
• Agents for the various services, which gather and copy data from systems and prepare them to be sent to secondary storage. These agents allow data to be tracked for managed recovery based on the needs of each service
• Media/device management layer, which uses policies to match data copies from each service to the optimal media device, type, and volumes, to mediate data transfer and to automate all aspects of long-term media handling, including retention, rotation, maintenance, and retirement
• Recovery and migration tools for assisting with special access to offline data.
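The media/device management layer in that list can be illustrated with a minimal matching rule that pairs each copy request with a media type. The media names and thresholds below are purely illustrative assumptions, not a real product’s policy set:

```python
# Hypothetical sketch of a media/device management layer: match a copy
# request (service, retention, placement needs) to a media type.

def choose_media(service, retention_days, offsite=False, worm=False):
    """Return the media type best suited to this copy, under example rules."""
    if worm:
        return "worm-optical"      # immutable media for compliance copies
    if service == "replication":
        return "primary-disk"      # must be able to come online immediately
    if service == "hsm":
        return "nearline-disk"     # transparent recall needs fast retrieval
    if offsite or retention_days > 365:
        return "tape"              # removable and cheap for long retention
    return "nearline-disk"
```

In a real platform this decision would also drive the long-term handling of the chosen media – retention, rotation, maintenance, and retirement – as described above.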
The ISM platform is most efficient at solving the storage management challenge if it’s capable of serving either the entire enterprise, or a substantial portion of it. To do this requires true enterprise architecture, which is more than simply supporting multiple, common computing platforms. It entails distributed processing so that multiple, distributed servers can be deployed in parallel to maximise performance and scalability. It also involves distribution of functionality to perform different functions based on the topology and operations of each IT organisation; this could entail allowing centralised secondary storage to service particular storage management operations, while local secondary storage is used to provide other services at the remote locations.
In any event, distributed processing is mandatory to scale up to handle massive amounts of data, maximising performance and providing resiliency if not true fault tolerance. Multiple systems should also be capable of serving secondary storage capacity of various types for the same reasons. Finally, the management view of this distributed architecture must be consolidated so that, if desired, a central operations staff can perform administration of the ISM platform for all the various sites or environments supported.
The ISM concept is exactly the approach that we have taken here at BridgeHead Software. Aside from integrating the basic enterprise storage management services into a single platform, BridgeHead’s ISM technology offers a number of features that differentiate it from the piecemeal use of backup and HSM utilities that is most commonly seen in the industry:
• Multiple copies/multiple uses. BridgeHead’s ISM technology archives and migrates data in multiple copies onto multiple media, each with their own independent retention and media management policies. An important aspect of the multiple copy approach is that it stands in contrast to the more prevalent approach of ‘migration’. Migration calls for files to be moved from a more expensive medium to a less expensive one in reaction to events such as space utilisation watermarks being crossed. The result of this technique is that data is often moved when the system is under stress from disk space shortages. Migration also does not provide the necessary resilience or compliance as the data does not go immediately to its final destination.
• Tight application coupling. BridgeHead’s application archiving module provides a variety of interfaces including APIs and CLIs to allow applications to easily integrate with the platform. While transparent HSM is available, it’s not the only option. Application archiving means that the application retains control of the data, moving it to and from the archive in accordance with its own internal policies.
• Integrated backup/archiving. Archiving and backup are performed for entirely different reasons and are each optimised to handle particular types of data. Across the industry, efficient storage management is rarely achieved by using backup tools for archiving, or archiving tools in a primary disaster recovery role. The primary concern of backup is extremely rapid recovery of large chunks of data representing entire systems in the event of system failure. While backup includes file journaling, it is not an efficient tool for keeping long-term track of individual files, locating them, and bringing them back from removable media. The archiver, on the other hand, is ideal for extremely long-term management of unchanging data.
• Active file archiving. For unstructured data, BridgeHead provides the ability to automatically identify and archive, migrate or delete, based on sophisticated policies using all available file attributes. There are many SRM tools available to analyse file systems and identify problem files that are causing poor disk utilisation, but there are very few, if any, that can convert the results of analysis into an automated solution to the problem.
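The attribute-driven automation described in the last bullet might look, in outline, like the following sketch: a rule engine that inspects each file’s attributes and decides whether to archive, migrate, delete, or keep it. The thresholds and actions are illustrative assumptions only, not BridgeHead’s actual policies:

```python
# Hypothetical active-file-archiving rule engine: classify a file using
# its attributes (age, idle time, size, name). Thresholds are examples.
import os
import time

DAY = 86400  # seconds per day

def classify(path, now=None):
    """Return the action an example policy would take for this file."""
    now = time.time() if now is None else now
    st = os.stat(path)
    age_days = (now - st.st_mtime) / DAY    # time since last modification
    idle_days = (now - st.st_atime) / DAY   # time since last access
    if path.endswith((".tmp", ".bak")) and age_days > 30:
        return "delete"                     # scratch files past their usefulness
    if st.st_size > 100 * 2**20 and idle_days > 90:
        return "migrate"                    # large and idle: stub to cheaper media
    if age_days > 365:
        return "archive"                    # old but possibly needed: copy out
    return "keep"
```

The point of the bullet above is precisely that such analysis should feed an automated action, rather than stopping at an SRM-style report of problem files.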
ILM is an admirable goal, and it’s hardly breaking new ground to suggest that no-one is quite there yet. But it’s important to realise now, rather than later, that without integration between – not just inclusion of – its various services, ILM will never provide all of the benefits that its proponents intended. Such an outcome will leave us still looking for answers and, quite possibly, our data.