Data virtualization helps agencies realize Federal Data Strategy goals
The Federal Data Strategy (FDS) 2020 Action Plan outlines a broad agenda for federal agencies to leverage data as a strategic asset. It offers a 10-year vision for how the federal government will accelerate the use of data to deliver on agency missions, serve the public and steward resources while protecting security, privacy and confidentiality. The goal of the FDS is to fully leverage the value of federal data for mission, service and the public good by guiding agencies in practicing ethical governance, conscious design and a learning culture. The strategy outlines 10 principles, 40 practices and 20 specific, measurable actions, ranging from ensuring the relevance of data to using data to guide decision-making and publishing and updating data inventories.
In general, all of this guidance is focused on making data more actionable and sharable while assuring that it can only be accessed by authorized users. Some actions, like launching a chief data officer council, are straightforward, whereas others, like improving data and model resources for artificial intelligence research and development, might seem a bit more ambitious.
Fundamentally, the FDS calls on agencies to modernize their data infrastructures. Unfortunately, many will find that difficult to achieve as they struggle with the sheer growth of data and its dispersal across a staggering range of locations. In the private sector, this challenge has spawned a number of advanced data management and data integration techniques that have proved to be a differentiator for leading-edge companies. However, in the public sector, many agencies rely on more traditional techniques for data integration, such as extract, transform and load (ETL) processes.
ETL tools are highly effective for moving large amounts of data from one repository to another, but because they run in scheduled batches, they cannot deliver data in real time. They are also resource-intensive to maintain: their scripts must be revised, re-tested and redeployed every time a source system changes. Nor can they accommodate many modern data sources, such as streaming and internet-of-things data, which are critical for AI development, or unstructured sources such as social media feeds.
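The maintenance burden of batch ETL is easiest to see in miniature. The sketch below is a hypothetical nightly job, using illustrative table and column names rather than any real agency system; it shows why the data is stale the moment the batch ends and why every source-schema change forces a script revision and redeployment.

```python
import sqlite3

def run_etl_batch(source_db: str, warehouse_db: str) -> int:
    """Hypothetical nightly ETL batch: copy and reshape personnel
    records from a source system into a reporting warehouse."""
    src = sqlite3.connect(source_db)
    dst = sqlite3.connect(warehouse_db)

    # Extract: pull the full table. The copy is stale as soon as the
    # job finishes; consumers wait until the next scheduled run.
    rows = src.execute("SELECT id, name, office FROM personnel").fetchall()

    # Transform: any change to the source schema (a renamed column,
    # a new field) breaks this step and forces a rewrite, re-test
    # and redeploy of the script.
    cleaned = [(pid, name.strip().title(), office) for pid, name, office in rows]

    # Load: physically duplicate the data into a second repository,
    # incurring storage and movement costs.
    dst.execute("CREATE TABLE IF NOT EXISTS personnel (id, name, office)")
    dst.execute("DELETE FROM personnel")
    dst.executemany("INSERT INTO personnel VALUES (?, ?, ?)", cleaned)
    dst.commit()
    src.close()
    dst.close()
    return len(cleaned)
```

Even this toy job hard-codes the source schema in two places, which is the fragility the paragraph above describes.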
The power of logical data fabric
In recent years, a new strategy for data integration has surfaced — one that does not rely on the physical replication of data. Rather than physically collecting data into a new repository, analysts can logically and remotely connect to the data, leaving the source data in its existing location. This “logical data fabric” flexibly spreads across an organization’s data sources, providing seamless access to data consumers.
One of the most effective ways to implement a logical data fabric is through a modern data integration and data management technology called data virtualization (DV). DV is a natural fit for logical data fabric because, rather than physically replicating data into a new repository, it provides real-time, virtualized views of the data, leaving the source data exactly where it is. This means that agencies can gain all of the benefits of data integration without paying the costs of moving and housing the data or unnecessarily complicating their compliance efforts.
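As a rough illustration of the principle — not any vendor’s product — the sketch below federates two hypothetical source databases at query time. The join is resolved against the live sources on every request, and no rows are copied into a new repository.

```python
import sqlite3

def query_virtual_view(hr_db: str, finance_db: str) -> list:
    """Minimal sketch of a virtualized view: two hypothetical source
    databases (illustrative names only) are queried in place at
    request time, so consumers always see current data and nothing
    is replicated into a central store."""
    conn = sqlite3.connect(hr_db)
    # Attach the second source so one query can span both databases
    # while each dataset stays where it lives.
    conn.execute("ATTACH DATABASE ? AS finance", (finance_db,))

    # The "view" is re-evaluated against the live sources on each
    # call; the join happens here, but the data never moves.
    rows = conn.execute(
        """
        SELECT p.name, b.amount
        FROM personnel AS p
        JOIN finance.budgets AS b ON b.owner_id = p.id
        """
    ).fetchall()
    conn.close()
    return rows
```

A real DV platform adds security, caching and query optimization on top of this idea, but the core contrast with ETL is the same: access at request time instead of replication on a schedule.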
Because DV accommodates existing infrastructure, it is relatively easy to implement compared with other solutions. By providing data in real time from a variety of systems that are normally very difficult to integrate, such as transactional processing systems and cloud-based storage systems, DV can also support a wide variety of use cases, including many of the recommendations in the FDS.
One Defense Department agency, for example, wanted to reduce its Oracle footprint and migrate the data to a Hadoop-based data lake to support its data warehouse modernization efforts. At the same time, officials wanted to combine their two data centers to support advanced analytics to help their personnel in the field. These types of projects tend to be time-consuming and complex, due to the amount of data replication involved, and they tend to incur large storage costs. With DV, the agency was able to provide simultaneous access to both data centers, to the data warehouse and to the data lake during the transition. As a result, there was virtually no disruption to data consumers while these changes were taking place.
The change also provided economic relief, as the agency was able to reduce its data integration expenses by 80%. Supported by the new data infrastructure, the agency can now respond up to 97% faster than it could using the previous infrastructure and can easily and securely embrace new cloud-based platforms and services. By implementing DV and, in turn, reducing software and hardware storage costs and shortening the timeline for data warehouse modernization, the agency is saving millions per year.
Increasingly capable cloud technologies continue to challenge government agencies to glean immediate, actionable intelligence from their data, share it in a secure manner and leverage it to enable information flexibility and agility. In time, all federal agencies will find ways to realize the FDS goals, but leveraging a logical data fabric is a proven and successful place to start.