Capitalizing on data-driven expenditures is predicated on—and often impeded by—the innate differences in format, structure, settings, applications, terminology, and schema of information assets in the contemporary big data ecosystem.
Data modelers, data scientists, and even self-service business users are tasked with reconciling these points of differentiation to deliver comprehensively informed analytics and useful applications. Little has changed about these data preparation rigors within the past 12 months.
What’s emerged, and will continue to thrive in the New Year, is a viable solution for consistently taming this complexity to expedite time to value for data processes.
By considerably expanding the scope and scale of data modeling, organizations can leverage common models for any use case in which data are culled from differing systems. Already, holistic data fabrics can integrate data anywhere throughout the enterprise to conform to universal data models. Other approaches for expanding data modeling beyond individual use cases and applications include:
- Single Repositories: Although each of these methods has substantial data science implications, implementing a single repository with standardized data models is particularly apposite to this discipline because it collocates “enterprise wide ontologies of all the relevant business objects and concepts, and then maps data from all the various places from where [they] reside into one big enterprise knowledge graph,” commented Franz CEO Jans Aasman. This solution is ideal for feature engineering.
- Industry-Specific Models: These models encompass entire verticals—like the pharmaceutical industry or any other—and are renowned for their swift implementation time and ease of use. According to Lore IO CEO Digvijay Lamba, they “predefine the data in terms of the business objects, business entities, business metrics, and you map the data to those business metrics and everything will just run smooth.”
- Inter-Enterprise Models: Exchanging data between organizations is increasingly common for strategic alliances, mergers and acquisitions, and subsidiaries. Common data model approaches based on what One Network COO Joe Bellini characterized as “federated master data management” are acclaimed for facilitating these capabilities in real-time.
These methods broaden data modeling’s worth beyond individual use cases to universally facilitate mainstays like mapping, schema, time-series analysis, and terminology standards across—and between—enterprises.
Common data models are speedily becoming a necessity for organizations to determine all relevant data for any single use case, the most convincing of which are still cognitive computing deployments.
Time Series Analysis
The temporal advantages of common data models are some of their most invaluable. Not only do they reduce the time spent engineering data for applications or analytics, but they’re also primed for managing time-sensitive concerns with low-latency capabilities rivaling those of digital twins. Simple event-based schema exemplifies these temporal benefits with a universal applicability throughout the enterprise since “almost anything in databases is transactional or about things that happen at a particular point in time,” Aasman explained. “If something happens at a particular point in time, you can describe it as an event.” Events include start times and stop times, like when callers interacted with contact centers, for example, and comprise sub-events to express the depths of events—like callers’ thoughts about products, cancellations for services, and reasons why.
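The event-and-sub-event structure Aasman describes can be sketched as a simple recursive data type. This is a minimal, hypothetical illustration—field and type names are not drawn from any particular product:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# A minimal sketch of an event-based schema: every record is an event
# with a start and stop time, and events may nest sub-events (e.g., a
# contact-center call containing a cancellation request). All names
# here are illustrative assumptions, not a vendor's actual schema.

@dataclass
class Event:
    entity_id: str              # universal identifier of the entity involved
    event_type: str             # e.g. "support_call", "cancellation_request"
    start: datetime
    stop: datetime
    sub_events: List["Event"] = field(default_factory=list)

    def flatten(self) -> List["Event"]:
        """Return this event plus all nested sub-events, depth-first."""
        result = [self]
        for sub in self.sub_events:
            result.extend(sub.flatten())
        return result

call = Event(
    entity_id="caller-42",
    event_type="support_call",
    start=datetime(2020, 1, 6, 9, 0),
    stop=datetime(2020, 1, 6, 9, 12),
    sub_events=[
        Event("caller-42", "cancellation_request",
              datetime(2020, 1, 6, 9, 5), datetime(2020, 1, 6, 9, 8)),
    ],
)
print([e.event_type for e in call.flatten()])
# → ['support_call', 'cancellation_request']
```

Because the structure is uniform, the same traversal works for any depth of sub-events, which is what gives the schema its enterprise-wide applicability.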
Uniform schema is germane for exhaustively mapping organizations’ data into a single repository or for issuing low-latency interactions between organizations for supply chain management, for instance. Federated MDM’s inter-organization, shared data models facilitate what Bellini termed “demand driven responses” to real-time scenarios—including anything from responses to public health crises to business ones. He likened it to Uber in which “the rider can see all the drivers so I can get a ride, and the driver can see all the riders.”
As Aasman and Lamba suggested, wrangling diverse data into uniform models is largely based on mapping, which illustrates the interplay between universal models and statistical Artificial Intelligence. On the one hand, those models contain any assortment of differentiated data—exactly what the most accurate predictive models require. On the other, top solutions for uniform models leverage machine learning so that “you are managing a core data model on which your intuitive AI is mapping all the source data,” Lamba remarked. This capacity partly accounts for the ease of use of industry-specific data models; source data across the organization, its different departments, and different databases are automatically mapped to the model.
There are two principal benefits of this approach. First, it’s an excellent means of modeling data for data science endeavors. Second, “the source mapping is separate as abstracted out and compared to the business rules that are running on top,” Lamba mentioned. This characteristic is critical to the long-term reusability of common models because, as Lamba specified, “the business rules don’t change if something changes in the sources.” Sources invariably change over time, which accounts for much of the reworking traditional data models require.
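The separation Lamba describes—source mappings on one side, business rules on the other—can be sketched in a few lines. The source systems, column names, and rule below are hypothetical, chosen only to show the layering:

```python
# Sketch of separating source-to-model mapping from business rules.
# Each source system declares only how its raw columns map onto the
# canonical model; business rules never see source-specific names.
# All sources and field names here are illustrative assumptions.

SOURCE_MAPPINGS = {
    "crm":     {"cust_no": "customer_id", "full_nm": "name"},
    "billing": {"acct_id": "customer_id", "acct_name": "name"},
}

def to_canonical(source: str, record: dict) -> dict:
    """Translate a raw source record into the canonical model."""
    mapping = SOURCE_MAPPINGS[source]
    return {canonical: record[raw] for raw, canonical in mapping.items()}

# A business rule written once against the canonical model; it keeps
# working even if a source renames its columns, because only the
# mapping table above would need to change.
def greeting(canonical_record: dict) -> str:
    return f"Hello, {canonical_record['name']} ({canonical_record['customer_id']})"

print(greeting(to_canonical("crm", {"cust_no": "C1", "full_nm": "Ada"})))
print(greeting(to_canonical("billing", {"acct_id": "C1", "acct_name": "Ada"})))
```

Both calls print the same result even though the raw records use entirely different column names, which is the reusability property the article attributes to this design.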
In addition to supporting flexible, common schema, universal data models must standardize the terminology describing business concepts, especially across different data types. Without using the same terms to describe the same ideas, it’s like “if instead of the United States having a common currency, every state had its own currency,” Bellini posited. “How difficult would it be to execute trade?” Standardized schema relies on standardized terminology. Thus, in the event trees (consisting of events comprised of sub-events) Aasman referenced, “every term in my tree is described somewhere else in a taxonomy or an ontology. Nothing in my tree is just made up by a programmer. Everything is standards based.”
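Aasman’s point that “nothing in my tree is just made up by a programmer” amounts to validating every term against a shared vocabulary. A toy sketch of that guardrail, with a purely illustrative taxonomy:

```python
# Sketch of standards-based terminology: every term used in an event
# tree must resolve to a concept defined in a shared taxonomy, so no
# term is invented ad hoc by a programmer. The taxonomy entries below
# are illustrative assumptions, not a real ontology.

TAXONOMY = {
    "support_call": "An inbound customer interaction with the contact center",
    "cancellation_request": "A customer request to terminate a service",
}

def make_event(term: str) -> dict:
    """Create an event node only if its term exists in the taxonomy."""
    if term not in TAXONOMY:
        raise ValueError(f"'{term}' is not defined in the shared taxonomy")
    return {"event_type": term, "definition": TAXONOMY[term]}

print(make_event("support_call")["event_type"])   # accepted: standards-based term
try:
    make_event("cust_ping")                       # rejected: ad hoc term
except ValueError as err:
    print(err)
```

In production this lookup would be backed by a governed ontology rather than a dictionary, but the principle—terms are referenced, never invented—is the same.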
According to Lamba, certain common data model approaches enable users to leverage natural language technologies to “create your own derived language on top” of the model. In standards-based settings, the extensibility of this approach is desirable because “whenever a new team comes into the picture, they look at the data model and want to extend it for their reasons, they should be able to do that,” Lamba stipulated. Consequently, each department can leverage the same model, extend it for its departmental use, and its additions are both understandable and usable to others throughout the organization.
A basic requisite for spanning data models across the enterprise, industry-specific deployments, or between organizations is to center them on entities that are a business’ primary concern, which Lamba characterized as a “customer, a physician, or a provider.” Coupling event schema with individual entities offers the following advantages:
- Simplicity: The plainness of these uniform models is highly sought after. Their objects consist solely of an entity and event “instead of a complex schema,” Aasman noted.
- Feature Generation: Instead of multiple pages for holistic queries across data sources, entity event queries in single repositories involve “just one sentence,” Aasman confirmed—enabling rapid feature identification for machine learning.
- Customer 360s: Each entity has a universal identifier contained in every event so organizations can swiftly trace a customer’s or patient’s journey for comprehensive analysis.
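Because every event carries the entity’s universal identifier, the Customer 360 trace described above reduces to a single filter-and-sort. The records below are hypothetical illustrations:

```python
from datetime import datetime

# Sketch of a customer-360 trace over entity-event data: since every
# event carries the entity's universal identifier, reconstructing a
# journey is one filter and one sort. Data is illustrative only.

events = [
    {"entity_id": "patient-7", "event_type": "admission",  "start": datetime(2020, 1, 3)},
    {"entity_id": "patient-9", "event_type": "admission",  "start": datetime(2020, 1, 4)},
    {"entity_id": "patient-7", "event_type": "lab_result", "start": datetime(2020, 1, 5)},
    {"entity_id": "patient-7", "event_type": "discharge",  "start": datetime(2020, 1, 8)},
]

def journey(entity_id: str) -> list:
    """All events for one entity in time order: the '360' view."""
    return sorted((e for e in events if e["entity_id"] == entity_id),
                  key=lambda e: e["start"])

print([e["event_type"] for e in journey("patient-7")])
# → ['admission', 'lab_result', 'discharge']
```

This brevity is what Aasman means by holistic queries shrinking to “just one sentence”: the model, not the query, carries the complexity.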
The premier boon of industry-specific models is their cross-departmental or cross-enterprise use. Other gains from this approach include:
- Inclusiveness: The all-inclusiveness of these models is vertical specific. In healthcare, there are substitute parts “for a similar medical purpose or the same medical purpose; all that’s modeled in,” Bellini revealed. “Same thing happens in automotive: spare parts.”
- Pre-Built: Because these models have already been built, organizations “don’t have to put effort into people to manage or run all of this stuff,” Lamba said.
- Subject Expertise: These models signal a praised progression from data to business concerns since “the business team doesn’t have to hire an expert in data; they can just hire domain experts,” Lamba observed.
MDM’s common data models expand their worth via a federated approach between multiple organizations, yielding data-supported truths that enhance:
- Specificity: By offering control tower exactness of exchangeable data or resources between organizations for supply chain management, for example, they can detail “what is the individual unit and item, because that gives them more flexibility when things change,” Bellini reflected. “You need to be able to represent it, and that’s what the modeling is.”
- Real-Time Responses: Implicit to the foregoing boon is a low-latency response to evolving business conditions, which is useful since “the business world changes over time,” Aasman acknowledged.
- Richer Predictive Analytics: Real-time monitoring of other organizations’ data, when combined with a firm’s own data, creates ideal conditions for machine learning training data, so that with federated MDM “all the nodes in the network can get all the data to collaborate and solve problems,” Bellini said.
Common data models streamline, simplify, and increase the applicability of data modeling across silos for innumerable use cases. They allow organizations to leverage all their data for single deployments—such as for creating and maintaining cognitive computing models. They’re responsible for transforming this facet of data management from a restrictive necessity to an objective enabler of any data-centric undertaking.
About the Author
Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.