Data Management

#eWEEKchat Feb. 9: Down the Batch: Trends in Data Orchestration


On Tuesday, Feb. 9, at 11 a.m. PST/2 p.m. EST/7 p.m. GMT, @eWEEKNews will host its 95th monthly #eWEEKchat. The topic will be “Down the Batch: Trends in Data Orchestration.” It will be moderated by Chris Preimesberger, eWEEK’s Editor in Chief.

Some quick facts:

Topic: “Down the Batch: Trends in Data Orchestration”

Date/time: Tuesday, Feb. 9, 11 a.m. PST / 2 p.m. EST / 7 p.m. GMT

Tweetchat handle: You can use #eWEEKChat to follow/participate via Twitter itself, but it’s easier and more efficient to use the real-time chat room link at CrowdChat. Instructions are on that page: log in at the top right, use your Twitter handle to register, and the chat begins promptly at 11 a.m. PT. The page will come alive at that time with the real-time discussion. You can join in or simply watch the discussion as it is created. Special thanks to John Furrier of SiliconAngle.com for developing the CrowdChat app.

Our in-chat experts will include: Eric Kavanagh, CEO of The Bloor Group and host of DM (Data Management) Radio; others to come.

Chat room real-time link: Use https://www.crowdchat.net/eweekchat. Sign in and use #eweekchat for the identifier.

How to manage all that data efficiently

By Eric Kavanagh (excerpted from a previous eWEEK article)

For decades, the tried-and-relatively-true practice for moving data has been extract, transform, load, a.k.a. ETL. That’s now finally changing.
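As a minimal sketch of the ETL pattern the article describes, the following toy pipeline (table and field names invented, an in-memory SQLite database standing in for the warehouse) pulls raw records, normalizes them, and loads them into a target store:

```python
import sqlite3

# Extract: pull raw records from a source system (a hard-coded list here,
# standing in for a source database or flat file).
raw_orders = [
    {"id": 1, "amount": "19.99", "region": "us-east"},
    {"id": 2, "amount": "5.00", "region": "US-EAST"},
    {"id": 3, "amount": "42.50", "region": "eu-west"},
]

# Transform: normalize types and values before loading.
transformed = [
    (row["id"], float(row["amount"]), row["region"].lower())
    for row in raw_orders
]

# Load: write the cleaned rows into the target store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)

total = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
).fetchall()
print(sorted(total))
```

Real ETL tools add scheduling, error handling and lineage tracking on top of this basic extract-transform-load loop, but the shape is the same.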

Granted, there have been other ways of moving data: Change data capture (CDC), one of the leanest methods, has been around for decades and remains a very viable option; the old File Transfer Protocol (FTP) can’t be overlooked; nor can the seriously old-fashioned forklifting of DVDs.
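For illustration, here is a toy snapshot-diff version of change data capture. Production CDC engines typically tail the database’s transaction log rather than comparing snapshots, but the emitted change set (inserts, updates, deletes) is the same idea; all names and rows below are invented:

```python
def capture_changes(before, after):
    """Return the change set that moves snapshot `before` to snapshot `after`.

    `before` and `after` are dicts mapping primary key -> row dict."""
    inserts = {k: after[k] for k in after.keys() - before.keys()}
    deletes = sorted(before.keys() - after.keys())
    updates = {
        k: after[k]
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"insert": inserts, "update": updates, "delete": deletes}

# Two snapshots of the same table, taken at different times.
before = {1: {"name": "ann"}, 2: {"name": "bob"}}
after = {2: {"name": "bobby"}, 3: {"name": "carol"}}
print(capture_changes(before, after))
```

The lean appeal of CDC is visible even in this sketch: only the rows that actually changed need to move downstream, not the whole table.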

Data virtualization 1.0 brought a novel approach as well. This approach leveraged a fairly sophisticated system of strategic caching. High-value queries would be preprocessed, and certain VIP users would benefit from a combination of pre-aggregation and stored result sets.

During the rise of the open-source Hadoop movement about a decade ago, some other curious innovations took place, notably the Apache Sqoop project. Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. Sqoop proved very effective at pulling data from relational sources and dropping it into HDFS. That paradigm has somewhat faded, however.

But a whole new class of technologies–scalable, dynamic, increasingly driven by artificial intelligence–now threatens the status quo. So significant is this change that we can reasonably anoint a new term in the lexicon of information management: data orchestration.

There are several reasons why this term makes sense. First and foremost, just as an orchestra comprises many different instruments, all woven together harmoniously, today’s data world suddenly boasts many new sources, each with its own frequency, rhythm and nature.

Secondly, the concept of orchestration implies much more than integration, because the former connotes significantly more complexity and richness. That maps nicely to the data industry these days: The shape, size, speed and use of data all vary tremendously.

Thirdly, the category of data orchestration speaks volumes about the growing importance of information strategy, arguably among the most critical success factors for business today. It’s no longer enough to merely integrate, transport or transform data; it must be leveraged strategically.

Down the batch!

As the mainstay of data movement over the past 30 years, ETL took the lead. Initially, custom code was the way to go, but as Rick Sherman of Athena IT Solutions once noted: “Hand coding works well at first, but once the workloads grow in size, that’s when the problems begin.”

As the information age matured, a handful of vendors addressed this market in a meaningful way, including Informatica in 1993, Ab Initio (a company that openly eschews industry analysts) in 1995, then Informix spin-off Ascential (later bought by IBM) in 2000. That was the heyday of data warehousing, the primary driver for ETL.

Companies realized they could not effectively query their enterprise resource planning (ERP) systems to gauge business trajectory, so the data warehouse was created to enable enterprise-wide analysis.

The more people got access to the warehouse, the more they wanted. This resulted in batch windows stacking up to the ceiling. Batch windows are the time slots within which data engineers (formerly called ETL developers) had to squeeze in specific data ingestions.
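The batch-window squeeze can be sketched with a toy scheduler (job names and durations invented): jobs that no longer fit in the nightly window spill over to the next one, which is exactly how the windows start "stacking up":

```python
# A fixed nightly maintenance window, e.g. midnight to 4 a.m.
WINDOW_MINUTES = 240

# Ingestion jobs and their estimated run times in minutes (invented values).
jobs = [
    ("erp_extract", 90),
    ("crm_extract", 60),
    ("web_logs", 120),
    ("finance_feed", 45),
]

scheduled, overflow, used = [], [], 0
for name, minutes in jobs:
    if used + minutes <= WINDOW_MINUTES:
        # Job fits in the remaining window time.
        scheduled.append(name)
        used += minutes
    else:
        # Job spills over to a later window.
        overflow.append(name)

print("scheduled:", scheduled)
print("overflow:", overflow)
```

As more groups demanded their own feeds, the `jobs` list grew faster than the window did, and the overflow pile became a nightly fact of life.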

Within a short span of years, data warehousing became so popular that a host of boutique ETL vendors cropped up. Then, around the early- to mid-2000s, the data warehouse appliance wave hit the market, with Teradata, Netezza, DATAllegro, Dataupia and others climbing on board.

This was a boon to the ETL business but also to the Data Virtualization 1.0 movement, primarily occupied by Composite Software (bought by Cisco, then spun out, then picked up by TIBCO) and Denodo Technologies. Both remain going concerns in the data world.

Big data boom

Then came big data. Vastly larger, much more unwieldy and in many cases faster than traditional data, this new resource upset the apple cart. As web-scale companies such as Facebook and LinkedIn rolled their own software, the tech world changed dramatically. The proliferation of database technologies, fueled by open-source initiatives, widened the landscape and diversified the topography of data. These included Facebook’s Cassandra, 10gen’s MongoDB and MariaDB (spun out by MySQL founder Monty Widenius the day Oracle bought Sun Microsystems in late 2009)–all of which are now pervasive solutions.

Let’s not forget about the MarTech 7,000. In 2011, it was the MarTech 150. By 2015, it was the MarTech 2,000. It’s now 7,000 companies offering some sort of sales or marketing automation software. All those tools have their own data models and their own APIs. Egad!

Add to the mix the whole world of streaming data. By open-sourcing Kafka to the Apache Software Foundation, LinkedIn let loose the gushing waters of data streams. These high-speed freeways of data largely circumvent traditional data management tooling, which can’t stand the pressure.

Doing the math, we see a vastly different scenario for today’s data, as compared to only a few years ago. Companies have gone from relying on five to 10 source systems for an enterprise data warehouse to now embracing dozens or more systems across various analytical platforms.

Meanwhile, the appetite for insights is greater than ever, as is the desire to dynamically link analytical systems with operational ones. The end result is a tremendous amount of energy focused on the need for … (wait for it!) … meaningful data orchestration.

For performance, governance, quality and a vast array of business needs, data orchestration is taking shape right now out of sheer necessity. The old highways for data have become too clogged and cannot support the necessary traffic. A whole new system is required.

Questions we’ll discuss

That’s what we’re here to chat about on Feb. 9. Questions we’ll ask include:

Join us Tuesday, Feb. 9 at 11 a.m. Pacific / 2 p.m. Eastern for this, the 95th monthly #eWEEKchat. Visit https://www.crowdchat.net/eweekchat for CrowdChat information.

#eWEEKchat Tentative Schedule for 2021*

Jan. 12: What’s Up in Next-Gen Data Security
Feb. 9: Down the Batch: Trends in Data Orchestration
March 9: New Trends & Services in Health-Care IT
April 13: Trends in Project Management & Collaboration Tools
May 11: Trends in Data Management
June 8: Trends in Data Storage, Protection and Privacy
July 13: Next-Gen Networking Products & Services
Aug. 10: DevSecOps: Open Source Security and Risk Assessment
Sept. 14: Confidential Computing and Next-Gen Security
Oct. 12: DataOps: The Data Management Platform of the Future?
Nov. 9: New Tech to Expect for 2022
Dec. 14: Predixions and Wild Guesses for IT in 2022

*all topics subject to change


