
AWS doubles down on data management

Cloud giant Amazon Web Services declares a bold vision to eliminate the need to extract, transform and load data, alongside other efforts to address business problems in domains such as cyber security and logistics

Amazon Web Services (AWS) is doubling down on data management and has declared a bold vision to eliminate the need to extract, transform and load (ETL) data from source systems into data stores for query and analysis.

Speaking at AWS re:Invent 2022 in Las Vegas, AWS CEO Adam Selipsky noted that ETL processes have been manual and complex, with a lot of heavy lifting involved in building ETL pipelines.

“It requires writing a bunch of custom code, then you have to deploy and manage the infrastructure to make sure the pipeline scales,” Selipsky said. “Still, it can be days before the data is ready.

“And all the while you’ve got eager analysts pinging you again and again to check if their data is available. And when something changes, you get to do it all over again.”

Selipsky said AWS has been building integrations between its services to make it easier to do analytics and machine learning without having to deal with ETL.

For instance, he said, AWS offers federated query capabilities in the Amazon Redshift data warehouse and the Amazon Athena query service, enabling users to run queries across a wide range of databases and data stores, including those in third-party applications and other cloud providers, without moving any data.

“To make it easy for customers to enrich all their data, AWS Data Exchange seamlessly integrates with Redshift and enables you to access third-party datasets and your own data in Redshift – no ETL required,” said Selipsky.

“We’ve also integrated SageMaker with Redshift and Aurora to enable anyone with SQL skills to operate machine learning models, make predictions and also without having to move data around. These integrations eliminate the need to move data around for some important use cases,” he added.

Selipsky said that in AWS’s vision of a “no-ETL future”, data integration will no longer be a manual effort. The first step is a preview of zero-ETL integration between the Aurora relational database engine and Redshift.

“This integration brings together transactional data with analytics capabilities, eliminating all of the work of building and managing custom data pipelines between Aurora and Redshift,” he added.

On the analytics front, Selipsky said customers will also be able to run Apache Spark queries on data in Amazon Redshift without having to build or manage any connectors.

“Both zero ETL integration between Aurora and Redshift and the Redshift integration for Apache Spark make it easier to generate insights without having to build ETL pipelines or manually move data around,” he added.

During his keynote address, Selipsky also announced AWS’s domain offerings in cyber security and supply chain management. These include Amazon Security Lake, a data lake service that centralises security data from cloud environments and other cyber security vendors, and AWS Supply Chain, a supply chain management application that draws on Amazon’s logistics expertise.

Sid Nag, vice-president analyst at Gartner’s technology and service provider group, noted that AWS’s heightened focus on data management and domain solutions underscores its efforts to go beyond core infrastructure services.

He said AWS has traditionally focused on launching point services, but the data integration and domain offerings signal a “new AWS” that is focusing more on helping organisations solve their business problems.
