
Databricks taps groundswell in lakehouse adoption

Databricks has been expanding its footprint across the Asia-Pacific region amid growing interest in the data lakehouse architecture among traditional enterprises and digital-native companies.

Databricks has been boosting its presence in the Asia-Pacific region amid the groundswell of interest in the data lakehouse, an emerging architectural approach that combines the best of data warehouses and data lakes.

The company has established teams in Singapore to serve the Southeast Asian markets, where it counts the largest banks and telcos as clients. It also works with technology unicorns such as Grab, among other digital-native companies.

Databricks recently expanded into New Zealand, as well as South Korea, a market that has seen strong interest in its offerings from Korean conglomerates and gaming companies, according to Ed Lenta, Databricks’ senior vice-president and general manager in Asia-Pacific and Japan.

“We’re seeing incredible response from customers around this concept of the lakehouse,” Lenta said. “They understand that the vast majority of the world’s data sits in data lakes today, and if they can bring the best capabilities of what was associated in the past with data warehouses to that architecture, they can do some pretty amazing things.”

The term data lakehouse entered the technology lexicon in recent years as the industry sought to address the limitations of data lakes, which, while more cost-efficient than data warehouses, lack enterprise capabilities such as transaction support, data engineering and data governance.

According to Matt Aslett, research director for data, artificial intelligence and analytics at 451 Research, the data lakehouse blurs the lines between data lakes and data warehousing by maintaining the cost and flexibility advantages of persisting data in cloud storage.

At the same time, it also enables “schemas to be enforced for curated subsets of data in specific conceptual zones of the data lake, or an associated analytic database, in order to accelerate analysis and business decision-making,” he noted earlier this year.

Aslett said one of the key enablers of the lakehouse concept is a structured transactional layer, noting that Databricks added this capability to its unified analytics platform, which provides Spark-based data processing for data in Amazon Web Services (AWS) or Microsoft Azure cloud storage, in April 2019 with the launch of Delta Lake.

“Now an open-source project of the Linux Foundation, Delta Lake provides a structured transactional layer with support for Acid [atomic, consistent, isolated, durable] transactions, updates and deletes, and schema enforcement,” he added.
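To illustrate the kind of capability Aslett describes, the following PySpark sketch shows schema-enforced writes and an ACID update and delete against data sitting directly in lake storage, using the open-source Delta Lake package. The Spark configuration and table path are assumptions for illustration, not a Databricks-specific recipe.

```python
# Minimal sketch of Delta Lake's transactional layer on a data lake.
# Assumes Spark is configured with the open-source delta-spark package;
# the storage path below is hypothetical.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = (
    SparkSession.builder.appName("delta-lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/delta/customers"  # hypothetical location in a data lake

# Initial write creates the table and fixes its schema; later appends with a
# mismatched schema are rejected unless the change is explicitly allowed.
spark.createDataFrame(
    [(1, "alice", "SG"), (2, "bob", "KR")],
    ["id", "name", "country"],
).write.format("delta").mode("overwrite").save(path)

# ACID updates and deletes applied directly to files in the lake.
table = DeltaTable.forPath(spark, path)
table.update(condition="id = 1", set={"country": "'MY'"})
table.delete("id = 2")

spark.read.format("delta").load(path).show()
```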

Lenta said there are a few entry points for Databricks customers. They include those who have realised that data science is hard to get right, prompting them to look for a platform that helps data science teams carry out exploratory data science and create machine learning models.

“They want those models in production to drive business outcomes and they don’t want to do this once; they want tens or even hundreds of models inside their organisations,” Lenta said.

Another group of Databricks users comprises those facing data processing challenges, such as ingesting data from different sources and dealing with streaming data alongside batch processing.

“They need to do very complex ETL [extract, transform, load] jobs, and so they look to Databricks to simplify that so that they can get more people more access to data in a way that’s valuable to them,” Lenta added.
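As a rough illustration of the kind of ETL pipeline Lenta refers to, the sketch below uses standard Spark Structured Streaming to ingest JSON events landing in a data lake, apply a light transformation and append the result to a Delta table. The paths and event schema are hypothetical; the same pattern works for batch jobs by swapping readStream/writeStream for read/write.

```python
# Sketch of a streaming ETL job on Spark; paths and schema are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Extract: incrementally pick up new JSON files as they land in the lake.
raw = (
    spark.readStream
    .schema(event_schema)
    .json("/data/raw/events")          # hypothetical landing zone
)

# Transform: light cleansing and enrichment.
cleaned = (
    raw.dropna(subset=["user_id"])
       .withColumn("event_date", F.to_date("event_time"))
)

# Load: append to a Delta table, tracking progress via a checkpoint.
query = (
    cleaned.writeStream
    .format("delta")
    .option("checkpointLocation", "/data/checkpoints/events")
    .outputMode("append")
    .start("/data/curated/events")     # hypothetical curated zone
)

query.awaitTermination()
```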

Besides Databricks, AWS has also introduced a lakehouse approach that moves data between data lakes and purpose-built data stores, giving users a single place to run analytics across most of their data. Purpose-built analytics services can then address specific use cases such as real-time dashboards and log analytics.

Aslett said while there are plenty of data warehousing aficionados who hate the lakehouse concept, it appears to be here to stay.

“The ecosystem of vendors offering functionality that fits the general description is growing steadily. Not all of them are using the term and there are those that define – and spell – it differently, so there is scope for confusion.

“Overall, however, the trend toward using cloud object storage services as a data lake to store large volumes of structured, semi-structured and unstructured data is not going to diminish any time soon, and there are clear performance and efficiency advantages in bringing structured data processing concepts and functionality to that data, rather than having to export it into external data-warehousing environments for analysis,” said Aslett.

Read more about data management and analytics in APAC

  • Healthcare providers are harnessing data analytics to improve clinical and operational outcomes even as they continue to face challenges in data aggregation and data protection.
  • Informatica has consolidated its operations in four key Asia-Pacific markets in a move that will enable it to better meet the demand for cloud-based data management software.
  • Teradata has been positioning itself to capture the region’s cloud opportunities through not only R&D, but also efforts to support enterprises in their cloud journey.
  • Snowflake set foot in APAC just three years ago and has started to gain traction among large enterprises in Singapore, India and Southeast Asia.
