Confluent: Current 2022 - live report day #2: no going back to 'batch'
Hosted by Confluent, Current 2022 is billed as the ‘next generation of Kafka Summit’ and the Computer Weekly Developer Network team is in Austin, Texas for the showdown.
As an aide-mémoire, Confluent is a full-scale data streaming platform that enables users to access, store and manage data as continuous, real-time streams. Built by the original creators of Apache Kafka, Confluent expands the benefits of Kafka with enterprise-grade features, while removing the burden of Kafka management and monitoring.
As TechTarget reminds us, Apache Kafka is a distributed publish-subscribe messaging system that receives data from disparate source systems and makes that data available to target systems in real time.
“Kafka is written in Scala and Java and is often associated with real-time event stream processing for big data. Like other message broker systems, Kafka facilitates the asynchronous data exchange between processes, applications and servers. Unlike other messaging systems, however, Kafka has very low overhead because it does not track consumer behaviour and delete messages that have been read,” notes the above TechTarget link.
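To make that last distinction concrete, here is a minimal Java producer sketch – illustrative only, assuming a broker at localhost:9092 and a hypothetical ‘orders’ topic, none of which come from the event itself – showing the publish side of the pattern: a record is appended to a topic log, and any number of consumers can then read it at their own pace, tracking their own offsets rather than relying on the broker to delete what has been read.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address and topic name below are placeholders for this sketch
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() is asynchronous: the record is appended to the topic log and
            // remains there for consumers to read at their own offsets -- reading
            // a message does not delete it, which keeps broker overhead low.
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"amount\": 9.99}"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        } else {
                            System.out.printf("Wrote to %s-%d at offset %d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        }
    }
}
```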
Current 2022 – live report day #2
Confluent director of product Addison Huddy and Kevin Chao of Confluent’s strategic product team opened the day #2 keynote by presenting directly to analysts in the press and analyst lounge on how Confluent extends Kafka.
Chao talked about how much the company has invested in Kafka (and therefore in Confluent) by speaking to real users (he calls them customers, obviously), listening to them and asking where they are on their Kafka journey. As we enter an age in which Kafka has become the de facto standard for data streaming, users are looking at where they will use data streaming at the application deployment point in their IT stack.
There’s very much a sense that users ‘could’ get to where they want on their own with open source Kafka (in terms of managing it, securing it, working to deliver governance etc.), but Confluent exists to deliver many (if not all) of those functions for them.
Looking at real-world use cases, Chao said that the burden stems from the sheer number of operational tasks, cluster scaling among them. With data siloed across legacy and cloud-native systems, accessing, integrating and processing high-value legacy data can be extremely expensive. For developers looking to create real-time customer experiences, the path to building systems that keep pace with modern workloads is a complex one.
If all this sounds like open source Kafka is some kind of ‘gateway drug’ to get users hooked on data streaming, with Confluent ready to sell enterprises services when needed, analysts and journalists at a private briefing session did table this suggestion. Confluent, in response, explained that it has invested heavily in open source Kafka itself and wants to preserve the sanctity of the tools at ground level.
So much more than Kafka
But (to further validate the above point) Confluent’s platform today is ‘so much more than Apache Kafka’, said Chao and Huddy.
The company has worked at a really deep level with the major cloud hyperscalers, worked through the kernel bugs it has found (let’s not call out which cloud services provider they were found in) and so been able to create a fully managed, elastic, automated product with zero overhead. Why zero overhead? Because users won’t have to think about aspects like upgrades and all the mechanics needed to achieve elastic scaling, infinite storage, high availability, network flexibility and even DevOps automation.
Successful data streaming (according to Huddy) comes down to having a great transport layer (and that’s Kafka), great connectors (and that’s very much Confluent) and a great ability to process, which again is Confluent – if that’s a perfect storm for real-time data streaming, then so be it.
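As an aside on that third leg – processing – the flavour of what ‘streaming rather than batch’ means in code is easy to show. The following is a minimal Kafka Streams sketch (the topic names, broker address and filter condition are hypothetical, not anything demonstrated at the event): every event is processed as it arrives, with no batch window or scheduled job anywhere in sight.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class LargePaymentFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "large-payment-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Continuously read from 'payments', keep the flagged records and write
        // them straight on to 'large-payments' -- Kafka in (transport), Kafka out,
        // with the processing logic sitting in between.
        KStream<String, String> payments = builder.stream("payments");
        payments.filter((key, value) -> value.contains("\"large\":true"))
                .to("large-payments");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The point of the sketch is its shape rather than its logic: the topology runs continuously, which is precisely the contrast with batch that the day #2 keynote kept returning to.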
Confluent today provides a consistent data streaming experience across all three major cloud services providers – and that’s pretty critical if we remind ourselves what the reality of modern multi-cloud deployments actually looks like. Crucially, users have the ability to connect those multi-cloud instances together using Confluent, i.e. there’s no need to provision connectors across two (or potentially more) cloud instances.
Keynote mainstage
Moving into the main stage section of the keynote, the discussion today is: is streaming really the [total] future, or, in the real world of data, will there always be a place for batch?
The company’s senior engineering VP Chad Verbowski led this session by detailing some of the product announcements delivered this week. Confluent has now announced Stream Designer, a visual interface that enables developers to build and deploy streaming data pipelines in minutes.
Verbowski gave way to Confluent’s VP of product and solutions Greg DeMichillie for a full product demo. He explained that this is a point-and-click visual builder that bids to ‘democratise data streams’ so they are accessible to developers beyond specialised Apache Kafka experts.
“We are in the middle of a major technological shift, where data streaming is making real-time the new normal, enabling new business models, better customer experiences and more efficient operations,” said Confluent co-founder and CEO Jay Kreps. “With Stream Designer, we want to democratise this movement towards data streaming and make real-time the default for all data flow in an organisation.”
Stream Designer provides developers with a way to build pipelines and describe data flows and business logic easily within the Confluent Cloud UI. It takes a developer-centric approach, where users with different skills and needs can switch between the UI, a code editor and a command line interface to declaratively build data flow logic at top speed. It brings developer-oriented practices to pipelines, making it easier for developers new to Kafka to scale data streaming projects faster.
After building a data pipeline, the next challenge for data developers, data scientists and software developers working in data-centric code environments is maintaining and updating it over its lifecycle as business requirements change and tech stacks evolve.
In response then, Confluent says that Stream Designer provides a unified, end-to-end view to observe, edit and manage pipelines and keep them up to date. Pipelines built on Stream Designer can be exported as SQL source code for sharing with other teams, deploying to another environment, or fitting into existing CI/CD workflows.
The technology allows multiple users to edit and work on the same pipeline live, enabling seamless collaboration and knowledge transfer.
No going back to batch
The resounding message here is that batch (data processing and management) has had its day – well, much of batch will clearly still exist… but the move to data streaming is certainly real.