Starburst slithers into Python DataFrame support

Data lake analytics platform company Starburst has extended support for Python with PyStarburst and announced a new integration with the open source Python library Ibis, built in collaboration with composable data systems builder and Ibis maintainer Voltron Data. 

For Starburst and Trino developers and data engineers, this announcement means that they no longer need to offload data to frameworks like PySpark and Snowpark to handle complex transformation workloads. 

Software application development teams can now use a single MPP (massively parallel processing) engine for both their analytical and transformation workloads.

PyStarburst provides a syntax familiar to PySpark and Snowpark users for writing and running production-grade ETL pipelines and data transformations, making it easy to build new pipelines and to migrate existing PySpark and Snowpark pipelines to Starburst.

“Many data engineers prefer writing code over SQL for transformations, and many software engineers are used to building data applications in Python. With PyStarburst, we’re giving them the freedom to do so with the increased productivity and performance of Starburst’s enterprise-grade Trino,” said Martin Traverso, CTO of Starburst.

For developers and data engineers looking to build scalable data applications, the new Ibis integration provides a uniform Python API that can execute queries on more than 18 different engines – including DuckDB, pandas, PostgreSQL and now Starburst Galaxy.

Traverso says that this means developers can scale from development on a laptop to production in Galaxy without rewriting a single line of code.

“Python users struggle to bridge the gap between prototypes on their laptops and production apps running on platforms like Starburst Galaxy. Ibis makes it much easier to bridge this gap,” said Josh Patterson, CEO of Voltron Data. “With Ibis, developers can write Python code once and run it anywhere, with any supported backend execution engine. You can move seamlessly from crunching gigabyte-scale test data on your laptop to crunching petabyte-scale data in production using Starburst Galaxy.”

Ibis and Starburst Galaxy enable users to write portable Python code that executes on Starburst’s data lake analytics engine, operating on data from more than 50 supported sources. Users will now be able to build analytic expressions across multiple data sources with reusable scripts that execute at any scale.