Amazon SageMaker knows its onions
Amazon Web Services, Inc. (AWS) used AWS re:Invent 2024 to announce its next generation of Amazon SageMaker.
Amazon SageMaker AI is a fully managed machine learning (ML) service for data scientists and developers to build, train and deploy ML models into a production-ready hosted environment. It provides a UI experience for running ML workflows that makes SageMaker AI ML tools available across multiple integrated development environments (IDEs).
The updates aim to unify, in one platform, the capabilities that users need for SQL analytics, petabyte-scale big data processing, data exploration and integration, model development and training, and generative artificial intelligence (AI).
The new SageMaker Unified Studio is designed to help find and access data from across an organisation and bring together purpose-built AWS analytics, ML and AI capabilities for all types of common data use cases, assisted by Amazon Q Developer along the way.
SageMaker Catalog and built-in governance capabilities are said to allow the right users to access the right data, models and development artefacts for the right purpose.
SageMaker Lakehouse
The new SageMaker Lakehouse unifies data across data lakes, data warehouses, operational databases and enterprise applications. AWS says that this makes it easy to access and work with data from within SageMaker Unified Studio, using familiar AI and ML tools or query engines compatible with Apache Iceberg.
New zero-ETL integrations with leading Software-as-a-Service (SaaS) applications enable users to access data from third-party SaaS applications in SageMaker Lakehouse and Amazon Redshift for analytics or ML without complex data pipelines.
“We are seeing a convergence of analytics and AI, with customers using data in increasingly interconnected ways – from historical analytics to ML model training and generative AI applications,” said Swami Sivasubramanian, vice president of data and AI at AWS. “To support these workloads, many customers already use combinations of our purpose-built analytics and ML tools, such as Amazon SageMaker – the de facto standard for working with data and building ML models – Amazon EMR, Amazon Redshift, Amazon S3 data lakes and AWS Glue. The next generation of SageMaker brings together these capabilities – along with some new features – to give customers all the tools they need for data processing, SQL analytics, ML model development and training and generative AI, directly within SageMaker.”
The next generation of SageMaker includes a unified studio that gives customers a single data and AI development environment where users can find and access all of the data in their organisation, act on it using the best tool for the job across all types of common data use cases and collaborate within teams and across roles to scale their data and AI initiatives.
SageMaker Unified Studio brings together functionality and tools from the range of standalone “studios,” query editors and visual tools that users enjoy today in Amazon Bedrock, Amazon EMR, Amazon Redshift, AWS Glue and the existing SageMaker Studio.
This makes it easy for developers to access and use these capabilities to discover and prepare data, author queries or code, process data and build ML models.
Amazon Q Developer assists along the way to support development tasks such as data discovery, coding, SQL generation and data integration.
For example, a user could ask Amazon Q, “What data should I use to get a better idea of product sales?” or “Generate a SQL to calculate total revenue by product category.” Users can securely publish and share data, models, applications and other artefacts with members of their team or organisation, accelerating the discoverability and usage of the data assets.
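To give a sense of what the second prompt quoted above is asking for, here is a minimal sketch of a revenue-by-category query. It is not output from Amazon Q and does not use any SageMaker API: it runs against an in-memory SQLite database, and the `sales` table, its column names and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema and sample data, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product_category TEXT, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("books", 2, 10.0), ("books", 1, 15.0), ("games", 3, 20.0)],
)

# Total revenue per product category, highest first.
REVENUE_BY_CATEGORY = """
SELECT product_category,
       SUM(quantity * unit_price) AS total_revenue
FROM sales
GROUP BY product_category
ORDER BY total_revenue DESC
"""

rows = conn.execute(REVENUE_BY_CATEGORY).fetchall()
for category, revenue in rows:
    print(category, revenue)
```

In practice the table and column names would come from whatever datasets the user has discovered in the catalogue, which is the part the assistant is meant to help with.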
Amazon Bedrock
With the Amazon Bedrock integrated development environment (IDE) in SageMaker Unified Studio, users can build and deploy generative AI applications using Amazon Bedrock’s selection of foundation models and tools such as Agents, Guardrails, Knowledge Bases and Flows.
SageMaker Unified Studio comes with data discovery, sharing and governance capabilities built in, so analysts, data scientists and engineers can search for and find the data they need for their use case while applying the required security controls and access permissions.