Top ops for DataOps in Hitachi Vantara Pentaho 8.3 

Pentaho is still Pentaho, but these days it's a product line and accompanying division inside of Hitachi… and not just plain old Hitachi Ltd, but Hitachi Vantara, a company rebranding exercise undertaken to unify Hitachi Data Systems, Pentaho and a few other morsels under one label.

Company nomenclature notwithstanding, Pentaho has now reached version 8.3.

The technology itself focuses on data integration and analytics… but it is also thought of as a platform for Business Intelligence (BI) with corresponding competencies in data mining and ETL.

In keeping with current trends, this new version is designed to support DataOps, a collaborative data management practice that helps users realise the full potential of their data. 

In deeper detail (as linked above), DataOps describes the creation and curation of a central data hub, repository and management zone designed to collect, collate and then distribute data, so that data analytics can be democratised more widely across an entire organisation and more sophisticated layers of analytics, such as built-for-purpose analytics engines, can subsequently be brought to bear.

“DataOps is about having the right data, in the right place, at the right time and the new features in Pentaho 8.3 ensure just that,” said John Magee, vice president, portfolio marketing, Hitachi Vantara. “Not only do we want to ensure that data is stored at the lowest cost at the right service level, but that data is searchable, accessible and properly governed so actionable insights can be generated and the full economic value of the data is captured.”

Will it (data) blend?

New features in Pentaho 8.3 include improved drag-and-drop data pipeline capabilities for accessing and blending data that is otherwise difficult to reach. A new SAP connector offers drag-and-drop blending, enriching and offloading of data from SAP ERP and Business Warehouse.

Pentaho has also addressed the challenge of ingesting streaming data. With a new Amazon Kinesis integration, Pentaho lets AWS developers ingest and process streaming data in a visual environment rather than by writing code, and blend it with other data, reducing manual effort.

Amazon Kinesis is an Amazon Web Services (AWS) offering for processing large data streams in real time. Developers build data-processing applications, known as Kinesis Data Streams applications, on top of it. A typical Kinesis Data Streams application reads data from a data stream as data records.

There is also improved integration with Hitachi Content Platform (HCP): the company’s distributed object storage system designed to support large repositories of content, from simple text files to images and video to multi-gigabyte database images. 

According to Stewart Bond, research director for data integration and integrity software and Chandana Gopal, research director for business analytics solutions from IDC, “A vast majority of data that is generated today is lost. In fact, only about 2.5% of all data is actually analysed. The biggest challenge to unlocking the potential that is hidden within data is that it is complicated, siloed and distributed. To be effective, decision makers need to have access to the right data at the right time and with context.”

Other details in this news include Snowflake (the cloud data warehouse, not the generational kind) connectivity.

The Pentaho team remind us that Snowflake has quickly become one of the leading destinations for cloud data warehousing. But for many analytics projects, users also want to include data from other sources, including other cloud sources. 

To address this, Pentaho 8.3 allows blending, enrichment and analysis of Snowflake data alongside other data sources. It also enables users to access data from existing Pentaho-supported cloud platforms, including AWS and Google Cloud, in addition to Snowflake.
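As a rough sketch of what "blending" means in practice, the Python below joins rows pulled from one source (imagine Snowflake) with rows from another (imagine a CRM export) on a shared key. The sample data and field names are invented for illustration; Pentaho itself does this through its visual drag-and-drop pipeline rather than code.

```python
# Rows as they might come back from a Snowflake query (invented sample data).
snowflake_rows = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "APAC"},
]

# Rows from a second source, e.g. a CRM system (also invented).
crm_rows = [
    {"customer_id": 1, "last_order": "2019-05-01"},
    {"customer_id": 2, "last_order": "2019-06-12"},
]

# Index one side by the join key, then enrich the other side with it --
# the essence of a blend/enrich step in any data pipeline.
by_id = {row["customer_id"]: row for row in crm_rows}
blended = [{**row, **by_id.get(row["customer_id"], {})} for row in snowflake_rows]
print(blended)
```

The point of the 8.3 feature is that this kind of cross-source join no longer requires moving the Snowflake data out by hand first.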

You can read more on the Pentaho team’s position on DataOps here.