
Tokopedia ups ante on observability capabilities

Indonesian e-commerce giant Tokopedia has consolidated its observability capabilities on a single cloud-based platform, enabling it to improve customer service and identify infrastructure issues

Most call centre teams will have dealt with irate customers who cannot explain the context of the problem they are facing, whether it is a failed transaction or a billing issue.

At Indonesian e-commerce giant Tokopedia, for example, a failed transaction could be caused by issues with prepaid vouchers that customers may use to pay for their purchases.

And because Tokopedia was using a slew of observability tools, including Grafana, to surface problems with its systems and processes, it could take a while to resolve customer- and infrastructure-related problems.

“While all this tooling was good, whenever there were key production issues, we could not get an entire end-to-end view of what was happening and the impact,” said Ryan de Melo, vice-president of engineering at Tokopedia. “Getting the full picture can be done, but it will take a lot of time.”

Earlier this year, Tokopedia wanted to see how it could more easily gain full visibility across its operations, and consolidated its observability capabilities on New Relic, a supplier of cloud-based software to help website and application owners track the performance of their services.

These capabilities have enabled Tokopedia’s customer support team to identify customer issues while working on a case, and its engineering teams to monitor a massive technology footprint that spans Elasticsearch clusters, multiple database management systems and public cloud services from Amazon Web Services, Google Cloud and Alibaba Cloud.

“New Relic became very handy [based on] the fact that now we do not need to invest more on DevOps,” said De Melo. “And like any fast-growing company, we do not want to waste time on rudimentary things, which we can get from a decent SaaS [software-as-a-service] solution.”


He was also surprised at the cost effectiveness of the New Relic solution, which has enabled the company to reduce infrastructure overheads. “When you’re doing it at scale, New Relic can definitely achieve the most efficiency as they have massive engineering teams working towards optimising storage, queries and other areas which we would not have invested in.”

At a technical level, De Melo said Tokopedia is tapping New Relic for application performance monitoring and distributed tracing, and for monitoring communications between microservices to understand cyclic dependencies.
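
The article does not say how cyclic dependencies are surfaced, but the idea can be illustrated with a small sketch. It assumes the caller-to-callee pairs have already been extracted from trace data; the function name and the service names are hypothetical and are not taken from Tokopedia or New Relic.

```python
from collections import defaultdict


def find_cycle(calls):
    """Return one cycle as a list of services, or None if there is none.

    `calls` is an iterable of (caller, callee) pairs, for example derived
    from distributed-tracing spans (an assumption for this sketch).
    """
    graph = defaultdict(list)
    for caller, callee in calls:
        graph[caller].append(callee)

    unvisited, in_progress, done = 0, 1, 2
    state = defaultdict(int)  # every service starts as unvisited
    path = []                 # services on the current depth-first path

    def visit(service):
        state[service] = in_progress
        path.append(service)
        for callee in graph.get(service, []):
            if state[callee] == in_progress:
                # The callee is already on the current path: a cycle.
                return path[path.index(callee):] + [callee]
            if state[callee] == unvisited:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        state[service] = done
        return None

    for service in list(graph):
        if state[service] == unvisited:
            found = visit(service)
            if found:
                return found
    return None


# Hypothetical call pairs: checkout -> payments -> vouchers -> checkout
calls = [
    ("checkout", "payments"),
    ("payments", "vouchers"),
    ("vouchers", "checkout"),
    ("search", "catalogue"),
]
cycle = find_cycle(calls)
```

A depth-first search is used here because it reports the actual chain of services involved, which is usually what an engineer needs in order to decide where to break the loop.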

On the infrastructure side, New Relic is also able to keep an eye on its memory stores and network, among other things. “The beauty is if we know that there’s an issue with a particular memory store, we can measure the impact to the customer very accurately,” he said.

During the implementation process, Tokopedia worked closely with New Relic to figure out the best software practices. “We created our observability wrapper library, and New Relic offers a whole bunch of tooling to ensure that we follow standards across all the different services,” said De Melo.
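
Tokopedia’s wrapper library is not described in detail. As a rough illustration of what such a wrapper might do, the sketch below records the same fields for every instrumented call. The decorator name, the field names and the in-memory `RECORDED` list are all assumptions made for this example; a real wrapper would forward the data to the observability platform’s agent rather than to a list.

```python
import functools
import time

# Stand-in sink for this sketch only; not a New Relic API.
RECORDED = []


def observed(service, operation):
    """Decorator that records a standard set of fields for each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # the caller still sees the original exception
            finally:
                RECORDED.append({
                    "service": service,
                    "operation": operation,
                    "status": status,
                    "duration_ms": (time.perf_counter() - start) * 1000.0,
                })
        return wrapper
    return decorator


@observed(service="payments", operation="apply_voucher")
def apply_voucher(total, discount):
    # Hypothetical business function used only to exercise the decorator.
    if discount > total:
        raise ValueError("discount exceeds total")
    return total - discount


remaining = apply_voucher(100, 30)
```

Putting the field names in one shared decorator is one way to keep every service reporting in the same shape, which is the kind of standardisation De Melo describes.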

Investment put to the test

Tokopedia’s investment in New Relic was put to the test early on. The company supports 50 payment methods spanning online and offline modes, and it had started noticing failures in certain transactions that were difficult to catch.

“But after we deployed New Relic, we could very clearly view the payment initiation and payment completion, and figured out a few issues,” said De Melo.
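
The comparison De Melo describes, payment initiation against payment completion, can be sketched as follows. The event fields (`txn_id`, `method`, `stage`) and the sample data are hypothetical; the article does not describe Tokopedia’s actual event schema.

```python
from collections import Counter


def incomplete_payments(events):
    """Count initiated payments with no matching completion, per method."""
    initiated = {}     # txn_id -> payment method
    completed = set()  # txn_ids that reached completion
    for event in events:
        if event["stage"] == "initiated":
            initiated[event["txn_id"]] = event["method"]
        elif event["stage"] == "completed":
            completed.add(event["txn_id"])
    return Counter(
        method
        for txn_id, method in initiated.items()
        if txn_id not in completed
    )


# Hypothetical events: transaction t2 is initiated but never completes.
events = [
    {"txn_id": "t1", "method": "bank_transfer", "stage": "initiated"},
    {"txn_id": "t1", "method": "bank_transfer", "stage": "completed"},
    {"txn_id": "t2", "method": "voucher", "stage": "initiated"},
    {"txn_id": "t3", "method": "voucher", "stage": "initiated"},
    {"txn_id": "t3", "method": "voucher", "stage": "completed"},
]
stuck = incomplete_payments(events)
```

Grouping the unmatched transactions by payment method shows which of the many methods is dropping payments, rather than only that some payments fail.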

Today, most of Tokopedia’s critical services, including logistics, fulfilment and payments, are already being tracked by New Relic. It is in the midst of moving its logs to the platform, with the work expected to be completed early next year.

With its observability metrics already codified and being presented as dashboards for internal stakeholders, Tokopedia is looking to expose some of that observability code to partners, said De Melo.

“Our goal is to share what we’ve done and also take learnings from all these other companies,” he said.

“I’m not saying there’s one ideal observability model that should be replicated, but I think this will give us the advantage of having more contributors for what we are doing.”
