Helping a Global Fashion Retailer Optimize Cloud Costs and Performance
USA
2 years
Retail
6
Summary
Business challenge:
Our partner needed to redesign a critical service to make it more scalable, cost-efficient, and reliable.
Zoolatech approach:
We thoroughly analyzed the service architecture and its limitations, then helped our client redesign the service and transition to the new version with no downtime.
Value delivered:
Our partner achieved a more efficient and reliable Order-Invoice Kafka Producer, leading to cost savings, increased customer satisfaction, and improved business performance.
Cloud costs reduced by 4 times
7 times faster event processing
Memory consumption reduced by 4 times
About our Client

Our client is an American luxury fashion retailer. The company has more than 10,000 employees and offers a wide range of accessories, clothing, and other goods.

Increasing scalability, durability, and efficiency of a critical service

The company needed to redesign a critical service to increase its durability, scalability, and efficiency, while removing dependencies on the services planned for deprecation. The legacy service needed to be replaced with a new one with no downtime.

A thorough analysis and new architecture design

We helped our partner design the new service architecture, establish the development process, and plan and execute the transition to the new version with no downtime while maintaining data consistency and performance metrics.

Initially, the design of the Order-Invoice Kafka Producer (OINK) was complex, error-prone, and difficult to debug. It consumed 10 types of events, joined events only within an 8-day time frame, had only partial dead-letter queue (DLQ) coverage, and often missed events in ways that were hard to identify.

OINK v1 design

Service performance was limited to 21k RTCIM events (the main event type) per hour on 12 pods.

In general, OINK v1 suffered from a significant lag (~400). It processed:

  • 21k rtcim/hour (12 pods);
  • 1.75k rtcim/hour (1 pod);
  • 29 rtcim/min (1 pod).

Loosely coupled microservices and transition with no downtime

First, Zoolatech helped the client decouple OINK from the system called OMS and created OINK v2, which was completely compatible with version 1. The quality criterion was that downstream systems could not distinguish RTCIM parcels created by the two OINK versions. As part of the decoupling, Zoolatech created a new testing service, the “comparator,” which compares the RTCIMs generated by version 1 and version 2.
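The comparison idea can be illustrated with a minimal Python sketch. It is not the client's comparator: the record shape, the field names (`orderId`, `total`, `producerVersion`, `producedAt`), and the notion of "volatile" fields that are allowed to differ are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any

# Hypothetical fields that may legitimately differ between the two versions.
VOLATILE_FIELDS = frozenset({"producedAt", "producerVersion"})

# Marker used when a field exists in only one of the two records.
MISSING = "<missing>"


def diff_rtcim(
    v1: dict[str, Any],
    v2: dict[str, Any],
    ignore: frozenset[str] = VOLATILE_FIELDS,
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (v1_value, v2_value)} for every field that differs."""
    diffs: dict[str, tuple[Any, Any]] = {}
    for key in sorted((v1.keys() | v2.keys()) - ignore):
        left = v1.get(key, MISSING)
        right = v2.get(key, MISSING)
        if left != right:
            diffs[key] = (left, right)
    return diffs


# Two records for the same (made-up) order, one per OINK version.
v1_out = {"orderId": "A-1", "total": "120.00", "producerVersion": "1"}
v2_out = {"orderId": "A-1", "total": "120.00", "producerVersion": "2"}
v2_bad = {"orderId": "A-1", "total": "119.99", "producerVersion": "2"}

print(diff_rtcim(v1_out, v2_out))  # empty dict: the records are equivalent
print(diff_rtcim(v1_out, v2_bad))  # reports the mismatching "total" field
```

An empty result for every pair of records is one way to express the compatibility criterion described above: a downstream consumer has nothing to tell the two versions apart by.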

We also created a reliable method of redirecting production traffic from OINK version 1 to version 2. OINK v2 was tested in the production environment in shadow mode, and several upstream issues were identified and resolved. During this task, the Zoolatech team worked closely with the OMS, Payment, and Tax teams (upstream), and with the ERTM (Enterprise Retail Transaction Management) and Sales Audit departments (downstream).

A more scalable, efficient, and reliable service
OINK v2 design and implementation

Peak throughput:

  • 150k rtcim/hour (12 pods);
  • 12.5k rtcim/hour (1 pod);
  • 208 rtcim/min (1 pod).

Processing: Three pods of OINK v2 equal 12 pods of OINK v1.

Total Memory Usage (RAM): 14.4GB (OINK v2) vs. 60GB (OINK v1) / 12 pods.

  • Thanks to loose coupling in the new event-driven architecture, we easily changed the dependency on 4 upstreams to just one incoming stream of events.
  • Thanks to using the right number of Kafka topic partitions for pod scaling, we achieved seven times faster RTCIM events processing with 12 pods.
  • We developed a DLQ layer all over the OINK v2 service components thanks to Kafka Connect’s DLQ support, ensuring that no single event was missed during processing.
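The dead-letter pattern mentioned in the last point can be sketched in a few lines of Python. This is an in-memory illustration of the general idea only, not the OINK v2 implementation and not Kafka Connect itself (where DLQ routing for sink connectors is configured through properties such as `errors.tolerance` and `errors.deadletterqueue.topic.name`). The stage names, event fields, and the `Pipeline` and `DeadLetter` classes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Event = dict[str, Any]
Stage = Callable[[Event], Event]


@dataclass
class DeadLetter:
    """A failed event kept together with the context needed to debug it."""
    event: Event
    stage: str
    error: str


@dataclass
class Pipeline:
    stages: list[tuple[str, Stage]]
    dead_letters: list[DeadLetter] = field(default_factory=list)

    def process(self, event: Event) -> Optional[Event]:
        """Run all stages; on any failure, park the original event in the DLQ."""
        current = event
        for name, stage in self.stages:
            try:
                current = stage(current)
            except Exception as exc:  # every failure is captured, none is dropped
                self.dead_letters.append(DeadLetter(event, name, repr(exc)))
                return None
        return current


def validate(event: Event) -> Event:
    if "orderId" not in event:
        raise ValueError("missing orderId")
    return event


def enrich(event: Event) -> Event:
    return {**event, "total": event["net"] + event["tax"]}


pipeline = Pipeline(stages=[("validate", validate), ("enrich", enrich)])
ok = pipeline.process({"orderId": "A-1", "net": 100, "tax": 20})
no_id = pipeline.process({"net": 5, "tax": 1})
no_tax = pipeline.process({"orderId": "A-2", "net": 5})

for letter in pipeline.dead_letters:
    print(letter.stage, letter.error)
```

Wrapping every stage, rather than only some of them, is what separates full DLQ coverage from the partial coverage described for OINK v1: each failed event ends up somewhere it can be inspected and replayed.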

We helped our partner significantly reduce cloud costs. Three pods of OINK v2 perform approximately the same as 12 pods of OINK v1, so we can either process data faster with the same resources or reduce costs by approximately four times. With the same number of pods, memory consumption is more than 4 times lower than in the previous version.
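The ratios quoted above follow directly from the published figures. The short Python check below reproduces them; the variable names are illustrative, and the cost ratio assumes that cost scales linearly with pod count, which is a simplification.

```python
# Figures taken from the case study (12 pods in both cases).
PODS = 12
V1_EVENTS_PER_HOUR = 21_000    # OINK v1, RTCIM events/hour
V2_EVENTS_PER_HOUR = 150_000   # OINK v2 peak, RTCIM events/hour
V1_RAM_GB = 60.0
V2_RAM_GB = 14.4

# Per-pod throughput, per hour and per minute.
v1_pod_hour = V1_EVENTS_PER_HOUR / PODS
v2_pod_hour = V2_EVENTS_PER_HOUR / PODS
v1_pod_min = v1_pod_hour / 60
v2_pod_min = v2_pod_hour / 60

# Headline ratios.
speedup = V2_EVENTS_PER_HOUR / V1_EVENTS_PER_HOUR
memory_ratio = V1_RAM_GB / V2_RAM_GB

# Assumption: cost is proportional to the number of pods.
V2_PODS_FOR_SAME_WORK = 3
cost_ratio = PODS / V2_PODS_FOR_SAME_WORK

print(f"v1: {v1_pod_hour:.0f}/hour/pod, {v1_pod_min:.0f}/min/pod")
print(f"v2: {v2_pod_hour:.0f}/hour/pod, {v2_pod_min:.0f}/min/pod")
print(f"speedup: {speedup:.1f}x, memory: {memory_ratio:.2f}x, cost: {cost_ratio:.0f}x")
```

At the peak per-pod rate, three v2 pods would handle 37.5k events per hour, which comfortably covers the 21k that v1 handled on 12 pods.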

Reduced cloud costs and improved performance

Zoolatech helped the company design a new service architecture, establish the development process, and plan and execute the transition to the new version with no downtime while maintaining data consistency and performance metrics.

  • A high-load, robust, and scalable service working 7 times faster than the previous version.
  • Elimination of dependencies on services planned for deprecation.
  • Successful replacement of the legacy service with the new one with no downtime.
  • Cost reduction. 3 pods of OINK v2 perform approximately the same as 12 pods of OINK v1, reducing costs by approximately 4 times.
  • Optimization of cloud resources and maintenance costs, including incident analysis and further development costs.
  • Reduction of memory consumption.
  • Identifying and resolving several issues in the client’s upstream data.
  • Helping our partner launch the deprecation of obsolete systems and achieving further cost-savings on the infrastructure.

Overall, our partner achieved a more efficient and reliable critical service, leading to cost savings, increased customer satisfaction, and improved business performance.
