Success Story

Helping a Global Fashion Retailer Optimize Cloud Costs and Performance

Redesigning a critical Order-Invoice Kafka Producer (OINK) service to process events about seven times faster while cutting cloud infrastructure costs roughly fourfold.
4X lower cloud costs with optimized Kafka-based service architecture.
7X faster event processing (150K events/hour vs 21K before).

    Client Overview

    Leading American Luxury Fashion Retailer

    NDA

    Our client is an American luxury fashion retailer. The company has more than ten thousand employees and offers a wide range of accessories, clothing, and other goods. This retailer manages vast product assortments and complex supply chains, necessitating extremely cost-efficient and reliable back-end systems.

    Industries:

    Retail, FashionTech

    Country:

    USA
    Challenges

    Redesigning a Critical Service for Scalability and Reliability

    The company needed to replace a poorly performing legacy service that had complex dependencies with a new version, without any downtime.

    Increase durability, scalability, and efficiency

    The company needed to redesign a critical service to increase its durability, scalability, and efficiency.

    Removing dependencies on obsolete systems

    The redesign had to remove dependencies on services planned for deprecation.

    Zero-downtime transition mandate

    The legacy service needed to be replaced with a new one with no downtime.

    Complex, error-prone legacy design

    The original design of the Order-Invoice Kafka Producer (OINK v1) was complex, error-prone, and difficult to debug.

    Performance lag and event reliability

    OINK v1 experienced heavy processing lag, handling only ~21,000 RTCIM events per hour on 12 pods, which led to backlogs and missed events that were hard to trace.
    Why They Chose Us

    Proven Expertise in Event-Driven Architecture and Highload Systems

    The client selected Zoolatech for its deep capabilities in cloud performance optimization and reliable migration execution.

    Architectural mastery

    Zoolatech was chosen for its core expertise in event-driven architecture, microservices, and high-load systems, which were essential for redesigning the complex OINK service.

    Execution reliability

    Our expertise in backend, cloud (AWS), and data engineering demonstrated our ability to plan and execute a reliable, zero-downtime transition.
    Zoolatech is a senior-heavy engineering firm with Silicon Valley roots and a Miami HQ, specializing in legacy modernization, system re-architecture, and AI deployment to drive long-term, compounding value.

    Year Founded: 2017
    Employees: 600+
    Client Satisfaction: 96%
    Workflow

    Thorough Analysis, Decoupling, and Shadowed Transition

    A phased development process focused on achieving version compatibility and verifying performance in a production environment.
    Phase 1

    Analysis and architecture design

    We thoroughly analyzed the existing service architecture and its limitations. We then helped our partner design the new service architecture, establish the development process, and plan and execute the transition to the new version.
    Phase 2

    Decoupling and compatibility

    Zoolatech helped the client decouple OINK from the OMS system and created OINK v2, which was fully compatible with OINK v1.
    Phase 3

    Testing and shadowed deployment

    We created a new testing service, the “comparator,” which compares the RTCIMs generated by OINK v1 and OINK v2. OINK v2 was tested in the production environment in shadow mode, and several upstream issues were identified and resolved.
    Phase 4

    Cross-team collaboration

    Throughout the project, the Zoolatech team worked closely with the upstream OMS, Payment, and Tax teams and with the downstream ERTM (Enterprise Retail Transaction Management) and Sales Audit departments.
    We re-architected a critical Kafka service to boost throughput about sevenfold while cutting cloud costs roughly fourfold, delivering a faster, more reliable system with zero downtime.
    Solution

    Designing a Loosely Coupled, High-Performance Kafka Producer

    Implementing a fully compatible, event-driven microservice architecture with advanced testing and zero-downtime traffic redirection.

    Decoupled architecture and compatibility

    Zoolatech helped the client decouple OINK from the OMS system and created OINK v2, which was fully compatible with OINK v1. The quality criterion was that downstream systems could not distinguish RTCIM parcels created by the two OINK versions.

    Testing and traffic management

    We created a reliable method of redirecting production traffic from OINK v1 to OINK v2. This required building a new testing service, the “comparator,” which compares the RTCIMs generated by the two versions.
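The idea behind such a comparator can be illustrated with a minimal Python sketch. This is not the client's implementation: the event shape, the `order_id` key, and the `produced_at` field that is ignored during comparison are all hypothetical names chosen for the example.

```python
import json
from typing import Dict, FrozenSet, Iterable, List


def normalize(event: dict, ignored_fields: FrozenSet[str]) -> str:
    """Serialize an event deterministically, skipping fields expected to differ."""
    kept = {k: v for k, v in event.items() if k not in ignored_fields}
    return json.dumps(kept, sort_keys=True)


def compare_outputs(
    v1_events: Iterable[dict],
    v2_events: Iterable[dict],
    key_field: str = "order_id",  # hypothetical correlation key
    ignored_fields: FrozenSet[str] = frozenset({"produced_at"}),  # hypothetical
) -> Dict[str, List[str]]:
    """Pair events from both versions by key and classify each pair."""
    v1_by_key = {e[key_field]: e for e in v1_events}
    v2_by_key = {e[key_field]: e for e in v2_events}

    report: Dict[str, List[str]] = {
        "matched": [],
        "mismatched": [],
        "missing_in_v2": [],
        "missing_in_v1": [],
    }
    for key, v1_event in v1_by_key.items():
        v2_event = v2_by_key.get(key)
        if v2_event is None:
            report["missing_in_v2"].append(key)
        elif normalize(v1_event, ignored_fields) == normalize(v2_event, ignored_fields):
            report["matched"].append(key)
        else:
            report["mismatched"].append(key)
    # Events that only the new version produced.
    report["missing_in_v1"] = [k for k in v2_by_key if k not in v1_by_key]
    return report
```

In a shadowed deployment, a report with empty `mismatched` and `missing_*` lists over a representative window would be one possible signal that traffic can be redirected.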

    Enhanced reliability and DLQ

    We built a DLQ (Dead Letter Queue) layer across all OINK v2 service components, using Kafka Connect’s DLQ support, so that no event was lost during processing.
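For readers unfamiliar with Kafka Connect's dead letter queue support, the sketch below builds a sink connector configuration in Python using the standard `errors.*` properties. The connector name, topic names, output file, and the choice of the `FileStreamSinkConnector` example class are placeholders for illustration only; the actual connectors and settings used in OINK v2 are not described in this case study.

```python
import json


def build_sink_connector_config(name: str, source_topic: str, dlq_topic: str) -> dict:
    """Return a Kafka Connect sink connector definition with a DLQ enabled.

    The resulting dict has the shape accepted by the Kafka Connect REST API
    when creating a connector. All concrete values are illustrative.
    """
    return {
        "name": name,
        "config": {
            # Placeholder connector class shipped with Apache Kafka examples.
            "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
            "topics": source_topic,
            "file": "/tmp/sink-output.txt",  # placeholder output path
            # Keep processing when a record fails instead of stopping the task.
            "errors.tolerance": "all",
            # Route failed records to a dedicated dead letter topic.
            "errors.deadletterqueue.topic.name": dlq_topic,
            "errors.deadletterqueue.topic.replication.factor": "3",
            # Attach failure context (such as the exception) as record headers.
            "errors.deadletterqueue.context.headers.enable": "true",
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    config = build_sink_connector_config("example-sink", "example-events", "example-events.dlq")
    print(json.dumps(config, indent=2))
```

With `errors.tolerance` set to `all`, records that fail conversion or transformation are written to the dead letter topic rather than dropped silently, which allows them to be inspected and replayed later.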

    Dependency and scaling optimization

    Thanks to loose coupling in the new event-driven architecture, we replaced the dependency on four upstreams with a single incoming stream of events. We also achieved faster processing by matching the number of Kafka topic partitions to pod scaling.
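Why the partition count matters for pod scaling can be shown with a small Python model. In a Kafka consumer group, each partition is read by at most one consumer at a time, so pods beyond the partition count have nothing to consume. The sketch assumes one consumer per pod and a simple round-robin assignment; the real assignor and the partition counts used for OINK are not given in this case study, and the numbers in the usage note are hypothetical.

```python
from typing import List


def assign_partitions(partitions: int, pods: int) -> List[List[int]]:
    """Distribute partition ids across pods round-robin (simplified model)."""
    if pods < 1:
        raise ValueError("at least one pod is required")
    assignment: List[List[int]] = [[] for _ in range(pods)]
    for partition in range(partitions):
        assignment[partition % pods].append(partition)
    return assignment


def active_pods(partitions: int, pods: int) -> int:
    """Number of pods that receive at least one partition."""
    return min(partitions, pods)
```

For example, under this model `active_pods(4, 12)` is 4, meaning eight of twelve pods would sit idle, while `active_pods(12, 12)` is 12, so every pod does work.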
    Results

    Massive Cost Reduction and Seven-Fold Performance Boost

    Delivering a high-load, robust service that drastically cuts infrastructure expenses.

    Cloud cost reduction

    We helped our partner significantly reduce cloud costs. Three pods of OINK v2 deliver approximately the same throughput as 12 pods of OINK v1, cutting costs roughly fourfold.

    Performance improvement

    By using the right number of Kafka topic partitions for pod scaling, we achieved roughly seven times faster RTCIM event processing with 12 pods (about 150K events per hour versus 21K before).

    Resource optimization

    With the same number of pods, OINK v2 uses over four times less memory. Total RAM consumption dropped from 60 GB to just 14.4 GB across 12 pods.
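The headline ratios follow directly from the figures quoted in this case study, as the short Python check below shows. The only assumption added here is that cloud cost scales roughly in proportion to the number of pods, which is what the fourfold cost claim implies.

```python
# Figures quoted in this case study.
V1_EVENTS_PER_HOUR = 21_000   # OINK v1 on 12 pods
V2_EVENTS_PER_HOUR = 150_000  # OINK v2 on 12 pods
V1_PODS_FOR_BASELINE = 12     # pods of v1
V2_PODS_FOR_BASELINE = 3      # pods of v2 with about the same throughput
V1_TOTAL_RAM_GB = 60.0        # across 12 pods
V2_TOTAL_RAM_GB = 14.4        # across 12 pods
PODS = 12


def summarize() -> dict:
    """Derive the ratios behind the 7X, 4X, and memory claims."""
    return {
        "throughput_speedup": V2_EVENTS_PER_HOUR / V1_EVENTS_PER_HOUR,
        # Assumption: cost is roughly proportional to pod count.
        "pod_reduction": V1_PODS_FOR_BASELINE / V2_PODS_FOR_BASELINE,
        "memory_reduction": V1_TOTAL_RAM_GB / V2_TOTAL_RAM_GB,
        "v1_ram_per_pod_gb": V1_TOTAL_RAM_GB / PODS,
        "v2_ram_per_pod_gb": V2_TOTAL_RAM_GB / PODS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

The throughput ratio comes out slightly above seven, the pod ratio is exactly four, and the memory ratio is slightly above four, which corresponds to about 5 GB per pod for v1 and about 1.2 GB per pod for v2.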
    Business Value

    Achieving Efficiency and Enabling Strategic Deprecation

    The new efficient service led to improved customer satisfaction and the ability to eliminate obsolete systems.

    Efficiency and reliability

    The partner gained a more efficient and reliable critical service, leading to cost savings, higher customer satisfaction, and better business performance.

    Strategic cleanup

    Replacing the legacy service with no downtime and eliminating dependencies on services planned for deprecation allowed our partner to begin deprecating obsolete systems and achieve further infrastructure cost savings.