Our client is a Danish jewelry brand and one of the most famous jewelry brands in the world. We are building a team to help improve the reliability of its systems and to develop a long-term partnership.
As a Site Reliability Engineer specializing in e-commerce, with proficiency in Salesforce Commerce Cloud (SFCC) and the Storefront Reference Architecture (SFRA), you will work with product teams to mature the engineering practices and processes of one of Europe's most prominent jewelry e-commerce initiatives. You will partner closely with development teams to refine architecture and development practices, and to introduce industry-leading practices for testing and for deploying e-commerce solutions to production. You will advise on improving system observability and performance, and on integration with Order Management Systems (OMS), Enterprise Resource Planning (ERP), and fulfillment services. Your work will be pivotal in minimizing production disruptions and improving the customer experience.
If you have expertise in e-commerce systems, particularly SFCC/SFRA, this is an opportunity to make a significant impact on one of the world's largest online and omnichannel retail platforms built on this cloud-based e-commerce solution.
Responsibilities:
Implement observability by setting up metric- and log-based monitoring and alerting
Implement monitoring and logging solutions to proactively identify and address performance bottlenecks
Define SLOs, measure SLIs, and track error budgets for production applications and services
Improve operational KPIs such as MTTD/MTTR, service availability, and reliability
Collaborate with product teams to optimize processes throughout the product lifecycle
Verify system performance and scalability by participating in performance, load, and stress testing
Evangelize the SRE mission within the company, including cloud engineering best practices and operational readiness
Work with engineering teams to refine deployment and release processes
Monitor and stress test systems to collect metrics for tuning and capacity planning
Work to automate detection and resolution of recurring issues (problem management)
Ensure the safety, predictability, repeatability, and suitability of all build and deployment processes
Eliminate toil by automating repetitive tasks
Implement self-healing automation to handle known errors and prevent incident recurrence
Requirements:
Experience with e-commerce projects
Understanding of Infrastructure as Code, using tools such as Terraform or ARM templates to automate the deployment of AKS clusters
Strong hands-on experience with software deployment and operations on public cloud platforms, CI/CD, deployment automation, and pipelines
Experience scaling and securing microservices-based architectures
Experience with Node.js (or other JavaScript runtime environments) and REST APIs
Understanding of event streaming and messaging (Kafka or analogues such as RabbitMQ, Apache ActiveMQ Artemis, IBM MQ, Apache Pulsar)
Knowledge of best practices across the product lifecycle, from solution design, development, and testing to operating large-scale real-time systems
Fluency in scalability and root cause analysis exercises (blameless RCAs, postmortems)
Comfortable with scripting and debugging distributed web applications
Nice to have:
Experience with Salesforce Commerce Cloud (SFCC)
Expertise in the Storefront Reference Architecture (SFRA)
Experience implementing feature flagging and integrating with APM tools (New Relic, Datadog, Dynatrace, AppDynamics)
Experience with Microsoft Azure, containerization, and Azure Kubernetes Service (AKS)
A strong understanding of release automation, continuous integration, and trunk-based development, with experience coaching engineers to adopt automation and full-stack feature flagging to achieve measurable efficiency gains
Experience with CDNs, including caching and security features
Experience with e-commerce integrations (PIM, CMS, OMS, etc.)
Experience with incident command and management (e.g., ServiceNow), ITSM, and ITIL processes
Experience in training and coaching engineering teams on cloud services, tooling, and best practices
Proactiveness and persistence in driving the team’s tasks to completion with stakeholders inside the company as well as with third-party vendors