Middle Shift Operations Engineer (Application support)

Location:

🇺🇦Ukraine

Partner:

Pandora

Technologies:

Security, ServiceNow, Site Reliability, Support, System Administration

Seniority:

Middle
  • Overview

    Our client is Pandora, a Danish company and one of the most famous jewelry brands in the world. We are building a team to help improve the reliability of its systems and to build a long-term partnership.

    Pandora is looking to build an Integrated Operation Center (IOC) to improve its customer focus and business outcomes. The IOC will initially provide incident management and production support. This effort is part of Pandora’s DevOps journey, in which platform monitoring and support for engineering are key. Pandora’s goals for this journey include improving productivity, creating efficiencies and savings, and strengthening its culture. The IOC will bring a culture of accountability, reliable follow-up, and strong SLA compliance. It will also be core to moving towards a ‘single pane of glass’ for the performance of services across Pandora.

    The IOC will also implement and improve Application Performance Monitoring (APM) at Pandora using New Relic. The goal is to achieve greater operational performance, help the engineering team focus, and deliver a better customer experience. Together the teams will improve key business metrics and work towards Pandora’s ultimate goal of a leading customer experience and stronger business outcomes.

  • Responsibilities

    Works in shifts to support the customer's applications. The team will be based in Mexico (night shift) and Turkey/Ukraine (morning/afternoon shifts) and will work in a follow-the-sun setup.

    For this position, shifts in Turkey/Ukraine rotate each week (8:00-16:00 GMT+3, 15:00-23:00 GMT+3) among 8 team members, and every team member is expected to work at least 2 weekend days per month.

    • Monitors alert management tools and acknowledges P1/P2 alerts
    • Performs initial alert triage: runs runbook steps, goes through checklists, and runs initial diagnostic scripts to detect false positives and prevent unnecessary escalations
    • Performs initial escalation to the SRE/Senior IOC engineer and, when a true positive incident is confirmed, declares the incident, updates the status page, starts collecting the post-mortem/RCA log, escalates to development teams, gathers a war room, and implements a workaround or performs a rollback when necessary
    • Updates outdated or missing operations documentation, SOPs, runbooks, and checklists
    • Updates alerts, thresholds, tags, and aggregation/deduplication rules to reduce noise in alert management systems
    • Tracks planned maintenance that coincides with his/her shift
    • Logs the incidents/alerts he/she worked on during his/her shift
    • Reassigns or closes incident tickets and mutes alerts during shift handover
    • Submits an individual shift report and a shift handover report at the end of the shift, and syncs with the engineer on the next shift
    • Constantly monitors and improves the team’s KPIs: decreases MTTD/MTTA, increases the number of critical alerts/incidents handled per person per shift, decreases the number of escalations, and increases the number of incidents resolved on first hit
    • Demonstrates a self-learning and extreme-ownership attitude
    • Takes part in regular training, onboarding, and KT sessions; mentors Junior Shift Engineers
  • We require

    Must have:

    • At least 2 years of experience in web-based application support (as opposed to customer support, which is out of scope for this position)
    • Good understanding of the ISO/OSI model, encapsulation, and the network protocols used by web and REST API applications (DNS, TCP/UDP, HTTP/S)
    • Basic scripting with bash/shell script/PowerShell and Ansible or a similar tool
    • SQL fundamentals
    • HTTP/S methods and Error codes
    • Chrome Developer tools
    • Fluent with Linux/Windows CLI tools used for network application and database troubleshooting: ping, traceroute, curl, nc, ss, dig, nslookup, tail, grep, nmap, tcpdump, telnet, SQL client, Postman, etc.
    • Experience in using any monitoring/observability solution
    • Experience with on-call rotation and alert management platforms (like OpsGenie/PagerDuty)
    • Understands Incident/Problem resolution flow, escalation criteria and key NOC KPIs
    • Able to document and update the team’s KB articles and write runbooks

    Nice to have:

    • Incident management skills and troubleshooting production web-/enterprise application issues
    • Experience with SaaS monitoring/observability solutions (like New Relic, Datadog, Splunk, AppDynamics, Instana)
    • Experience in writing team documentation and efficient runbooks/SOPs for production web-applications/API- and cloud services
    • Understands different web and enterprise application architectures (FE, BE, DB, distributed and monolithic), the typical REST API call flow, CDN caching, and event-driven architecture
    • Basic scripting with bash/PowerShell, routine task automation (Ansible or a similar tool)

Join our team!

Send us your CV and we will contact you as soon as possible.
