Together with our client, we are building a next-generation data platform that will leverage AI/ML techniques to help redefine how our customers bring life-saving and life-changing products to market. To enable this, we are looking for a Senior Data Engineer to support our Data Science and Machine Learning teams.
Drive the evolution and maintenance of existing data pipelines, ensuring scalability, performance, and maintainability.
Simplify and optimize data models to improve pipeline efficiency and reduce complexity.
Contribute to consolidation efforts that eliminate redundant or unnecessary data pipelines, promoting reusability and standardization.
Help to finalise the transition from batch to stream-based data processing using Apache Flink, ensuring minimal disruption and optimal performance.
Collaborate with cross-functional teams to design and deliver high-quality, domain-agnostic datasets for AI/ML applications.
Ensure data reliability, consistency, and integrity to support high-impact Data and ML product development.
Actively collaborate within the data engineering team and across adjacent teams (e.g., data science, platform engineering, product) to align on goals, share knowledge, and drive innovation.
7+ years in a Data Engineering role.
Expertise working with distributed data technologies (e.g. Hadoop, MapReduce, Spark, Flink, Kafka) for building efficient, large-scale data pipelines.
Software Engineering proficiency in at least one high-level programming language (Java, Scala, Python or equivalent).
Experience building stream-processing applications using Apache Flink, Spark Streaming, Apache Storm, Kafka Streams or similar frameworks.
Knowledge of multi-dimensional modeling, such as star schema, snowflake schema, and normalized and denormalized models.
Knowledge of flexible, scalable data models that address a wide variety of consumption patterns, including random access and sequential access, along with the necessary optimizations such as bucketing, aggregating, and sharding.
Expertise in one or more NoSQL databases (e.g. Neo4j, MongoDB, Cassandra, HBase, DynamoDB, Bigtable).