Mid-Senior level
Posted March 20, 2026
Responsibilities
- Owning and operating Amazon MSK clusters in production, including performance tuning and incident response
- Designing and maintaining highly available Kafka architectures across multiple AZs and regions
- Managing Kafka topic design, partitioning strategies, retention policies, and capacity planning
- Deploying and supporting Kafka Connect, Schema Registry, and real‑time streaming integrations
- Monitoring Kafka platform health, throughput, and consumer lag using observability tools
- Partnering with platform, data, and DevOps teams to support real‑time data pipelines
- Implementing and maintaining Kafka security, access controls, and encryption standards
- Supporting automation and infrastructure deployments using Terraform and CI/CD pipelines
- Contributing to platform reliability, documentation, and operational best practices
The Ideal Candidate
- Hands‑on experience operating Apache Kafka in production environments
- Experience with Amazon MSK or Kafka running on AWS
- Strong understanding of Kafka internals, including brokers, consumers, lag, and rebalancing
- Advanced Terraform experience, including modules, state management, and remote backends
- Solid AWS fundamentals including VPCs, IAM, networking, and security concepts
- Experience with monitoring tools such as CloudWatch, Prometheus, Dynatrace, or similar
- Comfortable owning production platforms in enterprise or regulated environments
- Collaborative, detail‑oriented, and proactive in improving platform stability and scalability
About the Role
Agility Partners is seeking a qualified Kafka Platform Engineer to support one of our financial services clients on a large-scale, enterprise event-driven platform. The role centers on owning and operating Kafka infrastructure running on Amazon MSK and supporting real-time data streaming across a regulated enterprise environment. The ideal candidate enjoys deep technical ownership, operating production platforms, and working closely with platform, data, and DevOps teams to ensure the reliability, scalability, and strong governance of a mission-critical Kafka ecosystem.