Design, deploy, and manage Kubernetes-based big data platforms, including Kafka, Spark, Airflow, NiFi, Trino, Hive, Superset, and Kubeflow.
Build, optimize, and maintain CI/CD pipelines and GitOps workflows using Jenkins, GitLab CI, ArgoCD, or Flux.
Automate infrastructure setup, configuration, and scaling with Helm, KubeVela, Terraform, and Ansible.
Manage cloud-native storage and networking solutions, including MinIO, DirectPV, Longhorn, HDFS, Ceph, Calico, MetalLB, HAProxy, and Keepalived.
Ensure platform reliability, high availability, disaster recovery, and security across all big data services.
Implement monitoring, logging, alerting, and performance optimization using Prometheus, Grafana, the ELK stack, and other observability tools.
Collaborate with data engineers and scientists to optimize workflows and platform performance.
Perform cluster upgrades, patching, troubleshooting, and optimization for big data applications.
Enforce security best practices including TLS/SSL, IAM, RBAC, LDAP/SSO, Keycloak, OPA, and Ranger.
Maintain strong proficiency in Linux, shell scripting, and container runtimes like Docker and containerd.
Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
Minimum 3 years of experience in DevOps or Site Reliability Engineering (SRE).
Hands-on experience with Kubernetes, Helm, Ingress, operators, and big data technologies such as Kafka, Spark, Hive, Airflow, NiFi, Trino, and MinIO.
Strong knowledge of CI/CD pipelines and GitOps workflows using Jenkins, GitLab CI, ArgoCD, or Flux.
Experience with infrastructure as code using Terraform, Ansible, and Helm.
Familiarity with cloud-native storage and networking solutions (Ceph, Longhorn, DirectPV, Calico, MetalLB, HAProxy, Keepalived).
Proficient in Linux, shell scripting, and container runtimes (Docker, containerd).
Strong skills in monitoring, troubleshooting, and performance optimization.
Nice to have: experience with KubeVela or other application delivery platforms; machine learning/data science platforms (Kubeflow, MLflow); security and IAM integrations (LDAP, Keycloak, OPA, Ranger); and cloud environments (AWS, GCP, Azure, or private cloud).