1. Ability to develop ETL (Extract, Transform, Load) processes to move and transform data between systems
2. Hands-on experience with programming languages such as Python or Scala
3. Good understanding of data engineering concepts including data warehousing, data lakes, data modeling, data quality assurance, data lifecycle management, streaming data, and metadata management
4. Strong command of cloud ecosystems, preferably AWS
5. Familiarity with data pipeline orchestration tools such as Airflow or Azkaban (Airflow preferred)
6. Experience working with Spark, EMR, and Kubernetes clusters
7. Expertise in writing Spark jobs and shell scripts, and in building optimized data pipelines
8. Experience with relational (SQL) and NoSQL databases, including PostgreSQL, Amazon Redshift, and MongoDB
9. Understanding of object-oriented data platform design and the ability to develop data platform modules
10. Good understanding of backend APIs and their use cases
Annual CTC: Competitive salary