Skill set:
PySpark / Scala Spark, Databricks, Python, SQL. Must have working knowledge of at least one cloud platform (Azure, AWS, or GCP).
Roles and Responsibilities:
Strong programming skills, with the ability to write optimized, reusable, high-quality code.
Design, develop, and maintain scalable data pipelines using PySpark / Scala Spark, Databricks, Python, and SQL (see the pipeline sketch after this list).
Optimize SQL queries for efficient data extraction and manipulation (see the query-tuning sketch after this list).
Demonstrate expertise in Databricks, including workflow management and data exploration.
Collaborate with cross-functional teams to understand business requirements.
Implement best practices for data pipeline optimization, ETL, and query performance.
Create and maintain documentation for data pipelines and workflows.
Troubleshoot and resolve issues in data pipelines, ensuring minimal downtime.
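For illustration, candidates should be comfortable writing pipeline code along the lines of the minimal PySpark ETL sketch below. All table and path names (/mnt/raw/events/, curated.daily_events) are hypothetical placeholders, and the Delta write assumes a Databricks or delta-spark environment:

```python
# Minimal extract-transform-load sketch in PySpark.
# Paths and table names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-events-pipeline").getOrCreate()

# Extract: read raw JSON events (hypothetical source path)
raw = spark.read.json("/mnt/raw/events/")

# Transform: clean, deduplicate, and aggregate to a daily grain
daily = (
    raw.filter(F.col("event_type").isNotNull())
       .dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)

# Load: write a partitioned Delta table for downstream consumers
# (assumes Delta Lake is available, as on Databricks)
(daily.write.format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("curated.daily_events"))
```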
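For the query-optimization responsibility, the sketch below illustrates common Spark SQL tuning levers: selecting only the needed columns (column pruning), filtering early to enable partition pruning, and broadcasting a small dimension table to avoid a shuffle join. The table names (sales, dim_store) and columns are hypothetical:

```python
# Illustrative Spark SQL tuning sketch; table names are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("query-tuning-demo").getOrCreate()

sales = spark.table("sales")        # large fact table (hypothetical)
stores = spark.table("dim_store")   # small dimension table (hypothetical)

result = (
    sales.select("store_id", "amount", "sale_date")    # column pruning
         .filter(F.col("sale_date") >= "2024-01-01")   # partition pruning
         .join(broadcast(stores), "store_id")          # avoid a shuffle join
         .groupBy("store_name")
         .agg(F.sum("amount").alias("total_amount"))
)

result.explain()  # inspect the physical plan to confirm the broadcast join
```

Calling explain() on the result is a quick way to verify that the broadcast hash join was actually chosen in the physical plan.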