1. Design, develop, and maintain scalable data-processing pipelines using Kafka and Spark, with code in Python (PySpark) or Scala.
2. Work extensively with the Kafka and Hadoop ecosystems, including HDFS, Hive, and related technologies.
3. Write efficient SQL queries for data extraction, transformation, and analysis.
4. Implement and manage Kafka streams for real-time data processing.
5. Utilize scheduling tools to automate data workflows and processes.
6. Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
7. Ensure data quality and integrity by implementing robust data validation processes.
8. Optimize existing data processes for performance and scalability.
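To illustrate the kind of SQL-based extraction and transformation work described above, here is a minimal self-contained sketch using Python's built-in sqlite3 module; the `orders` table, its columns, and the sample data are hypothetical, and in practice such queries would run against Hive or Spark SQL rather than SQLite.

```python
import sqlite3

# Hypothetical example: aggregate order data with a grouped SQL query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 80.0), (3, "north", 200.0)],
)

# Extract per-region order counts and totals.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(rows)  # [('north', 2, 320.0), ('south', 1, 80.0)]
```

The same GROUP BY pattern carries over directly to HiveQL or Spark SQL when the data lives in HDFS instead of a local database.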
Krtrimaiq is an agile start-up with the attitude and power of an enterprise, focused on applying data science and AI to build the intelligent enterprise. Our expertise spans industries and technology domains, including robotic and cognitive process automation, big data, statistical analysis and optimization, and machine learning and deep learning.