Job Details
Lead engineer who can assess the current data landscape, perform data profiling and data mapping, and build solutions end to end.
Tech stack: Google Cloud Platform, Scala, Spark, Kafka, and a range of databases. Data profiling and data mining skills and end-to-end ownership are expected.
Responsibilities
Design, develop, and maintain robust, scalable ETL workflows and data pipelines using tools such as Hive, Spark, and Airflow.
Implement and manage data storage and processing solutions using Apache Hudi and BigQuery.
Develop and optimize pipelines for structured and unstructured data on Google Cloud Platform, using GCS for data storage.
Write clean, maintainable, and efficient code in Scala and Python to process and transform data.
Ensure data quality, integrity, and consistency by implementing data validation checks and pipeline monitoring.
Work with cross-functional teams to understand business requirements and deliver data solutions that drive insights and decision-making.
Troubleshoot and resolve performance and scalability issues in data processing jobs and pipelines.
Stay current with developments in big data technologies and tools, and incorporate them into the workflow where appropriate.