Job Details
Role: Google Cloud Platform Data Engineer
Duration: 6+ months
Location: Philadelphia, PA (Hybrid)
Experience: 5+ years
Job description:
We are seeking a highly skilled Data Engineer to design, build, and maintain scalable data platforms that enable large-scale ingestion, storage, processing, and analysis of structured and unstructured data. This role will focus on constructing data products (data lakes / data warehouses), optimizing data pipelines, and implementing robust ETL workflows to support analytics, machine learning, and operational reporting.
The ideal candidate will be proficient in distributed computing, cloud-based data architectures (Google Cloud Platform), and modern data processing frameworks. Experience with real-time data streaming (Kafka, Apache Beam), MLOps, and infrastructure automation (Terraform, Jenkins) is highly preferred. Experience with Google Cloud Platform, as well as GitHub, Docker, and Kubernetes, is a must.
Key Responsibilities:
Data Platform & Architecture Development
- Design, implement, and maintain scalable data platforms for efficient data storage, processing, and retrieval.
- Build cloud-native and distributed data systems that enable self-service analytics, real-time data processing, and AI-driven decision-making.
- Develop data models, schemas, and transformation pipelines that support evolving business needs while ensuring operational stability.
- Apply best practices in data modeling, indexing, and partitioning to optimize query performance and cost efficiency, with attention to sustainability best practices.
ETL, Data Pipelines & Streaming Processing
- Build and maintain highly efficient ETL pipelines using SQL and Python to process large-scale datasets.
- Implement real-time data streaming pipelines using Kafka, Apache Beam, or equivalent technologies.
- Develop reusable internal data processing tools to streamline operations and empower teams across the organization.
- Write advanced SQL queries for extracting, transforming, and loading (ETL) data with a focus on execution efficiency.
- Ensure data validation, quality monitoring, and governance using automated processes and dashboards.
MLOps & Cloud-Based Data Infrastructure
- Deploy machine learning pipelines with MLOps best practices to support AI and predictive analytics applications.
- Optimize data pipelines for ML models, ensuring seamless integration between data engineering and machine learning workflows.
- Work with cloud platforms (Google Cloud Platform) to manage data storage, processing, and security.
- Utilize Terraform, Jenkins, and other CI/CD tools to automate data pipeline deployments and infrastructure management.
Collaboration & Agile Development
- Work in Agile/DevOps teams, collaborating closely with data scientists, software engineers, and business stakeholders.
- Advocate for data-driven decision-making, educating teams on best practices in data architecture and engineering.
Required Skills & Qualifications
- 5+ years of experience as a Data Engineer working with large-scale data processing.
- Strong proficiency in SQL for data transformation, optimization, and analytics.
- Expertise in programming languages (Python, Java, Scala, or Go) with an understanding of functional and object-oriented programming paradigms.
- Experience with distributed computing frameworks (such as Apache Beam or similar).
- Proficiency in cloud-based data engineering on AWS, Google Cloud Platform, or Azure.
- Strong knowledge of data modeling, data governance, and schema design.
- Experience with CI/CD and infrastructure-as-code tools (Jenkins, Terraform) for infrastructure automation.
Preferred Qualifications
- Experience with real-time data streaming (Kafka or equivalent).
- Strong understanding of MLOps and integrating data engineering with ML pipelines.
- Familiarity with knowledge graphs and GraphQL APIs for data relationships.
- Background in retail, customer classification, and personalization systems.
- Knowledge of business intelligence tools and visualization platforms.