Python Data Engineer

Overview

On Site

$45 - $50

Full Time

Skills

API

Pandas

NumPy

FOCUS

Scalability

Articulate

Collaboration

Django

RESTful

Workflow

Systems design

Load balancing

Parallel computing

Multithreading

Python

Communication

Technical communication

Computer science

PySpark

Data processing

Docker

MapReduce

Orchestration

Management

Kubernetes

Problem solving

Distributed computing

Job Details

Job Description

Job Title: Python Data Engineer
Location: Houston, TX (Onsite role)
Duration: Long term contract

Job Description:
We are looking for a talented Data Engineer with expertise in Python data processing. The ideal candidate will have a strong background in Python API development, parallel data processing, and distributed systems design. You will be responsible for building and maintaining systems that handle large-scale data processing tasks, ensuring high performance and scalability.

Key Responsibilities:
Python API Development:
o Develop and maintain RESTful APIs using Python web frameworks such as FastAPI or Django.
o Collaborate with front-end developers to integrate user-facing elements with server-side logic.

Parallel Data Processing:
o Utilize Pandas, NumPy, and other libraries to process large datasets efficiently.
o Implement multithreading, multiprocessing, and asynchronous programming techniques.
o Optimize data processing pipelines to handle millions of rows with minimal latency.
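
As a rough illustration of the chunked, parallel style of processing described above, the sketch below splits a dataset into chunks, summarizes each chunk in a worker pool, and combines the partial results. It uses only the standard library; the names `chunked`, `summarize`, and `parallel_summary` are hypothetical and not taken from any codebase used in this role.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def summarize(chunk):
    """Hypothetical per-chunk work: return (row count, sum of values)."""
    return len(chunk), sum(chunk)


def parallel_summary(rows, chunk_size=1000, workers=4):
    """Map `summarize` over chunks in a pool, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, chunked(rows, chunk_size)))
    total_rows = sum(n for n, _ in partials)
    total_value = sum(s for _, s in partials)
    return total_rows, total_value
```

A thread pool is used here to keep the sketch simple. For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and is usually the better fit.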

Distributed Systems Design:
o Design and implement distributed systems with a focus on scalability and reliability.
o Understand and apply core concepts such as load balancing and task queues.
o Use Docker to containerize applications and manage dependencies.
o (Preferred) Experience with Kubernetes for container orchestration.

Technical Communication:
o Clearly articulate complex technical concepts to team members and stakeholders.
o Document system designs, processes, and code effectively.
o Collaborate with cross-functional teams to align on project goals and deliverables.

Must-Have Qualifications:
Experience in Python Web Frameworks:
o Proficiency with FastAPI, Django, or similar frameworks.
o C# coding.
o Understanding of RESTful API principles and best practices.

Docker Knowledge:
o Ability to create and manage Dockerfiles.
o Experience with containerization for deployment and development workflows.

Systems Design Understanding:
o Basic knowledge of load balancing, task queues, and distributed system concepts.
o Ability to design systems that are scalable and maintainable.
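
For candidates gauging the expected level, a task queue in its simplest form might look like the sketch below: worker threads pull tasks from a shared queue, so whichever worker is idle takes the next task, which is also a basic form of load balancing. This is a single-process illustration using the standard library only; `run_task_queue` and its parameters are hypothetical names.

```python
import queue
import threading


def run_task_queue(tasks, handler, num_workers=3):
    """Process `tasks` with worker threads that pull from a shared queue."""
    tasks = list(tasks)
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work for this worker
                q.task_done()
                return
            task_id, payload = item
            try:
                value = handler(payload)
            except Exception as exc:  # record the failure instead of killing the worker
                value = exc
            with lock:
                results[task_id] = value
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task_id, payload in enumerate(tasks):
        q.put((task_id, payload))
    for _ in threads:
        q.put(None)  # one sentinel per worker
    q.join()
    for t in threads:
        t.join()
    # Return results in submission order, regardless of completion order.
    return [results[i] for i in range(len(tasks))]
```

In a distributed system the in-memory queue would typically be replaced by a message broker and the threads by separate worker processes or containers, but the producer, queue, and worker roles stay the same.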

Concurrent and Parallel Computing Skills:
o Proficiency in multithreading and multiprocessing without relying solely on external libraries or frameworks.
o Familiarity with asynchronous programming, particularly asyncio in Python.
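
As an example of the asyncio familiarity mentioned above, the sketch below runs several I/O-bound tasks concurrently while capping how many run at once. It is a minimal illustration, not a prescribed pattern: `fetch` is a hypothetical stand-in in which `asyncio.sleep` takes the place of a real network or database call.

```python
import asyncio


async def fetch(item, delay=0.01):
    """Hypothetical I/O-bound task; asyncio.sleep stands in for a network call."""
    await asyncio.sleep(delay)
    return item * 2


async def fetch_all(items, limit=3):
    """Run fetch() for every item, with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def bounded(item):
        async with sem:
            return await fetch(item)

    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(bounded(i) for i in items))


def run(items):
    """Synchronous entry point that starts an event loop for fetch_all()."""
    return asyncio.run(fetch_all(items))
```

The semaphore is one common way to bound concurrency so that a large batch does not overwhelm a downstream service.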

Communication Skills:
o Excellent technical communication abilities.
o Experience collaborating in team environments and conveying complex ideas clearly.

Preferred Qualifications:
Education:
o BS or MS in Computer Science

Advanced Data Processing Tools:
o Experience with Polars, PySpark, or similar tools.
o Ability to handle large-scale data processing tasks efficiently.

Distributed Computing Experience:
o Hands-on experience with distributed architectures in Docker.
o Familiarity with concepts like task queuing, MapReduce, and saga patterns.

Kubernetes Experience:
o Knowledge of container orchestration using Kubernetes.
o Experience deploying and managing applications in a Kubernetes cluster.

Problem-Solving at Scale:
o Demonstrated ability to solve complex problems using parallel or distributed computing.
o Innovative thinking beyond single-threaded processes.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
