Overview
On Site
$45 - $50
Full Time
Skills
API
Pandas
NumPy
FOCUS
Scalability
Articulate
Collaboration
Django
C#
RESTful
Workflow
Systems design
Load balancing
Parallel computing
Multithreading
Python
Communication
Technical communication
Computer science
PySpark
Data processing
Docker
MapReduce
Orchestration
Management
Kubernetes
Problem solving
Distributed computing
Job Details
Job Description
Job Title: Python Data Engineer
Location: Houston, TX (Onsite role)
Duration: Long-term contract
Job Description:
We are looking for a talented Data Engineer with expertise in Python data processing. The ideal candidate will have a strong background in Python API development, parallel data processing, and distributed systems design. You will be responsible for building and maintaining systems that handle large-scale data processing tasks, ensuring high performance and scalability.
Key Responsibilities:
Python API Development:
o Develop and maintain RESTful APIs using Python web frameworks such as FastAPI or Django.
o Collaborate with front-end developers to integrate user-facing elements with server-side logic.
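Framework choice aside, the request-to-handler mapping this responsibility describes can be sketched with only the standard library. The `/users` resource and handler names below are illustrative, not part of the role; in FastAPI or Django the framework supplies the routing, validation, and serialization shown here by hand.

```python
# Minimal REST-style dispatch sketch (illustrative; a framework like
# FastAPI or Django would normally own this routing logic).
USERS = {1: {"id": 1, "name": "Ada"}}  # hypothetical in-memory resource

def dispatch(method, path, body=None):
    """Map (HTTP method, path) pairs to handlers; return (status, payload)."""
    if method == "GET" and path == "/users":
        return 200, list(USERS.values())
    if method == "POST" and path == "/users":
        new_id = max(USERS) + 1
        USERS[new_id] = {"id": new_id, **(body or {})}
        return 201, USERS[new_id]
    return 404, {"error": "not found"}
```

The point of the sketch is the RESTful convention itself: nouns in the path, verbs in the method, and status codes (200/201/404) carrying the outcome.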
Parallel Data Processing:
o Utilize Pandas, NumPy, and other libraries to process large datasets efficiently.
o Implement multithreading, multiprocessing, and asynchronous programming techniques.
o Optimize data processing pipelines to handle millions of rows with minimal latency.
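A minimal sketch of the chunked, parallel pattern these bullets describe, using only the standard library. Thread workers are shown for brevity; CPU-bound transforms would typically use `ProcessPoolExecutor` instead because of the GIL, and `process_chunk` is a hypothetical stand-in for a Pandas/NumPy operation over a slice of rows.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Stand-in for a real per-chunk transform (e.g. a vectorized
    # Pandas/NumPy operation over a DataFrame slice).
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(rows, n_workers=4, chunk_size=1000):
    # Split the input into fixed-size chunks and fan them out to workers.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(process_chunk, chunks))
```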
Distributed Systems Design:
o Design and implement distributed systems with a focus on scalability and reliability.
o Understand and apply core concepts such as load balancing and task queues.
o Use Docker to containerize applications and manage dependencies.
o (Preferred) Experience with Kubernetes for container orchestration.
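The task-queue concept named above can be sketched in a single process with `queue.Queue` and worker threads; a real system would put the queue on a broker (Celery, RQ, SQS, etc.) and the workers on separate machines, which is where the load-balancing behavior comes from.

```python
import queue
import threading

def run_workers(tasks, handler, n_workers=3):
    """Distribute tasks across workers via a shared queue — a single-process
    sketch of the task-queue pattern used in distributed systems."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = q.get_nowait()  # idle workers pull the next task
            except queue.Empty:
                return
            out = handler(item)
            with lock:  # protect the shared results list
                results.append(out)

    for task in tasks:
        q.put(task)
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because workers pull tasks as they finish, faster workers naturally take on more work — the same property that load balancing provides at the service level.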
Technical Communication:
o Clearly articulate complex technical concepts to team members and stakeholders.
o Document system designs, processes, and code effectively.
o Collaborate with cross-functional teams to align on project goals and deliverables.
Must-Have Qualifications:
Experience in Python Web Frameworks:
o Proficiency with FastAPI, Django, or similar frameworks.
o C# coding experience.
o Understanding of RESTful API principles and best practices.
Docker Knowledge:
o Ability to create and manage Dockerfiles.
o Experience with containerization for deployment and development workflows.
Systems Design Understanding:
o Basic knowledge of load balancing, task queues, and distributed system concepts.
o Ability to design systems that are scalable and maintainable.
Concurrent and Parallel Computing Skills:
o Proficiency in multithreading and multiprocessing without relying solely on external libraries or frameworks.
o Familiarity with asynchronous programming, particularly asyncio in Python.
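The asyncio skill this bullet asks for amounts to running I/O-bound work concurrently rather than sequentially; a minimal sketch, with `fetch` standing in for a real HTTP or database call:

```python
import asyncio

async def fetch(source, delay):
    # Simulated I/O-bound call (e.g. an HTTP request or DB query).
    await asyncio.sleep(delay)
    return source

async def gather_all(sources):
    # asyncio.gather runs all the coroutines concurrently and
    # returns their results in the original order.
    return await asyncio.gather(*(fetch(s, 0.01) for s in sources))

results = asyncio.run(gather_all(["a", "b", "c"]))
```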
Communication Skills:
o Excellent technical communication abilities.
o Experience collaborating in team environments and conveying complex ideas clearly.
Preferred Qualifications:
Education:
o BS or MS in Computer Science
Advanced Data Processing Tools:
o Experience with Polars, PySpark, or similar tools.
o Handling of large-scale data processing tasks efficiently.
Distributed Computing Experience:
o Hands-on experience with distributed architectures in Docker.
o Familiarity with concepts like task queuing, MapReduce, and saga patterns.
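The MapReduce concept mentioned above reduces to a map phase that processes each shard independently and a reduce phase that merges the partial results. Word count is the canonical example; here each "mapper" runs locally, where a distributed framework would run the same shape across machines.

```python
from collections import Counter
from functools import reduce

def map_shard(lines):
    # Map phase: each shard independently counts its own words.
    return Counter(word for line in lines for word in line.split())

def word_count(shards):
    # Reduce phase: merge the partial counts from every shard.
    return reduce(lambda a, b: a + b, (map_shard(s) for s in shards), Counter())
```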
Kubernetes Experience:
o Knowledge of container orchestration using Kubernetes.
o Experience deploying and managing applications in a Kubernetes cluster.
Problem-Solving at Scale:
o Demonstrated ability to solve complex problems using parallel or distributed computing.
o Innovative thinking beyond single-threaded processes.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.