Overview
Hybrid
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - 12 Month(s)
Skills
RTML Framework
Kubernetes
Kubeflow
ML Ops
Job Details
Title: RTML Engineer // ML Ops Engineer
Location: Dallas, TX or NJ
What you will be doing:
You will join our critical Real Time ML Service team working on our RTML Model Serving Framework.
This is a foundational team in our AI Center. The RTML Framework serves all of our real-time AI models in production, enabling our business organizations to maximize the benefits of AI-driven solutions for our customers.
As a Principal Engineer, you will be:
- Functioning as a domain expert in RTML model serving technology, familiar with industry trends in RTML, common RTML architectures, leading third-party RTML serving products, and evaluation criteria.
- Working closely with other teams to define technical strategy, architecture, and development choices, and ensuring the overall growth of the Jarvis Framework to meet our internal customers' needs.
- Leading the Jarvis development activities through phased releases, ensuring the framework is architecturally sound, implemented correctly and efficiently, and delivered on time.
- Supporting internal customers with major framework issues and coordinating triage efforts to solve them.
- Leading and mentoring junior developers on the team and always pushing for team success.
- Adhering to industry standards and best practices and tracking emerging RTML technologies and trends to continuously improve the Jarvis framework.
You'll need to have:
- Bachelor's degree or above in Computer Science/Engineering or a related area.
- Four or more years of work experience in computer software development related jobs.
- At least two of those years in AI/ML engineering, with a reasonably good understanding of data science and AI/ML practices and workflows.
- Strong expertise in the RTML model serving arena and/or large-scale, cloud-based RT framework development.
- Experience with Kubernetes. The candidate should be comfortable with kubectl and Helm.
- Experience in creating, deploying, and maintaining centralized Kubeflow infrastructure on top of one or multiple Kubernetes clusters.
- Experience with cloud infrastructures and ML Ops in clouds.
- Familiarity with CI/CD processes and common tools such as ArgoCD.
- Experience with programming languages such as Python and Java.
- Experience in large application development in cloud environments: AWS, Google Cloud Platform, and on-prem clusters.
- Experience with K8s architecture and principles of operation, with hands-on skills in deploying large applications to production K8s clusters, configuring K8s properly, and troubleshooting application issues.
- Good understanding of RT system stats collection and performance monitoring methods.
- Basic understanding of RT feature engineering methodology and practices.
- Understanding of basic data science concepts and the common needs of data scientists.
Raj Vemula
Director Resource Development