Overview
On Site
Full Time
Skills
Innovation
Graphics design
Data centers
Embedded systems
Data
Open source
Optimization
FOCUS
Benchmarking
ASIC
Field engineering
Object Data Manager
Oracle Data Mining
Marketing
Business development
Collaboration
Debugging
Firmware
Reliability engineering
NPI
Computer hardware
Exceed
WINS
Continuous improvement
Modeling
Analytics
Performance tuning
Scalability
Network design
TCP
InfiniBand
Storage architecture
Design
Roadmaps
Network
Software development
Software deployment
Leadership
Linux
Python
Ansible
Training
Performance analysis
CPU
GPU
Communication
Management
Patents
Publications
HPC
Computer networking
Storage
Artificial intelligence
Machine Learning (ML)
Cloud computing
Computer science
SAP BI
Sales
Purchasing
Military
Law
Recruiting
Job Details
WHAT YOU DO AT AMD CHANGES EVERYTHING
We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences - the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.
AMD together we advance_
THE TEAM:
AMD's Data Center GPU organization is transforming the industry with our AI-based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, artificial intelligence (AI), HPC, and embedded systems. If this resonates with you, come and join our Data Center GPU organization, where we are building amazing AI-powered products with amazing people.
THE ROLE:
The DC GPU Fellow - AI Enablement - Infrastructure is a leadership position designed to optimize the at-scale debug, deployment, and operational capabilities of our Instinct-based CPU and GPU systems within datacenter environments. Drawing on extensive experience in network architecture, storage, AI/ML network deployments, open source and custom models, and performance tuning, this role requires a disciplined approach to system triage, at-scale debug, and infrastructure optimization to ensure robust performance and efficient transitions from GPU production qualification to at-scale datacenter deployment.
THE PERSON:
This position is for a DC GPU Fellow - AI Enablement - Infrastructure, with a focus on architecture and design, optimizing compute, network, and storage, and benchmarking machine learning applications. You will be part of a team that works closely with strategic customers and partners to enable large-scale deployment of AMD CPU and GPU platforms. You will interface closely with ROCm software developers, DC GPU HW/FW/ASIC teams, Field Engineering teams, OEM/ODM partners, CSPs, and Marketing/Business Development teams.
You will be part of a world-class team of highly qualified computational scientists and engineers enabling AI/ML and HPC applications across industry, academia, cloud service providers, and national laboratories. You must be self-motivated and able to work well within a team environment.
KEY RESPONSIBILITIES:
- Collaborate with strategic customers on scalable designs involving compute, networking, and storage environments; work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models.
- Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability.
- Drive the ramp of Instinct-based large-scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads.
- Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations.
- Provide second-level support and maintenance for ROCm and its integration with third-party tools across AI and HPC ecosystems.
- Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins.
- Benchmark a variety of machine learning applications on AMD CPU and GPU systems.
- Provide domain-specific knowledge to other groups at AMD and share lessons learned to drive continuous improvement.
- Engage with AMD product groups to drive resolution of application and customer issues.
- Develop and present training materials to internal audiences, at customer venues, and at industry conferences.
PREFERRED EXPERIENCE:
- Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, and storage cluster design, modeling, analytics, performance tuning, convergence, and scalability improvements.
- Demonstrated leadership in network architecture, including extensive hands-on experience with NVMe-oF, NVMe-TCP, InfiniBand, RoCEv2, storage architecture, SPDK/DPDK, offloading, and complex AI/ML ecosystems.
- Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends.
- Extensive system, storage, network, and software development/deployment expertise, and a proven track record of delivering large projects on time.
- Direct co-development/deployment experience working with strategic customers and partners to bring solutions to market.
- Proven leadership in engaging customers across diverse technical disciplines through avenues such as proofs of concept, competitive evaluations, and early field trials.
- Direct experience working with large customers; strong customer relationship and communication skills.
- Proficient in Linux, Python, or Ansible.
- Working experience with distributed pre-training, fine-tuning and inference.
- Broad experience creating, adapting, and running workloads with widely used AI applications.
- Strong system-level performance analysis skills for both CPU and GPU.
- Excellent communication skills with audiences ranging from engineers to mid-level management to C-level executives.
- Thought leader, backed by patents, publications, and participation in industry and technical conferences.
- In-depth HPC and AI/ML experience.
- Experience working with large customers such as cloud service providers and global customers.
- Ability to work well in a geographically dispersed team.
- Certifications in networking, storage, AI/ML, or cloud technologies.
ACADEMIC CREDENTIALS:
Bachelor's, Master's, or PhD in Computer Science, Engineering, or related subjects, or equivalent experience.
LOCATION:
Santa Clara, CA
#LI-BW1
#LI-hybrid
At AMD, your base pay is one part of your total rewards package. Your base pay will depend on where your skills, qualifications, experience, and location fit into the hiring range for the position. You may be eligible for incentives based upon your role such as either an annual bonus or sales incentive. Many AMD employees have the opportunity to own shares of AMD stock, as well as a discount when purchasing AMD stock if voluntarily participating in AMD's Employee Stock Purchase Plan. You'll also be eligible for competitive benefits described in more detail here .
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.