AI Model Architecture Optimization Engineer R&D
Company: OpenInfer
Location: San Mateo
Posted on: May 3, 2025
Job Description:
San Mateo, CA
Full-time
Position Overview
We are looking for an
experienced AI Acceleration Engineer who can dive deep into large
model (e.g., transformer) architectures and blocks such as
self/cross/multi-attention, and perform research and development of
advanced techniques to accelerate these areas. The ideal candidate
will have a deep understanding of large model design, AI
acceleration techniques, and will integrate these advancements into
the PyTorch stack. Familiarity with Python is essential, and
experience with CUDA programming is highly desirable.
Key Responsibilities
- Innovate on AI model components, such as attention blocks,
KV-cache strategies, layer streaming, tokenization, layer norms,
and more, to improve AI model performance and scalability.
- Optimize and integrate AI acceleration techniques into the
PyTorch stack, enabling efficient use across diverse hardware
platforms.
- Own & drive features end to end to push the limits of large
model architecture, ensuring seamless integration with existing
frameworks.
- Benchmark and profile AI models to evaluate performance
improvements, ensuring optimal execution on target hardware.
- Write and maintain clean, efficient code in Python, with a
focus on integration with PyTorch.
- Leverage CUDA for GPU-based acceleration when necessary,
optimizing the attention blocks for maximum performance.
- Work on cross-functional teams to design, implement, and test
new features.
Qualifications
- Extensive experience with large AI model architectures,
particularly with attention blocks and transformer models.
- Proficiency in Python and hands-on experience with the PyTorch
framework.
- Strong understanding of AI acceleration techniques and their
application in real-world use cases.
- Familiarity with CUDA for GPU programming is highly
desirable.
- Demonstrated ability to optimize complex models for performance
across different hardware environments.
- Experience in developing and deploying AI models at scale is a
plus.
What You'll Gain
- Opportunity to work alongside industry experts in AI
optimization, high-performance computing, and hardware
acceleration.
- Hands-on experience with cutting-edge technologies at the
intersection of AI and hardware acceleration.
- Exposure to open-source development and collaboration with a
vibrant community.
Benefits We Offer:
At OpenInfer we offer comprehensive benefits, including:
- Medical, Dental, and Vision benefits for you and your
family
- Flexible Paid Time Off, 10 days
- 401(k) Plan with company matching
- Snacks and coffee to keep you energized
These benefits are further detailed in OpenInfer policies and are
subject to change at any time, consistent with the terms of any
applicable compensation or benefits plans.
How to Apply
Please send your resume and a brief
cover letter to recruiting@openinfer.io. Include examples of your
work with large AI models, attention blocks, or open-source
contributions where applicable.