Staff II Software Engineer AI/ML Ops
Company: BlackLine
Location: Pleasanton
Posted on: January 8, 2026
Job Description:
At BlackLine, we’re committed to bringing passion and customer
focus to the business of enterprise applications. Since being
founded in 2001, BlackLine has become a leading provider of cloud
software that automates and controls the entire financial close
process. Our vision is to modernize the finance and accounting
function to enable greater operational effectiveness and agility,
and we are committed to delivering innovative solutions and
services to empower accounting and finance leaders around the world
to achieve Modern Finance. Being a best-in-class SaaS company, we
understand that bringing in new ideas and innovative technology is
mission critical. At BlackLine we are always working with new,
cutting-edge technology that encourages our teams to learn
something new and expand their creativity and technical skill set,
accelerating their careers. Work, Play and Grow at BlackLine!

Make Your Mark:
As a Machine Learning Operations Engineer, you will play a pivotal
role in bridging the gap between data science and production
environments. This position requires a strong background in machine
learning, software engineering, and operations to ensure the
successful deployment, monitoring, and maintenance of machine
learning models. You will collaborate with cross-functional teams
to streamline the machine learning lifecycle, ensuring seamless
integration into operational systems.

RESPONSIBILITIES
You’ll Get To:

Leadership and Strategy:
• Partner with data science, security, and product teams to set
evaluation and governance standards (guardrails, bias, drift,
latency SLAs).
• Mentor senior engineers and drive design reviews for ML pipelines,
model registries, and agentic runtime environments.
• Lead incident response and reliability strategies for ML/AI
systems.

AI System Deployment and Integration:
• Collaborate with development teams to
integrate AI solutions into existing workflows and applications.
• Ensure seamless integration with different platforms and
technologies.
• Define and manage an MCP Registry for agentic component
onboarding, lifecycle versioning, and dependency governance.
• Build CI/CD pipelines automating LLM agent deployment, policy
validation, and prompt evaluation workflows.
• Develop and operationalize experimentation frameworks for agent
evaluations, scenario regression, and performance analytics.
• Implement logging, metering, and auditing for agent behavior,
function calls, and compliance alignment.
• Create scalable observability systems that track conversation
outcomes, factual accuracy, latency, escalation patterns, and
safety events.
• Architect end-to-end guardrails for AI agents, including prompt
injection protection, identity-aware routing, and tool usage
authorization.
• Collaborate cross-functionally to standardize authentication,
authorization, and session governance for multi-agent runtimes.

Model Deployment and Integration:
• Architect
and standardize model registries and feature stores to support
version tracking, lineage, and reproducibility across environments.
• Lead the deployment of machine learning models into production
environments, ensuring scalability, reliability, and efficiency.
• Collaborate with software engineers to integrate machine learning
models into existing applications and systems.
• Implement and maintain APIs for model inference.

Infrastructure and Environment Management:
• Design and manage training infrastructure including
distributed training orchestration, GPU/TPU resource allocation,
and automatic scaling.
• Implement CI/CD for model workflows using pipelines integrated
with model validation, bias checks, and rollback automation.
• Build standardized experimentation frameworks for reproducible
training, tuning, and deployment cycles (MLflow, W&B, Kubeflow).
• Manage and optimize the infrastructure required for machine
learning operations in the cloud.
• Work closely with other teams to ensure the availability,
security, and performance of machine learning systems.

Monitoring and Maintenance:
• Implement robust monitoring solutions for deployed
machine learning models to detect issues and ensure performance.
• Collaborate with data scientists and engineers to address and
resolve model performance and data quality issues.
• Conduct regular system maintenance, updates, and optimizations to
ensure optimal performance of machine learning solutions.

Automation and Orchestration:
• Develop and maintain automation scripts and tools for managing
machine learning workflows.
• Implement orchestration systems to streamline the end-to-end
machine learning lifecycle, from data preparation to model
deployment.

Collaboration with Data Science Teams:
• Collaborate with data scientists to understand model requirements
and constraints for deployment.
• Facilitate the transition of machine learning models from
research to production, ensuring scalability and efficiency.

Performance Optimization:
• Identify and implement optimizations to enhance the performance
and efficiency of machine learning models in production.
• Conduct performance analysis and implement improvements based on
resource utilization metrics.

Security and Compliance:
• Implement
security measures to protect machine learning systems and data.
• Ensure compliance with regulatory requirements and industry
standards related to machine learning and data privacy.
• Integrate audit controls, metadata storage, and lineage tracking
across ML and AI workflows.
• Ensure complete monitoring and feedback loops, including event
logs, evaluations, and automated retraining triggers.
• Enforce secure deployment patterns with Infrastructure-as-Code
and cloud-native secrets management.
• Define SLAs, error budgets, and compliance reporting mechanisms
for ML and AI systems.

What You’ll Bring:
Knowledge: Typically possesses
extensive practical experience with consistent, demonstrated
success developing effective business solutions/applications for
products or services that may affect broad areas of the org |
Expert solution builder
Competencies: Recognized expert within and outside of the
organization | Possesses industry expertise as an individual
contributor to operations | Sets objectives and delivers results
that have an impact within the department or division | Provides
advice, counsel and thought leadership within the department |
Influencer/architect/orchestrator | High-level strategic influence |
Decisions impact business units’ or departments’ strategic
direction | Anticipates emerging trends | Accountable to a 3-year
horizon | Futurist mindset | Expert operator | High level of
unprecedented work or experience | High degree of autonomy and
exercises independent discretion | Accountable for complex, highly
strategic duties requiring functional expertise | Develops path
through the org’s most ambiguous endeavors | Developer of
innovation or adaptation

We’re Even More Excited If You Have:
Education and Experience:
• Bachelor’s or Master’s degree in Computer Science,
Machine Learning, Data Science, or a related field.

Technical Skills:
• Strong programming skills in languages such as Python, Java, or
Scala.
• Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn)
and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow).
• Proven experience operating production pipelines for ML and
LLM-based systems across cloud ecosystems (GCP, AWS, Azure).
• Deep familiarity with LangChain, LangGraph, ADK, or similar
agentic system runtime management.
• Strong competencies in CI/CD, IaC, and DevSecOps pipelines
integrating testing, compliance, and deployment automation.
• Hands-on experience with observability stacks (Prometheus,
Grafana, New Relic) for model and agent performance tracking.
• Understanding of governance frameworks for Responsible AI,
auditability, and cost metering across training and inference
workloads.
• Proficiency in containerization technologies (e.g., Docker,
Kubernetes).

Operations and Infrastructure:
• Proficient
in scripting languages (e.g., Bash, Python) for automation.
• Experience with workflow orchestration tools (e.g., Apache
Airflow).
• Expertise in managing and optimizing cloud-based infrastructure.
• Familiarity with DevOps practices and tools for automated
deployment.
• Understanding of network configurations and security protocols.

Problem-solving and Critical Thinking:
• Ability to define problems, collect and analyze data, and propose
innovative solutions.
• Strong critical thinking skills to evaluate models, identify
limitations, and

Adaptability and Learning Agility:
• Comfortable working in a fast-paced, rapidly evolving
environment.
• Proactive in staying up to date with the latest trends,
techniques, and technologies in AI/data science.

Thrive at BlackLine Because You Are Joining:
• A technology-based company
with a sense of adventure and a vision for the future. Every door
at BlackLine is open. Just bring your brains, your problem-solving
skills, and be part of a winning team at the world’s most trusted
name in Finance Automation!
• A culture that is kind, open, and accepting. It’s a place where
people can embrace what makes them unique, and the mix of cultural
backgrounds and varying interests cultivates diverse thought and
perspectives.
• A culture where BlackLiners’ continued growth and learning is
empowered. BlackLine offers a wide variety of professional
development seminars and inclusive affinity groups to celebrate and
support our diversity.

BlackLine is an equal opportunity employer. All qualified
applicants will receive consideration for employment without regard
to sex, gender identity or expression, race, ethnicity, age,
religious creed, national origin, physical or mental disability,
ancestry, color, marital status, sexual orientation, military or
veteran status, status as a victim of domestic violence, sexual
assault or stalking, medical condition, genetic information, or any
other protected class or category recognized by applicable equal
employment opportunity or other similar laws.

BlackLine recognizes
that the ways we work and the workplace itself have shifted. We
innovate in a workplace that optimizes a combination of virtual and
in-person interactions to maximize collaboration and nurture our
culture. Candidates who live within a reasonable commute to one of
our offices will work in the office at least 2 days a week.