Software Engineer, Data Platform
Company: Middesk
Location: San Francisco
Posted on: April 1, 2026
Job Description:
About Middesk:
Middesk makes it easier for businesses to work together. Since
2018, we’ve been transforming business identity verification,
replacing slow, manual processes with seamless access to complete,
up-to-date data. Our platform helps companies across industries
confidently verify business identities, onboard customers faster,
and reduce risk at every stage of the customer lifecycle. Middesk
came out of Y Combinator, is backed by Sequoia Capital and Accel
Partners, and was recently named to the Forbes Fintech 50 list.

About Middesk Engineering:
Middesk is, at its
core, a data company. We live by the quality of our data assets and
the engine that powers them. We’re on a mission to build a
comprehensive and complete business dataset for every business in
the world. As part of the Data Platform team at Middesk, you’ll
collaborate with Data Science, Infrastructure and Product
Engineering teams to build and maintain our proprietary Entity
Resolution system used to power the Middesk business identity
platform, scaling our system to resolve millions of business
identities across hundreds of data sources and thousands of
distinct data sets. You’ll often work with and support product
engineers looking to launch new products and features.

The Role:
We're looking for a senior engineer to own and drive the technical
direction of our data platform. You'll design data infrastructure
systems that operate reliably at scale, define the architecture for
how we acquire, transform, and serve data to product teams, and
partner closely with engineering leadership to align platform
strategy with company priorities. You'll be expected to
independently scope complex projects from ambiguous problem
statements, break them into incremental deliverables, and drive
them to completion while keeping stakeholders informed and adapting
when priorities shift. You'll also play a key role in elevating the
engineering practices of the team through mentorship, code review,
and setting technical standards.

What You'll Do:
- Own platform architecture and technical direction for how we
  ingest, transform, and serve data across highly variable input
  formats - business entity data sourced from thousands of
  government agencies, registries, and third-party providers, each
  with its own schema, cadence, and reliability profile
- Design and build systems for scale - both the infrastructure we
  need today and the infrastructure we'll need at 2–5x our current
  volume
- Scope and drive complex projects end-to-end, breaking ambiguous
  problems into well-defined milestones with clear deliverables and
  timelines
- Design AI-powered tooling to improve how we acquire and maintain
  data using LLMs, AI agents, and agent orchestration
- Partner with product engineering, data science, and business
  teams to understand data needs and translate them into platform
  capabilities
- Establish and maintain data governance and quality standards
  across the platform, ensuring the integrity and reliability of
  the data our customers depend on for compliance and risk
  decisions

What We’re Looking For:
- 7 years of professional software engineering experience, with
  meaningful time spent on data infrastructure, data engineering,
  or backend platform work (targeting Senior to Staff Level
  Engineers)
- Experience designing and operating systems at meaningful scale,
  ideally within a larger or rapidly scaling engineering
  organization
- Track record of independently owning and delivering complex,
  multi-milestone projects - from scoping through launch
- Strong data modeling instincts and deep familiarity with SQL,
  pipeline orchestration (Airflow, Dagster, etc.), and data
  transformation patterns
- Experience with distributed data processing frameworks (Spark,
  Flink, Beam, or similar) and an understanding of when and how to
  apply parallelization to scale pipelines beyond single-node
  limits
- Proficiency in one or more of: Python, Ruby,
  JavaScript/TypeScript, Java

Nice to Haves:
- Experience working with or building AI/LLM-powered tooling or
  data products (strongly preferred)
- Experience with scraper technologies, including agentic AI
- Experience building and designing collections stored on
  Elasticsearch
- Experience operating event-driven data pipelines using serverless
  compute (e.g., AWS Lambda, Google Cloud Functions, Azure
  Functions) and managed cloud services
- Experience with Terraform, Datadog, or Kubernetes