Location: SF Bay Area

Type: Full Time

Compensation: Cash + Equity

Vali is transforming the home care industry from the ground up with our agentic OS. We’re hiring a pragmatic ML platform engineer—ideally with tech-lead experience—to stand up our ML stack and own models in production.

What you’ll do

Build the ML platform from scratch: data pipelines, feature/embedding store, model registry, CI/CD for models, evals (offline/online), observability, rollback.
Ship production models/agents for scheduling & matching (availability forecasting, constraints/optimization) and communications (intent/routing, summarization, after-hours voice agent).
Create training/feedback loops from historical interactions; enforce data quality, drift detection, guardrails, and human-in-the-loop review.
Apply reinforcement learning to LLM agents: reward modeling, offline RL from logged interactions, contextual bandits and A/B tests, RLAIF/RLHF, safe exploration, and policy evaluation.
Define and move the metrics (fill rate, on-time starts, reassign latency, SLA adherence) with tight product/ops collaboration.
[Optional] Lead/mentor a small team; drive roadmap and engineering standards.

What you’ve done

5–8+ years building ML systems in production with ownership of reliability, latency, and business KPIs.
Hands-on with MLOps: Airflow/Prefect, Spark/Ray, Feast/feature stores, MLflow/Kubeflow, vector DBs, Docker/K8s, and a major cloud (GCP/AWS/Azure).
Experience with ranking/matching and forecasting; strong Python and software engineering fundamentals.
LLM/agent know-how (RAG, tool use, orchestration) or a demonstrated ability to ramp quickly.

Nice to have

Workforce scheduling, logistics, contact centers, or healthcare ops background.
HIPAA/PHI handling and healthcare compliance experience.