Remote OK (candidate should be located in Austin, TX or Sunnyvale, CA)
Responsibilities:
- Upgrade EKS/Kubernetes for multi-tenancy: right-sized nodes, autoscaling, pod security, network policies, and per-tenant namespaces for strong isolation and cost efficiency.
- Evolve the Java/Python ML services and SDK into a multi-tenant runtime with per-tenant throttles/quotas and efficient resource sharing.
- Productionize: lean CI/CD, automated rollouts/rollbacks, golden-signal observability, and SLOs tuned for multi-tenant workloads.
- Drive SDK adoption among customers; simplify migration and enforce isolation guardrails during cutover.
- Ship tenant-scoped metrics, evaluation, experimentation, and usage metering for performance, drift, and cost attribution.
- Implement backend aggregations (Python/Spark) optimized for multi-tenant analytics and billing.