Research Engineer, Post Training RL Job at TensorStax, Sunnyvale, CA

  • TensorStax
  • Sunnyvale, CA

Job Description

Research Engineer – Post Training Reinforcement Learning

Location: San Francisco (Hybrid)

About TensorStax

TensorStax is building fully autonomous AI systems to manage and maintain mission-critical data infrastructure and pipelines. We use reinforcement learning to improve language models' ability to reason over large-scale data lakes and warehouses, detect pipeline failures, and construct new pipelines with high precision. The goal is agentic behavior: systems that identify and resolve issues on their own, before anyone has to step in.

What You’ll Do

As a Research Engineer specializing in Reinforcement Learning, you will:

  • Develop and refine reward functions to optimize agent behavior for complex data engineering tasks.
  • Create RL gym environments for language model agents.
  • Fine-tune language models using reinforcement learning and preference-optimization techniques such as PPO, DPO, and KTO.
  • Stay at the forefront of research on RL for language models, incorporating advancements like GRPO, SWE-Gym, and SWE-RL into practical applications.
  • Curate and build high-quality datasets for supervised fine-tuning (SFT) and RLHF.
  • Design experiments to evaluate and improve the agentic capabilities of language models in data environments.

What We’re Looking For

  • Deep understanding of reinforcement learning, reward shaping, and optimization strategies.
  • Strong familiarity with LLM fine-tuning techniques (PPO, DPO, KTO) and their applications in reinforcement learning.
  • Knowledge of recent advancements in RL for language models (GRPO, SWE-Gym, SWE-RL).
  • Experience curating and constructing high-quality datasets for fine-tuning.
  • Strong problem-solving skills and a history of working on complex ML projects.
  • High agency: the ability to work independently, experiment proactively, and drive research initiatives forward.

Bonus Points

  • Experience with distributed training in PyTorch (DDP, FSDP).
  • Hands-on experience designing RL environments for traditional RL problems.
  • Contributions to open-source projects in RL, LLMs, or ML infrastructure.
  • Familiarity with data lakes and warehouses (Snowflake, BigQuery, Redshift).

Benefits

  • 100% employer-covered health, dental, and vision insurance.
  • 401(k) with company match.
  • Access to Bay Club or Equinox in San Francisco.

Job Tags

Temporary work
