Company playbooks

The DoorDash Data Scientist Interview — Marketplace Experiments and Three-Sided Metrics

9 min read · April 25, 2026

DoorDash's DS loop is harder than most candidates expect because the experiments round isn't abstract — it's about running A/B tests on a three-sided marketplace where every treatment has spillover. Here's the playbook.

DoorDash's data scientist loop in 2026 is one of the more difficult DS loops in tech, not because the individual rounds are harder than Meta's or Netflix's, but because every round tests how comfortable you are reasoning about a three-sided marketplace where treatments on one side spill over to the others. If you've only run experiments on a feed product, the DoorDash loop will expose that quickly.

This guide is the 2026 breakdown of the DS track: round structure, the SQL bar, the experimentation round (this is the big one), the product / metrics round, and the ML / modeling round for applied DS roles.

Track structure: analytics DS, product DS, algorithms DS

DoorDash's DS org splits into three functional tracks and the loop varies by track:

  • Analytics DS — embedded with product and ops teams. Heavy SQL, heavy experimentation, light modeling. This is the largest track.
  • Product DS — similar to analytics but more forward-looking: forecasting, strategy, roadmap shaping. A/B experimentation is still load-bearing.
  • Algorithms DS (also called ML DS) — applied ML roles embedded with dispatch, pricing, search, and fraud. Heavier modeling, offline evaluation, and production ML.

Your recruiter will tell you which track. Ask explicitly if unclear — the prep split changes materially.

The loop

Typical 2026 DoorDash DS onsite:

  1. Recruiter screen (30 min) — standard.
  2. Technical phone screen (60 min) — SQL (2-3 problems) + one case-style analytical prompt. This is a real filter; roughly half of candidates don't clear.
  3. Onsite SQL round (60 min) — 3-5 harder SQL problems. Window functions, cohorts, rolling windows, self-joins.
  4. Experimentation round (60 min) — design an A/B test, or diagnose a messed-up one, on a marketplace surface. This is the load-bearing round.
  5. Product / metrics round (45-60 min) — an open-ended product scenario. Define success metrics, diagnose a drop, propose a new feature and how you'd measure it.
  6. Modeling round (60 min, algorithms DS only) — design an ML system (ETA, dispatch, fraud, pricing). Feature engineering, offline eval, online eval, failure modes.
  7. Behavioral / values (45-60 min) — STAR-style stories keyed to DoorDash's five values, plus at least one cross-functional scenario.

For senior DS (E5+) and manager roles, expect an additional round with a VP or a cross-functional partner (usually a PM or an eng lead).

The SQL bar: higher than you think

DoorDash's DS SQL round is harder than the SWE SQL round. Expect three to five problems in 60 minutes on a realistic marketplace schema. Typical problem types:

  1. Retention / cohort — "Define a Dasher as retained in week N if they did at least 3 deliveries in that week. For Dashers whose first delivery was in Jan 2026, compute weekly retention for weeks 1-12."
  2. Funnel — "For each step in the consumer checkout funnel (landing → menu view → cart → checkout → order placed → delivered), compute conversion. Segment by new vs returning consumer."
  3. Rolling / time-series — "For each market, compute the 7-day rolling median delivery time for the last 90 days."
  4. Marketplace matching — "Compute Dasher idle time per shift, where idle time is the minutes between deliveries during an active shift."
  5. Anomaly / outlier detection — "Flag markets where acceptance rate deviates more than 2 SD from the trailing 30-day mean."

The bar is: correct SQL using CTEs, window functions (LAG, LEAD, ROW_NUMBER, SUM OVER), proper handling of NULLs and ties, and awareness of edge cases (what about Dashers with only one delivery; what about markets with fewer than 30 days of data). Interviewers read your SQL aloud and push back on any part where the logic is unclear. Brief, commented CTEs beat dense nested subqueries every time.
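To make the rolling-window bar concrete, here is problem 3 (the 7-day rolling median) sketched in pandas. The `market`, `delivery_date`, and `delivery_minutes` columns are hypothetical stand-ins for whatever schema the interviewer gives you; in the interview itself you would express the same logic in SQL with a window function.

```python
import pandas as pd

# Hypothetical schema: one row per completed delivery.
df = pd.DataFrame({
    "market": ["SF"] * 10,
    "delivery_date": pd.date_range("2026-01-01", periods=10, freq="D"),
    "delivery_minutes": [30, 32, 28, 45, 31, 29, 33, 60, 30, 27],
})

# 7-day rolling median per market, computed over a time index so that
# missing days shrink the number of observations in the window correctly.
rolling = (
    df.set_index("delivery_date")
      .groupby("market")["delivery_minutes"]
      .rolling("7D")
      .median()
      .rename("rolling_median_minutes")
      .reset_index()
)
print(rolling.tail(3))
```

The time-offset window (`"7D"`) rather than a fixed row count (`7`) is the detail interviewers probe: a row-count window silently breaks on markets with sparse days.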

Practice on StrataScratch's marketplace filter, DataLemur's DoorDash set, and the SQL chapter of Ace the Data Science Interview. Budget 20-30 hours if you're rusty.

The experimentation round: marketplace spillovers

This is the round that decides most DoorDash DS offers. The prompt is either "design an A/B test for [feature]" or "a colleague ran this test, the results look like [X], what do you think?" On a marketplace, neither is simple.

What strong answers cover:

1. Unit of randomization. Consumer? Dasher? Order? Market? For a consumer-facing feature (a new checkout UI), you'd randomize on consumer_id. For a Dasher-facing feature (a new batch-order UI), you'd randomize on Dasher. For pricing or dispatch changes, you'd usually randomize on market — because consumer-level randomization creates interference (treated consumers compete with control consumers for the same Dashers).

2. Interference / SUTVA violations. DoorDash is a constrained-supply marketplace. If you run a consumer-level test that boosts order volume in treatment, treated consumers pull Dashers away from control consumers, inflating ETAs for control. Your treatment effect is biased upward. Know the fix: switchback tests (alternate treatment/control by hour or day within a market), cluster randomization (randomize at the market level), or synthetic control.
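As a sketch of the switchback fix (all names and parameters here are invented for illustration), one common implementation detail is to assign whole (market, time-window) buckets deterministically by hash rather than strictly alternating, so treatment is not confounded with time of day:

```python
import hashlib
from datetime import datetime, timezone

def switchback_assignment(market: str, ts: datetime, salt: str = "batching_v2",
                          window_hours: int = 4) -> str:
    """Deterministically assign a whole (market, time-window) bucket.

    Every order in the same market and window gets the same arm, which is
    what removes within-market interference between arms."""
    bucket = int(ts.timestamp()) // (window_hours * 3600)
    digest = hashlib.sha256(f"{salt}:{market}:{bucket}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

t0 = datetime(2026, 3, 1, 10, 15, tzinfo=timezone.utc)
t1 = datetime(2026, 3, 1, 11, 45, tzinfo=timezone.utc)  # same 4-hour window
print(switchback_assignment("chicago", t0) == switchback_assignment("chicago", t1))  # True
```

The analysis side then clusters standard errors at the (market, window) level, since that bucket, not the individual order, is the unit of randomization.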

3. Sample size and power. DoorDash effect sizes are small — 1-3% on delivery time, 0.5-2% on conversion. You need to know how to compute MDE given variance and traffic, and how long the test needs to run. "Two weeks" is rarely a correct answer on its own; justify it.
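A back-of-envelope power calculation is fair game on a whiteboard. This is the standard two-sample normal approximation; the numbers below are illustrative, not DoorDash's:

```python
import math
from statistics import NormalDist

def n_per_arm(sigma: float, mde_abs: float, alpha: float = 0.05,
              power: float = 0.80) -> int:
    """Sample size per arm to detect an absolute difference `mde_abs`
    between two means whose standard deviation is `sigma`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_beta = NormalDist().inv_cdf(power)           # ~0.84
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / mde_abs) ** 2)

# Illustrative: delivery-time sd of 12 min, detect a 0.3-minute (~1%) shift.
n = n_per_arm(sigma=12.0, mde_abs=0.3)
print(n)  # per arm; divide by eligible daily traffic to get runtime in days
```

Note the quadratic blow-up: halving the MDE quadruples the required sample, which is why "two weeks" needs a justification rather than a shrug.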

4. Metrics hierarchy. Name your primary metric (e.g., consumer orders per week), guardrail metrics (Dasher income, merchant GMV, cancellation rate), and diagnostic metrics (ETA, acceptance rate). DoorDash's culture explicitly pushes back on primary-metric-only thinking.

5. Heterogeneous effects. "The treatment helped overall but hurt new consumers" is the kind of finding DoorDash interviewers want you to surface. Propose slicing by consumer tenure, market density, order type, and time of day.

6. Launch decision framework. Would you ship? Under what conditions? What would you do if the primary metric moved but a guardrail regressed? DoorDash's ship/no-ship decisions are explicit about tradeoffs — candidates who understand the frame outperform.

A canonical example: "We added a feature that batches two deliveries together for Dashers. Consumer ETA went up by 90 seconds on average. Dasher earnings per hour went up 11%. Merchant prep-to-pickup time went down 4%. Do we launch?" A strong answer considers consumer NPS impact, Dasher experience (DX) impact, competitive position, and proposes a conditional launch (e.g., enable batching only when predicted batched ETA < threshold).
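The conditional launch at the end of that answer reduces to a per-order gate. The function name, inputs, and threshold below are invented for illustration:

```python
def allow_batching(pred_batched_eta_min: float, pred_solo_eta_min: float,
                   max_eta_penalty_min: float = 1.5) -> bool:
    """Hypothetical launch policy: batch two orders only when the predicted
    consumer ETA penalty stays under a guardrail threshold, so Dasher
    earnings gains are captured without unbounded consumer harm."""
    return (pred_batched_eta_min - pred_solo_eta_min) <= max_eta_penalty_min

print(allow_batching(31.5, 31.0))  # 0.5-minute penalty, within the guardrail
print(allow_batching(36.0, 31.0))  # 5-minute penalty, too much consumer harm
```

Framing the ship decision as an explicit threshold also gives you a follow-up experiment: sweep the threshold and measure where the guardrail regresses.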

The product / metrics round

Similar to the SWE product sense round but deeper. Expect one of three flavors:

  • Metric design — "Propose a North Star metric for the Dasher side of the marketplace, and three to five input metrics that drive it."
  • Metric diagnosis — "Consumer DAU in the Chicago market dropped 5% last week. Walk me through how you'd investigate."
  • Feature proposal — "DoorDash is considering launching a membership tier for merchants. How would you evaluate it?"

Strong answers segment aggressively, form rank-ordered hypotheses, name verification plans, and quantify expected impact. Weak answers stay at the framework level and never commit to a specific number or hypothesis. DoorDash interviewers explicitly push back on "it depends" non-answers — be willing to take a position.

The modeling round (algorithms DS track)

For ML DS candidates only. The prompt is usually "design the [ETA / dispatch scoring / fraud / pricing] model." Strong answers cover:

  1. Problem framing — regression vs classification vs ranking. What is the unit of prediction? What is the label? What is the latency budget?
  2. Feature engineering — time-of-day, market, Dasher state, merchant state, historical rolling features, weather, real-time traffic. For ETA specifically: prep-time features from the merchant, routing features from the map service, Dasher assignment features.
  3. Model choice — gradient-boosted trees (LightGBM, XGBoost) for tabular; DoorDash has moved some ETA surfaces to deep learning but GBDTs are still the workhorse. Justify the choice against latency and interpretability needs.
  4. Offline evaluation — MAE for ETA, AUC for fraud, NDCG for ranking. Train/test split by time, not random — to avoid leakage.
  5. Online evaluation — A/B test with a marketplace-safe design (see the experimentation section).
  6. Failure modes — cold-start markets, concept drift (new merchant onboarding), distribution shift (weather events), label delay (a fraud label isn't known for weeks).
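Point 4 matters because a random split lets the model peek at the future. A minimal pandas sketch of a time-based split (the column names are hypothetical):

```python
import pandas as pd

def time_based_split(df: pd.DataFrame, ts_col: str, cutoff: str):
    """Split train/test by a timestamp cutoff instead of randomly, so the
    model never trains on deliveries that happened after its test period
    (which would leak future information into rolling features)."""
    train = df[df[ts_col] < cutoff]
    test = df[df[ts_col] >= cutoff]
    return train, test

df = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2026-01-05", "2026-01-20", "2026-02-02", "2026-02-10"]),
    "delivery_minutes": [31, 28, 40, 35],
})
train, test = time_based_split(df, "created_at", "2026-02-01")
print(len(train), len(test))  # 2 2
```

In an interview, saying "split by time" is table stakes; explaining *why* (rolling features computed over the full history leak test-period information) is what scores.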

For ETA specifically: know that the model predicts total delivery time, and that the components (merchant prep time, Dasher assignment time, pickup time, transit time) are often modeled separately and summed. Candidates who know this detail stand out.
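Under that decomposition, the serving path can be sketched as below. Component names and numbers are hypothetical, and a real system would also account for merchant prep overlapping with Dasher travel rather than naively summing:

```python
from dataclasses import dataclass

@dataclass
class EtaComponents:
    """Each component is predicted by its own model, then combined."""
    merchant_prep_min: float  # food prep at the merchant
    assignment_min: float     # time until a Dasher accepts
    pickup_min: float         # Dasher travel to store plus handoff
    transit_min: float        # store to consumer

    def total_eta_min(self) -> float:
        # Simplest combination: sum the components. A production system
        # would avoid double counting the legs that run in parallel.
        return (self.merchant_prep_min + self.assignment_min
                + self.pickup_min + self.transit_min)

eta = EtaComponents(12.0, 3.0, 6.0, 14.0)
print(eta.total_eta_min())  # 35.0
```

The decomposition also sharpens debugging: a regression in quoted ETA accuracy can be attributed to one component model instead of one opaque end-to-end model.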

What DoorDash DS interviewers grade on

  • Marketplace fluency. Comfort with three-sided reasoning and supply-demand dynamics.
  • Rigorous experimentation. Know SUTVA, know switchbacks, know power calculations, know guardrails.
  • Business-grounded answers. Tie every analysis to a ship/no-ship decision or a dollar impact.
  • SQL proficiency. Fluent window functions, clean CTEs, fast under pressure.
  • Communication. Narrate before you type, quantify claims, acknowledge uncertainty.
  • Bias for action. In behavioral stories, decisions made with partial information beat stories about waiting for consensus.

What does not score well: framework-only product answers, treating experimentation as just a two-sample t-test, hand-waving on statistical power, claiming ML solves problems that structural fixes would solve better.

Comp and leveling

DoorDash DS levels mirror SWE (E3-E7) with TC roughly 85-95% of SWE at the same level. Standard 2026 Tier 1 (SF/NYC) bands:

  • E3 (new grad): $155K-$190K TC
  • E4: $210K-$280K TC
  • E5 (senior): $300K-$420K TC
  • E6 (staff): $430K-$620K TC
  • E7 (principal): $580K-$850K+ TC

Equity structure matches SWE (4-year RSU, 25/25/25/25). Algorithms DS roles on high-leverage teams (dispatch, pricing) often sit at the top of their band or get an extended grant. Negotiable levers: initial equity grant (15-25% with a credible competing offer), sign-on ($20K-$75K), and level.

Prep plan

Allocate 5-7 weeks if you're starting cold:

  • Weeks 1-2: SQL drilling on StrataScratch and DataLemur, 40-60 problems.
  • Weeks 2-4: Experimentation deep-dive. Read the DoorDash engineering blog posts on switchback testing and CUPED. Work through Ron Kohavi's Trustworthy Online Controlled Experiments. Drill 15-20 experimentation case prompts.
  • Weeks 3-5: Product sense. Diagnose-a-metric drills and feature evaluation drills. Practice committing to numbers.
  • Weeks 5-6 (algorithms DS only): ML system design drills — ETA, fraud, ranking. Read 4-5 DoorDash ML blog posts and three papers on gradient-boosted ranking.
  • Week 6: Mock interviews. Find an ex-DoorDash DS on interviewing.io or via a coach; generic DS mocks miss the marketplace spillover flavor.
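CUPED, mentioned in the week 2-4 reading, is mechanically just a regression adjustment using a pre-experiment covariate. A minimal numpy sketch on synthetic data shows why it shortens experiments:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
pre = rng.normal(10, 3, n)               # pre-experiment orders/week per consumer
post = 0.8 * pre + rng.normal(2, 1, n)   # in-experiment metric, correlated with pre

# CUPED: subtract theta * (covariate - its mean), where theta is the OLS
# slope of post on pre. The mean is unchanged; the variance shrinks by
# roughly a factor of 1 / (1 - corr^2).
theta = np.cov(post, pre)[0, 1] / np.var(pre, ddof=1)
adjusted = post - theta * (pre - pre.mean())

print(round(post.var() / adjusted.var(), 1))  # variance reduction factor
```

Lower metric variance means a smaller required sample for the same MDE, which is exactly the lever the switchback and power discussions in the experimentation round hinge on.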

The DoorDash DS loop rewards a specific kind of thinking: three-sided, experiment-rigorous, business-grounded. Candidates who bring generic FAANG DS prep pass the phone screen and stumble on the experimentation round. Candidates who internalize the marketplace lens — SUTVA violations, guardrail metrics, ship/no-ship tradeoffs — tend to get offers.

Sources and further reading

When evaluating any company's interview process, hiring bar, or compensation, cross-reference what you read here against multiple primary sources before making decisions.

  • Levels.fyi — Crowdsourced compensation data with real recent offers across tech employers
  • Glassdoor — Self-reported interviews, salaries, and employee reviews searchable by company
  • Blind by Teamblind — Anonymous discussions about specific companies, often the freshest signal on layoffs, comp, culture, and team-level reputation
  • LinkedIn People Search — Find current employees by company, role, and location for warm-network outreach and informational interviews

These are starting points, not the last word. Combine multiple sources, weight recent data over older, and treat anonymous reports as signal that needs corroboration.