Caliber
SAMPLE · FOR ILLUSTRATION ONLY

Candidate Assessment Scorecard

Candidate A

LLM Application Engineer · Submitted for Series B AI startup, Bangalore

Current role: SDE3, mid-size product co.
Experience: 6 years (2 in AI/ML)
Notice period: 60 days, buyout possible
CTC expectation: Within stated band

Overall: Strong Yes

Good LLM application engineer who has shipped to real users. Built a customer-facing RAG system at ~200k queries/day. Talks about retrieval quality the way someone who has debugged it does. Fine-tuning is a weak spot but the role does not need it. Worth interviewing.

Assessment scores

Technical depth: 4/5
System design thinking: 4/5
Communication clarity: 5/5
Production experience: 4/5
Motivation fit: 3/5

1 = weak · 3 = solid · 5 = outstanding

Technical assessment

Written question: Describe a RAG system you shipped. What broke?

Described a document QA system for a legal team. Started with 1500-token chunks, ran into poor precision on short factual questions, and switched to smaller semantic chunks plus a keyword index. Hallucination rate went from ~18% to ~4% on their eval set. They brought up the eval set before we asked about one.

Live question: How would you reduce hallucination without retraining?

First instinct was retrieval quality, not prompt tweaking. Covered re-ranking, query expansion, and confidence thresholds on retrieved chunks. Did not bring up citation grounding on their own but had a clear view on it once asked. Knows this space well enough.

Where they are weaker

Limited fine-tuning experience. All production work has been on hosted APIs, with no open-weight model experience. This gap is not relevant for this role, but it would matter if you move toward self-hosted inference later.

Motivation and fit

Why they are looking

Team shifted focus after a product pivot. Last four months have been backend maintenance with no AI work. The CTC jump is real, but it is not the only reason they are moving. Knows what joining a Series B startup means.

What they want

Wants to own a product-facing AI feature start to finish. Asked how the team measures model quality in production before we got to that topic. That is usually a good sign.

Suggested interview focus

  • Push on their eval setup. They have one. Find out how they built it.
  • Ask about a feature that underperformed after shipping and how they tracked it down.
  • Ask what they would do if you needed to run models locally. See if they have thought about it.

Prepared by Caliber · hello@caliberhq.ai · caliberhq.ai
