Candidate Assessment Scorecard
LLM Application Engineer · Submitted for Series B AI startup, Bangalore
Current role: SDE3, mid-size product co.
Experience: 6 years (2 in AI/ML)
Notice period: 60 days, buyout possible
CTC expectation: Within stated band
Overall: Strong Yes
Good LLM application engineer who has shipped to real users. Built a customer-facing RAG system at ~200k queries/day. Talks about retrieval quality the way someone who has debugged it does. Fine-tuning is a weak spot but the role does not need it. Worth interviewing.
1 = weak · 3 = solid · 5 = outstanding
Written question: Describe a RAG system you shipped. What broke?
Described a document QA system for a legal team. Started with 1500-token chunks, ran into poor precision on short factual questions, and switched to smaller semantic chunks plus a keyword index. Hallucination rate went from ~18% to ~4% on their eval set. They brought up the eval set before we asked about one.
Live question: How would you reduce hallucination without retraining?
First instinct was retrieval quality, not prompt tweaking. Covered re-ranking, query expansion, and confidence thresholds on retrieved chunks. Did not bring up citation grounding on their own but had a clear view on it once asked. Knows this space well enough.
Where they are weaker
Limited fine-tuning experience. All production work has been on hosted APIs, with no open-weight model experience. This is not a gap for this role, but it would matter if you move toward self-hosted inference later.
Why they are looking
Team shifted focus after a product pivot. Last four months have been backend maintenance with no AI work. The CTC jump is real but it is not the only reason they are moving. Knows what joining at this stage means.
What they want
Wants to own a product-facing AI feature start to finish. Asked about how the team measures model quality in production before we got to that topic. That is usually a good sign.
Prepared by Caliber · hello@caliberhq.ai · caliberhq.ai
This is a sample document for illustrative purposes