The Challenge
Recruiters were overwhelmed. Each open role attracted hundreds of CVs, and the team was spending hours — sometimes days — just trying to produce a shortlist. The process was inconsistent: one recruiter focused on keywords, another on company names, another on job titles. Strong candidates slipped through the cracks simply because they didn’t phrase their skills the “expected” way.
There was also a deeper concern: bias. Managers noticed that shortlists often lacked diversity, and there was no clear way to explain why some candidates were chosen over others. As one hiring manager said: “It feels like luck of the draw — I don’t know if we’re seeing the best people or just the ones who use the right words.”
The pressure was mounting: long time-to-hire, missed talent, and a lack of trust in the process.
Our Approach
We knew speed alone wouldn’t solve the problem — the solution had to be fast, fair, and explainable. We designed an AI-driven CV screening pipeline that combined automation with recruiter control.
- Parsing and normalisation: We extracted text from PDF, DOC, and DOCX files and used NLP (spaCy with custom models) to turn it into structured data on skills, education, companies, and achievements.
- Skills graph and semantic matching: Instead of matching on keywords, we built a skills graph (using ESCO/O*NET plus client-specific taxonomies). CVs and role descriptions were converted into semantic embeddings and matched using FAISS/pgvector. This meant the system recognised that “FP&A” and “financial planning” refer to the same skill area, even though the wording differs.
- Hybrid scoring: We weighted three dimensions — semantic fit, evidence of real achievements, and practical constraints (location, notice period, compensation). This ensured candidates weren’t just technically strong, but genuinely aligned with the role.
- Fairness and transparency: We introduced optional redaction (hiding name, gender, photo, or graduation year) to reduce bias. Every shortlist came with plain-English justifications, e.g., “Ranked high for Power BI, NetSuite, and AP automation; 4 years domain experience.”
- Integration and feedback loop: The pipeline integrated with applicant tracking systems (ATS) and messaging tools, allowing recruiters to tweak scoring weights or give candidates a thumbs-up or thumbs-down. The model improved over time but never at the cost of fairness controls.
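The hybrid scoring step can be illustrated with a minimal sketch. It assumes each of the three dimensions has already been reduced to a sub-score between 0 and 1. The `Weights` class, the `hybrid_score` function, and the default weight values are hypothetical and chosen only for illustration; the actual weights used in the project are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Hypothetical defaults for illustration only; recruiters could adjust these.
    semantic: float = 0.5
    evidence: float = 0.3
    constraints: float = 0.2


def hybrid_score(semantic: float, evidence: float, constraints: float,
                 weights: Weights = Weights()) -> float:
    """Weighted average of three sub-scores, each expected in [0, 1]."""
    total = weights.semantic + weights.evidence + weights.constraints
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    raw = (weights.semantic * semantic
           + weights.evidence * evidence
           + weights.constraints * constraints)
    # Dividing by the total keeps the result in [0, 1] even if the
    # weights a recruiter enters do not sum to exactly 1.
    return raw / total


# Default weighting for one candidate.
default_score = hybrid_score(semantic=0.8, evidence=0.6, constraints=1.0)

# A recruiter who cares more about proven achievements shifts the weights.
evidence_heavy = Weights(semantic=0.4, evidence=0.4, constraints=0.2)
adjusted_score = hybrid_score(0.8, 0.6, 1.0, weights=evidence_heavy)
```

Normalising by the sum of the weights is one possible design choice: it lets recruiters change a single weight without having to rebalance the others by hand.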
Technology Used
- spaCy + custom NER models for CV parsing.
- Skills ontology (ESCO/O*NET + client taxonomies) for structured skill mapping.
- FAISS/pgvector for semantic vector search and role-CV matching.
- Hybrid scoring engine combining semantic similarity, evidence scoring, and constraint satisfaction.
- Bias controls (redaction mode, fairness dashboards, Shapley-style explainability).
- ATS/Slack integrations for recruiter workflows.
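To make the role-CV matching idea concrete, here is a small sketch that ranks CVs by cosine similarity to a role embedding. It is a brute-force stand-in written in plain Python, not the FAISS or pgvector setup itself; at scale a vector index would perform the nearest-neighbour search. The function names and the toy three-dimensional vectors are assumptions for illustration, since real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_candidates(role_vec: list[float],
                    cv_vecs: dict[str, list[float]],
                    top_k: int = 10) -> list[tuple[str, float]]:
    """Return the top_k (cv_id, similarity) pairs, most similar first."""
    scored = [(cv_id, cosine_similarity(role_vec, vec))
              for cv_id, vec in cv_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (hypothetical data).
role = [1.0, 0.0, 1.0]
cvs = {
    "cv_a": [0.9, 0.1, 0.8],
    "cv_b": [0.0, 1.0, 0.0],
    "cv_c": [0.5, 0.5, 0.5],
}
ranking = rank_candidates(role, cvs, top_k=2)
```

With these toy vectors, `cv_a` ranks first because it points in nearly the same direction as the role vector, while `cv_b` is orthogonal to it and drops out of the top two.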
The Outcome
- Shortlisting time for a batch of 300 CVs fell from a full day to under 30 minutes.
- Recall of high-fit candidates improved by 28%.
- Bias reduced by up to 35% when redaction mode was enabled.
- Hiring managers rated shortlist quality 4.6/5, finally trusting the process.
One HR leader said it best: “We don’t just get lists faster. We know exactly why someone made the cut — and we can stand behind it.”