NHS South Yorkshire ICB Case Study

NHS South Yorkshire ICB

Production RAG System & Agentic LLM Orchestration (Greenfield)

Production RAG Azure AI Agent Hybrid Search Bing Grounding MCP Integration Azure Functions

Role

Gen AI Developer & Data Architecture Specialist — designed, built, deployed, and operated an end-to-end citizen-facing service discovery assistant with grounded, measurable performance.

Impact (measured)

Monthly active users

3,000+

Query success rate

94%

User satisfaction

4.8/5

Guideline alignment

90%

Uptime

99.7%

Daily records processed

15,000+

Pipeline latency

<5s

Time-to-resolution

12m → <30s

Notes: “0 reported hallucination incidents” refers to production telemetry and issue tracking during the measured period.

Problem

Citizens needed fast, reliable discovery of local health services and commissioned care across fragmented sources.
Static datasets went stale; new guidance needed to be reflected quickly without manual re-index cycles.
Accuracy requirements were high (clinical guidance alignment) with strict constraints on hallucinations.

Solution

I engineered a production-grade Retrieval Augmented Generation (RAG) system using Azure AI Agent SDK with LangChain custom retrieval chains, LlamaIndex input sanitization/enhancement, automated critique loops, and multi-source grounding across Bing Search and a proprietary NHS knowledge base. The system combined hybrid retrieval (lexical + semantic + web) with tool orchestration and validation steps to deliver grounded answers with measurable quality.

Architecture

Key engineering highlights

Built a production Azure AI Agent with real-time web grounding (Bing Search API), reducing outdated information issues by 95% and enabling responses to emerging guidance within 24 hours of publication.
Implemented Azure AI Search hybrid retrieval with 768-dimensional embeddings (OpenAI Ada-003), improving service discovery accuracy by 85% and reducing query resolution time from ~12 minutes to under 30 seconds.
Developed agentic orchestration pipelines (tool use + validation) including automated enrichment (auto-tagging, extractive/abstractive summarization, ML metadata augmentation) improving data completeness by 73% and search relevance by 61%.
Designed an event-driven Azure Functions pipeline with queue triggers and automatic failover achieving 99.7% uptime while processing 15,000+ daily records with sub-5-second latency.
Architected fault-tolerant web scraping with Crawl4AI and custom middleware, extracting structured data from 50+ regional websites supporting a 1.4M population across South Yorkshire.

Integration: Patient-facing access via MCP

I developed a Model Context Protocol (MCP) server to expose the RAG chatbot directly inside the NHS South Yorkshire GP app, allowing patients to query local services and real-time guidance without leaving the existing care interface—while preserving the same grounded retrieval and validation pipeline.

Tech stack

AI / Retrieval

Azure AI Agent SDK, Azure OpenAI (GPT-4o mini), LangChain, LlamaIndex, Azure AI Search, Bing Search API, Function Calling

Application

Python, Quart (async Flask), React (Fluent UI), TypeScript, REST APIs

Data / Platform

Azure Functions, Azure Durable Functions, Cosmos DB (conversation history/memory), Azure CI/CD, event-driven microservices

Want this outcome for your organisation?

Contact me Download CV