NHS South Yorkshire ICB
Production RAG System & Agentic LLM Orchestration (Greenfield)
Role
Gen AI Developer & Data Architecture Specialist — designed, built, deployed, and operated an end-to-end citizen-facing service discovery assistant with grounded, measurable performance.
Impact (measured)
Note: “0 reported hallucination incidents” refers to incidents recorded in production telemetry and issue tracking during the measured period.
Problem
- Citizens needed fast, reliable discovery of local health services and commissioned care across fragmented sources.
- Static datasets went stale; new guidance needed to be reflected quickly without manual re-index cycles.
- Accuracy requirements were high (answers had to align with clinical guidance), with strict limits on hallucination.
Solution
I engineered a production-grade Retrieval-Augmented Generation (RAG) system on the Azure AI Agent SDK, using LangChain for custom retrieval chains, LlamaIndex for input sanitization and enhancement, automated critique loops, and multi-source grounding across Bing Search and a proprietary NHS knowledge base. The system combined hybrid retrieval (lexical + semantic + web) with tool orchestration and validation steps to deliver grounded answers with measurable quality.
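Azure AI Search documents Reciprocal Rank Fusion (RRF) as the method it uses to merge keyword and vector results in hybrid queries. The sketch below applies the same idea to three sources, assuming the web (Bing) results are folded in as a third ranked list. The document IDs are hypothetical, and k = 60 is the constant commonly used for RRF, not a confirmed setting from this system.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1, so items ranked well by several retrievers rise.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the three retrievers.
lexical = ["gp-sheffield", "pharmacy-rotherham", "dental-barnsley"]
semantic = ["pharmacy-rotherham", "gp-sheffield", "walkin-doncaster"]
web = ["walkin-doncaster", "pharmacy-rotherham"]

fused = reciprocal_rank_fusion([lexical, semantic, web])
print(fused)
# ['pharmacy-rotherham', 'gp-sheffield', 'walkin-doncaster', 'dental-barnsley']
```

RRF only looks at ranks, so it can combine BM25 scores, cosine similarities, and web search positions without first normalizing them onto a common scale.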
Architecture
Key engineering highlights
- Built a production Azure AI Agent with real-time web grounding (Bing Search API), reducing outdated information issues by 95% and enabling responses to emerging guidance within 24 hours of publication.
- Implemented Azure AI Search hybrid retrieval over 768-dimensional Azure OpenAI embeddings, improving service discovery accuracy by 85% and reducing query resolution time from ~12 minutes to under 30 seconds.
- Developed agentic orchestration pipelines (tool use + validation) with automated enrichment (auto-tagging, extractive/abstractive summarization, ML-based metadata augmentation), improving data completeness by 73% and search relevance by 61%.
- Designed an event-driven Azure Functions pipeline with queue triggers and automatic failover achieving 99.7% uptime while processing 15,000+ daily records with sub-5-second latency.
- Architected fault-tolerant web scraping with Crawl4AI and custom middleware, extracting structured data from 50+ regional websites serving a population of 1.4M across South Yorkshire.
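The source does not say how the automated critique and validation steps judged grounding. As a minimal sketch, the loop below uses a crude lexical-overlap check as a stand-in for an LLM critic. Each answer sentence must share at least half of its content words with some retrieved passage. Flagged sentences are fed back for another draft, and after `max_rounds` the loop falls back to a refusal. `generate`, `ungrounded_sentences`, the 0.5 threshold, and the fallback text are all hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical generator signature: (question, passages, feedback) -> draft answer.
Generator = Callable[[str, list[str], Optional[list[str]]], str]


def _content_words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens longer than 3 chars (a crude stopword filter)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def ungrounded_sentences(answer: str, passages: list[str], min_overlap: float = 0.5) -> list[str]:
    """Return answer sentences whose content words are not mostly found in any one passage."""
    passage_words = [_content_words(p) for p in passages]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = _content_words(sentence)
        if not words:
            continue
        best = max((len(words & pw) / len(words) for pw in passage_words), default=0.0)
        if best < min_overlap:
            flagged.append(sentence)
    return flagged


def answer_with_critique(question: str, passages: list[str], generate: Generator,
                         max_rounds: int = 2,
                         fallback: str = "I couldn't find a reliable answer in the current sources.") -> str:
    """Draft, critique, and redraft; fail closed if no draft passes the check."""
    feedback: Optional[list[str]] = None
    for _ in range(max_rounds):
        draft = generate(question, passages, feedback)
        problems = ungrounded_sentences(draft, passages)
        if not problems:
            return draft
        feedback = problems  # tell the generator which sentences lacked support
    return fallback


# Hypothetical generator: hallucinates first, then sticks to the source when given feedback.
def fake_generate(question, passages, feedback):
    if feedback is None:
        return "The walk-in centre offers free parking and dental checkups."
    return "Rotherham Walk-in Centre is open 8am to 8pm every day."


passages = ["Rotherham Walk-in Centre is open 8am to 8pm every day, including bank holidays."]
result = answer_with_critique("When is the walk-in centre open?", passages, fake_generate)
print(result)  # Rotherham Walk-in Centre is open 8am to 8pm every day.
```

Failing closed, by returning a refusal instead of the last ungrounded draft, is the choice that matters under a strict hallucination constraint; a production critic would more plausibly be a second model call than word overlap.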
Integration: Patient-facing access via MCP
I developed a Model Context Protocol (MCP) server that exposes the RAG chatbot directly inside the NHS South Yorkshire GP app. Patients can query local services and real-time guidance without leaving their existing care interface, and every query still passes through the same grounded retrieval and validation pipeline.
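MCP is built on JSON-RPC 2.0, and clients discover and invoke server tools with `tools/list` and `tools/call` requests. The stdlib-only sketch below shows just that dispatch step, assuming a single hypothetical tool, `search_local_services`, whose handler stands in for the grounded RAG pipeline. A real server would normally use an MCP SDK, run over a transport such as stdio or HTTP, and handle the `initialize` handshake, all of which are omitted here.

```python
import json
from typing import Any


def search_local_services(query: str) -> str:
    """Hypothetical stand-in for the grounded RAG pipeline (retrieve, generate, validate)."""
    return f"Grounded answer for: {query}"


# Hypothetical tool registry; the schema fields follow MCP's tools/list shape.
TOOLS: dict[str, dict[str, Any]] = {
    "search_local_services": {
        "handler": search_local_services,
        "description": "Answer questions about local NHS services in South Yorkshire.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
}


def _error(rid: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": rid, "error": {"code": code, "message": message}})


def handle_message(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request for the MCP tools/list and tools/call methods."""
    req = json.loads(raw)
    rid, method, params = req.get("id"), req.get("method"), req.get("params") or {}
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(rid, -32602, f"Unknown tool: {params.get('name')}")
        try:
            text = tool["handler"](**params.get("arguments", {}))
            result = {"content": [{"type": "text", "text": text}], "isError": False}
        except Exception as exc:  # tool failures are reported inside the result, not as protocol errors
            result = {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    else:
        return _error(rid, -32601, f"Method not found: {method}")
    return json.dumps({"jsonrpc": "2.0", "id": rid, "result": result})


request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "search_local_services",
               "arguments": {"query": "Nearest pharmacy open on Sunday in Barnsley"}},
})
response = json.loads(handle_message(request))
print(response["result"]["content"][0]["text"])
```

Keeping the handler a thin wrapper means the GP app reaches the same retrieval and validation code as the web chatbot, which is the point the paragraph above makes.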
Tech stack
AI / Retrieval
Azure AI Agent SDK, Azure OpenAI (GPT-4o mini), LangChain, LlamaIndex, Azure AI Search, Bing Search API, Function Calling
Application
Python, Quart (async Flask), React (Fluent UI), TypeScript, REST APIs
Data / Platform
Azure Functions, Azure Durable Functions, Cosmos DB (conversation history/memory), Azure CI/CD, event-driven microservices