Back to portfolio

NHS South Yorkshire ICB

Production RAG System & Agentic LLM Orchestration (Greenfield)

Production RAG Azure AI Agent Hybrid Search Bing Grounding MCP Integration Azure Functions

Role

Gen AI Developer & Data Architecture Specialist — designed, built, deployed, and operated an end-to-end citizen-facing service discovery assistant with grounded, measurable performance.

Impact (measured)

Monthly active users
3,000+
Query success rate
94%
User satisfaction
4.8/5
Guideline alignment
90%
Uptime
99.7%
Daily records processed
15,000+
Pipeline latency
<5s
Time-to-resolution
12m → <30s

Notes: “0 reported hallucination incidents” refers to production telemetry and issue tracking during the measured period.

Problem

  • Citizens needed fast, reliable discovery of local health services and commissioned care across fragmented sources.
  • Static datasets went stale; new guidance needed to be reflected quickly without manual re-index cycles.
  • Accuracy requirements were high (clinical guidance alignment) with strict constraints on hallucinations.

Solution

I engineered a production-grade Retrieval Augmented Generation (RAG) system using Azure AI Agent SDK with LangChain custom retrieval chains, LlamaIndex input sanitization/enhancement, automated critique loops, and multi-source grounding across Bing Search and a proprietary NHS knowledge base. The system combined hybrid retrieval (lexical + semantic + web) with tool orchestration and validation steps to deliver grounded answers with measurable quality.

Architecture

Architecture diagram showing ingestion, enrichment, hybrid search indexing, multi-source retrieval with Bing grounding, LLM agent orchestration, delivery via web app and MCP server, and observability.

Key engineering highlights

  • Built a production Azure AI Agent with real-time web grounding (Bing Search API), reducing outdated information issues by 95% and enabling responses to emerging guidance within 24 hours of publication.
  • Implemented Azure AI Search hybrid retrieval with 768-dimensional embeddings (OpenAI Ada-003), improving service discovery accuracy by 85% and reducing query resolution time from ~12 minutes to under 30 seconds.
  • Developed agentic orchestration pipelines (tool use + validation) including automated enrichment (auto-tagging, extractive/abstractive summarization, ML metadata augmentation) improving data completeness by 73% and search relevance by 61%.
  • Designed an event-driven Azure Functions pipeline with queue triggers and automatic failover achieving 99.7% uptime while processing 15,000+ daily records with sub-5-second latency.
  • Architected fault-tolerant web scraping with Crawl4AI and custom middleware, extracting structured data from 50+ regional websites supporting a 1.4M population across South Yorkshire.

Integration: Patient-facing access via MCP

I developed a Model Context Protocol (MCP) server to expose the RAG chatbot directly inside the NHS South Yorkshire GP app, allowing patients to query local services and real-time guidance without leaving the existing care interface—while preserving the same grounded retrieval and validation pipeline.

Tech stack

AI / Retrieval

Azure AI Agent SDK, Azure OpenAI (GPT-4o mini), LangChain, LlamaIndex, Azure AI Search, Bing Search API, Function Calling

Application

Python, Quart (async Flask), React (Fluent UI), TypeScript, REST APIs

Data / Platform

Azure Functions, Azure Durable Functions, Cosmos DB (conversation history/memory), Azure CI/CD, event-driven microservices

Want this outcome for your organisation?