Doctify
Medical practice management SaaS with AI-powered document analysis. Built RAG with pgvector, multi-tenant architecture, and document processing pipeline for doctors in Morocco.
A medical practice management platform for doctors in Morocco. Patient records, consultations, prescriptions, billing -- plus an AI layer that can analyze documents, answer questions from patient history, and generate reports. The constraint that made this interesting: medical AI cannot hallucinate. If a doctor asks "what medications is this patient on?", the system has to ground its answer in actual records, not guess. I worked as a full-stack developer, owning the RAG system and document processing pipeline and designing the multi-tenant data model.
RAG with pgvector
The retrieval system uses PostgreSQL with pgvector for native vector similarity search. Documents are chunked with RecursiveCharacterTextSplitter, embedded via OpenAI's text-embedding-3-small (1536 dimensions), and stored alongside the source text. Queries are embedded and matched using cosine similarity (1 - (embedding <=> query_embedding)).
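The similarity math can be made concrete in plain TypeScript, with the SQL fragment showing the shape of the retrieval query (table and column names in the SQL are assumptions, not the actual schema):

```typescript
// pgvector's `<=>` operator returns cosine distance, so similarity
// is 1 - distance. The same quantity, computed in-process:
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Illustrative retrieval query (names are assumptions):
const retrievalSql = `
  SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
  FROM document_chunks
  ORDER BY embedding <=> $1::vector
  LIMIT $2
`;
```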
What made this harder than a standard RAG setup is the multi-source retrieval with priority ordering. When a doctor asks a question, chunks come from five sources in priority order: explicitly selected documents (forced inclusion, no similarity filter), conversation documents, consultation history, patient documents, and the cabinet's knowledge base. Each source has different similarity thresholds and allocation rules. When a doctor explicitly selects a knowledge base document, 70% of the remaining chunk budget goes to KB results. This prevents irrelevant patient data from drowning out the document they actually care about.
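A minimal sketch of that budget split, assuming a flat even division of whatever remains after the KB share (the 70% figure is from the description above; the other ratios are illustrative assumptions):

```typescript
type Source = "selected" | "conversation" | "consultation" | "patient" | "knowledgeBase";

// Splits a chunk budget across retrieval sources. Explicitly selected
// documents are forced in first; when a KB document was explicitly
// selected, 70% of the remaining budget goes to KB results.
function allocateBudget(
  total: number,
  forcedSelectedChunks: number,
  kbDocSelected: boolean
): Record<Source, number> {
  const remaining = Math.max(0, total - forcedSelectedChunks);
  // 0.25 for the non-selected case is an illustrative assumption.
  const kb = Math.floor(remaining * (kbDocSelected ? 0.7 : 0.25));
  const rest = remaining - kb;
  const each = Math.floor(rest / 3);
  return {
    selected: forcedSelectedChunks,
    conversation: each,
    consultation: each,
    patient: rest - 2 * each, // absorb rounding remainder
    knowledgeBase: kb,
  };
}
```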
Document processing
Upload kicks off a pipeline: format-specific text extraction (pdf-parse for PDFs, mammoth for DOCX, exceljs for spreadsheets, Gemini Vision OCR for images), then sanitization (NULL bytes, control characters), chunking, batch embedding generation, and storage. Each document tracks processing status -- PENDING, INDEXED, FAILED -- so the UI can show progress. When a document is updated, it gets re-chunked and re-embedded. The separators ("\n\n", "\n", ". ", ", ") handle French medical text reasonably well, with token estimation at roughly 4 characters per token for French.
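A sketch of the chunking step under those parameters -- recursive splitting that tries the coarsest separator first, measured in estimated tokens. The 500-token default chunk size is an assumption; the separator list and the 4-chars-per-token estimate come from the description above:

```typescript
// Rough token estimate for French text (~4 characters per token).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Minimal recursive splitter in the spirit of LangChain's
// RecursiveCharacterTextSplitter: split on the coarsest separator,
// merge parts back up to the budget, recurse with finer separators.
function splitText(
  text: string,
  maxTokens = 500,
  separators = ["\n\n", "\n", ". ", ", "]
): string[] {
  if (estimateTokens(text) <= maxTokens) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separator left: hard character cut.
    const cut = maxTokens * 4;
    return [text.slice(0, cut), ...splitText(text.slice(cut), maxTokens, [])];
  }
  const parts = text.split(sep).filter((p) => p.length > 0);
  if (parts.length <= 1) return splitText(text, maxTokens, rest);
  const chunks: string[] = [];
  let current = "";
  for (const part of parts) {
    const candidate = current ? current + sep + part : part;
    if (estimateTokens(candidate) > maxTokens && current) {
      chunks.push(...splitText(current, maxTokens, rest));
      current = part;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(...splitText(current, maxTokens, rest));
  return chunks;
}
```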
AI chat streaming
The chat uses SSE with a specific event protocol: start (with message ID), context (chunk metadata so the UI can show sources), token (streamed from the LLM), complete, error. The trick is the message stub pattern: we create the assistant message record in the database before streaming starts, stream tokens to the client, then update the stub with the full response when done. If streaming fails, we delete the stub. This means the conversation history stays consistent whether or not the stream completes.
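The event protocol and stub lifecycle can be typed out directly; the payload shapes beyond the event names, and the repository/LLM call names in the comments, are assumptions:

```typescript
// The five SSE events from the protocol above, as a discriminated union.
type ChatEvent =
  | { event: "start"; data: { messageId: string } }
  | { event: "context"; data: { sources: { documentId: string; score: number }[] } }
  | { event: "token"; data: { text: string } }
  | { event: "complete"; data: { messageId: string } }
  | { event: "error"; data: { message: string } };

// Serialize an event to the SSE wire format.
function toSSE(e: ChatEvent): string {
  return `event: ${e.event}\ndata: ${JSON.stringify(e.data)}\n\n`;
}

// Message stub lifecycle (hypothetical repository/LLM names):
//   const stub = await messages.create({ role: "assistant", content: "" });
//   send(toSSE({ event: "start", data: { messageId: stub.id } }));
//   try {
//     for await (const text of llm.stream(prompt)) {
//       send(toSSE({ event: "token", data: { text } }));
//     }
//     await messages.update(stub.id, { content: fullText });
//     send(toSSE({ event: "complete", data: { messageId: stub.id } }));
//   } catch (err) {
//     await messages.delete(stub.id); // failed stream leaves no orphan row
//     send(toSSE({ event: "error", data: { message: String(err) } }));
//   }
```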
There's an NLU layer for patient context detection. ChatNLUService scans the user's message for patient names using token overlap and substring matching, but it only injects patient context if there's exactly one high-confidence match (threshold 0.4). Ambiguous matches get ignored rather than guessed at -- better to miss context than inject the wrong patient's data into a medical response.
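A simplified sketch of that rule, assuming a score that takes the better of token overlap and full-substring match (the real scoring in ChatNLUService may differ; the 0.4 threshold is from the text):

```typescript
// Score how well a patient name matches the message: fraction of name
// tokens present in the message, or 1 for an exact substring hit.
function scorePatientMatch(message: string, patientName: string): number {
  const msgTokens = new Set(message.toLowerCase().split(/\s+/));
  const nameTokens = patientName.toLowerCase().split(/\s+/);
  const overlap = nameTokens.filter((t) => msgTokens.has(t)).length / nameTokens.length;
  const substring = message.toLowerCase().includes(patientName.toLowerCase()) ? 1 : 0;
  return Math.max(overlap, substring);
}

// Inject patient context only on exactly one high-confidence match;
// ambiguity yields nothing rather than a guess.
function detectPatient(
  message: string,
  patients: { id: string; name: string }[]
): string | null {
  const matches = patients.filter((p) => scorePatientMatch(message, p.name) >= 0.4);
  return matches.length === 1 ? matches[0].id : null;
}
```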
Both OpenAI and Gemini are supported as LLM backends. Specialties get their own system prompts with domain-specific guidelines. Bilingual (French/English) based on query language detection.
Multi-tenant isolation
Every practice (cabinet) is fully isolated. All database queries are scoped by cabinetId. RAG retrieval enforces cabinet boundaries at query time -- a doctor can only search their own cabinet's documents and patients. The schema has three roles (owner, doctor, secretary) with different access levels. Cabinet activation goes through a payment proof workflow designed for Morocco's bank transfer system.
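One way to make that scoping hard to forget is a small helper merged into every where clause; the helper name and the Prisma-style usage here are assumptions, not the actual code:

```typescript
// Merge the tenant filter into any where clause, so callers must
// supply a cabinetId and cannot accidentally drop the boundary.
function scoped<T extends object>(cabinetId: string, where: T): T & { cabinetId: string } {
  return { ...where, cabinetId };
}

// Illustrative usage with a Prisma-style client:
//   prisma.document.findMany({ where: scoped(cabinetId, { status: "INDEXED" }) });
```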
There are 30+ Prisma models with soft deletes and audit trails throughout. Supersession tracking handles message regeneration -- when a doctor asks the AI to retry, the old response is marked as superseded rather than deleted.
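The supersession idea can be sketched as two pure functions (the field name supersededById is an assumption):

```typescript
interface Message {
  id: string;
  content: string;
  supersededById: string | null;
}

// On regenerate: mark the old response superseded and append the new
// one. Nothing is deleted, so the audit trail stays complete.
function supersede(history: Message[], oldId: string, replacement: Message): Message[] {
  const marked = history.map((m) =>
    m.id === oldId ? { ...m, supersededById: replacement.id } : m
  );
  return [...marked, replacement];
}

// The conversation view shows only non-superseded messages.
function visibleHistory(history: Message[]): Message[] {
  return history.filter((m) => m.supersededById === null);
}
```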
Tech stack
TypeScript, NestJS, React 19, Vite, PostgreSQL + pgvector, Prisma, LangChain.js, OpenAI API, Google Gemini, AWS S3, Resend, Docker, Tailwind CSS, Zustand, TanStack Query, PostHog, Pino.