04-RAG-System

The RAG system is the agent's memory of the project. Retrieval is mandatory before any plan or edit unless the user explicitly turns RAG OFF for that run.

1. What gets indexed

routes (URI, method, name, controller, middleware, params)

controllers and controller methods

models and their relationships

migrations and resulting tables/columns/indexes/FKs

services, jobs, events, listeners, policies, middleware

form requests, API resources

config files

composer.json / package.json / Podfile / Package.swift

README files, all docs/*.md

.cursor/rules/*.mdc

.claude/*.md

known API endpoints (from OpenAPI if present)

previous run summaries (final_summary events)

previous file changes (path, action, summary)

known bugs (from docs/KNOWN_BUGS.md if present)

architectural decisions (ADRs in docs/adr/*.md)

2. Storage shape

See 03 — Database Migrations → rag_chunks.

Key columns:

source_type: route | controller | controller_method | model | model_relation | migration | table | service | job | event | listener | policy | middleware | request | resource | config | package | readme | doc | cursor_rule | claude_rule | api_endpoint | run_summary | file_change | known_bug | adr

metadata_json includes: framework, file path, line range, language, related symbols, related routes, related tables.

content_hash: sha256 of the normalized chunk_text. If the hash matches the stored chunk for a given (project_id, source_path, symbol_name), skip re-embedding.
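
The skip check described for content_hash can be sketched as follows. This is a minimal illustration, not the actual implementation: the spec does not define "normalized", so the rules here (unified line endings, trailing whitespace stripped) are an assumption, and `normalizeChunkText`, `contentHash`, `needsEmbedding`, and the in-memory `$stored` array are hypothetical stand-ins for a rag_chunks lookup.

```php
<?php
declare(strict_types=1);

// Assumed normalization: unify line endings, strip trailing whitespace on
// every line, trim the chunk. The spec does not define the exact rules.
function normalizeChunkText(string $text): string
{
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    $text = preg_replace('/[ \t]+$/m', '', $text) ?? $text;
    return trim($text);
}

function contentHash(string $chunkText): string
{
    return hash('sha256', normalizeChunkText($chunkText));
}

// $stored maps "project_id|source_path|symbol_name" => content_hash.
// It is a hypothetical in-memory stand-in for the rag_chunks table.
function needsEmbedding(array $stored, int $projectId, string $sourcePath, string $symbolName, string $chunkText): bool
{
    $key = implode('|', [$projectId, $sourcePath, $symbolName]);
    return ($stored[$key] ?? null) !== contentHash($chunkText);
}

$stored = [
    '1|app/Models/Invoice.php|Invoice::items' => contentHash("public function items()\n{\n}"),
];

// Same code with different line endings and trailing spaces: no re-embed.
var_dump(needsEmbedding($stored, 1, 'app/Models/Invoice.php', 'Invoice::items', "public function items()  \r\n{\r\n}\r\n"));
// Signature changed: the hash differs, so the chunk is re-embedded.
var_dump(needsEmbedding($stored, 1, 'app/Models/Invoice.php', 'Invoice::items', "public function items(): HasMany\n{\n}"));
```

Hashing the normalized text rather than the raw text means formatting-only edits do not cost an embedding call, at the price of having to keep the normalization rules stable across versions.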

3. Chunking strategy

Chunking is language-aware: each source type is split along its own structural units rather than at fixed offsets.

Source	Chunk unit	Notes
PHP class	per public method + one class header chunk	Include namespace, class doc, signature.
PHP migration	per table touched in `up()`	Capture column types, indexes, FKs.
Routes	per route group, then per route	One chunk per HTTP route is the most useful.
Markdown docs	per H2/H3 section, with overlap	200-token overlap.
Config	per top-level key	One chunk per `[key => array]` entry.
Swift	per `struct`/`class`/SwiftUI `View`	Use SwiftSyntax-style splits (regex acceptable for v1).
JS/TS	per exported symbol	ts-morph or regex-based.

Global limits: chunk text 200–1200 tokens, min overlap 100 tokens for prose, no overlap for code.
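
As an illustration of the Markdown row, the sketch below splits a document at H2/H3 headings and carries the tail of the previous section along as overlap. It makes two assumptions that are not in the spec: whitespace-separated words stand in for tokens (a real indexer would use the embedding model's tokenizer), and the overlap is returned as a separate field. `chunkMarkdown` is a hypothetical name, and enforcing the 200–1200 token limits is left out.

```php
<?php
declare(strict_types=1);

// Sketch only: words approximate tokens (assumption, see lead-in).
function chunkMarkdown(string $markdown, int $overlapWords = 200): array
{
    // Zero-width split before every line that starts an H2 or H3 heading,
    // so each heading stays attached to its own section.
    $sections = preg_split('/^(?=#{2,3} )/m', $markdown, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $chunks = [];
    $previousWords = [];
    foreach ($sections as $section) {
        $section = trim($section);
        if ($section === '') {
            continue;
        }
        // A negative offset takes the tail of the previous section.
        // Guard 0 explicitly, because an offset of -0 would return everything.
        $tail = $overlapWords > 0 ? array_slice($previousWords, -$overlapWords) : [];
        $chunks[] = [
            'heading' => explode("\n", $section, 2)[0],
            'overlap' => implode(' ', $tail),
            'text'    => $section,
        ];
        $previousWords = preg_split('/\s+/', $section, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    }
    return $chunks;
}

$doc = "## Setup\nInstall the package first.\n### Config\nSet the API key.\n";
foreach (chunkMarkdown($doc, 2) as $chunk) {
    echo $chunk['heading'], ' | overlap: "', $chunk['overlap'], "\"\n";
}
```

Code chunks would skip the overlap step entirely, in line with the "no overlap for code" limit.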

4. Embedding pipeline

flowchart LR
	A["changed file"] --> B["Detector picks source_type"]
	B --> C["Chunker emits chunks"]
	C --> D["Normalize + content_hash"]
	D --> E{"hash exists?"}
	E -- yes --> X["skip"]
	E -- no --> F["OpenAI embeddings"]
	F --> G["Upsert rag_chunks row"]
	G --> H["emit rag_chunk_indexed event"]

Embedding model: text-embedding-3-large (configurable via agent_settings.openai_embedding_model).

Dimensions: 3072. Stored as vector(3072).

Re-embed only when content_hash changes.
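
The first stage of the flow, the detector, could be as simple as a path-to-source_type lookup. The sketch below covers only that stage. The path patterns assume a conventional Laravel layout plus the doc locations listed in section 1; they are illustrative guesses, and `detectSourceType` is a hypothetical name. Types that need parsing rather than a path match (controller_method, model_relation, table, api_endpoint, and so on) are out of scope here.

```php
<?php
declare(strict_types=1);

// Returns a source_type for a project-relative path, or null when the file
// is not indexed. Patterns are assumptions based on Laravel conventions.
function detectSourceType(string $relativePath): ?string
{
    // Order matters: more specific doc paths are listed before the generic one.
    $rules = [
        '#^routes/.+\.php$#'               => 'route',
        '#^app/Http/Controllers/.+\.php$#' => 'controller',
        '#^app/Http/Middleware/.+\.php$#'  => 'middleware',
        '#^app/Http/Requests/.+\.php$#'    => 'request',
        '#^app/Http/Resources/.+\.php$#'   => 'resource',
        '#^app/Models/.+\.php$#'           => 'model',
        '#^database/migrations/.+\.php$#'  => 'migration',
        '#^config/.+\.php$#'               => 'config',
        '#^docs/adr/.+\.md$#'              => 'adr',
        '#^docs/KNOWN_BUGS\.md$#'          => 'known_bug',
        '#^docs/.+\.md$#'                  => 'doc',
        '#^\.cursor/rules/.+\.mdc$#'       => 'cursor_rule',
        '#^\.claude/.+\.md$#'              => 'claude_rule',
        '#(^|/)README(\.md)?$#i'           => 'readme',
        '#(^|/)(composer\.json|package\.json|Podfile|Package\.swift)$#' => 'package',
    ];

    foreach ($rules as $pattern => $type) {
        if (preg_match($pattern, $relativePath) === 1) {
            return $type;
        }
    }
    return null;
}

foreach (['app/Models/Invoice.php', 'docs/adr/0001-use-pgvector.md', 'storage/logs/laravel.log'] as $path) {
    echo $path, ' => ', detectSourceType($path) ?? '(not indexed)', "\n";
}
```

Returning null for unmatched paths gives the indexer a cheap way to ignore files such as logs and build artifacts before any chunking happens.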

5. Indexing triggers

Initial: when the project reaches the ready state after clone.

Incremental: on file_created, file_updated, file_deleted events from a run.

Manual: POST /api/projects/{project}/index-rag re-scans and re-indexes everything.

Scheduled: nightly Horizon job ProjectIndexer::scan() for drift detection.

6. Retrieval contract

RagContextService::retrieve(Project $p, string $query, array $options = [])

Options

top_k (default 12)

source_types (filter list)

must_include_paths (boost)

route_name, controller_class, table_name (relationship-aware filters)

include_run_summaries (default true)

min_score (default 0.25)

Returns RagResult { chunks: RagChunk[], debug: { strategy, latency_ms, total_candidates, scores } }.
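
To make the defaults concrete, here is a rough sketch of how the options might be applied to a list of already-scored candidates. It assumes an empty source_types list means "no filter" and leaves out must_include_paths boosting and the relationship-aware filters, whose exact behavior the contract does not spell out. `selectChunks` and the candidate array shape are hypothetical.

```php
<?php
declare(strict_types=1);

// Defaults taken from the options list above; an empty source_types list
// meaning "no filter" is an assumption.
const RETRIEVE_DEFAULTS = [
    'top_k'                 => 12,
    'source_types'          => [],
    'include_run_summaries' => true,
    'min_score'             => 0.25,
];

// $candidates: list of ['id' => int, 'source_type' => string, 'score' => float]
function selectChunks(array $candidates, array $options = []): array
{
    $opts = array_merge(RETRIEVE_DEFAULTS, $options);

    $kept = array_filter($candidates, function (array $c) use ($opts): bool {
        if ($c['score'] < $opts['min_score']) {
            return false;
        }
        if ($opts['source_types'] !== [] && !in_array($c['source_type'], $opts['source_types'], true)) {
            return false;
        }
        if (!$opts['include_run_summaries'] && $c['source_type'] === 'run_summary') {
            return false;
        }
        return true;
    });

    // Highest score first, then cut to top_k.
    usort($kept, fn (array $a, array $b): int => $b['score'] <=> $a['score']);
    return array_slice($kept, 0, $opts['top_k']);
}

$candidates = [
    ['id' => 1, 'source_type' => 'controller_method', 'score' => 0.82],
    ['id' => 2, 'source_type' => 'run_summary',       'score' => 0.61],
    ['id' => 3, 'source_type' => 'model',             'score' => 0.19],
    ['id' => 4, 'source_type' => 'route',             'score' => 0.47],
];

// Defaults only: candidate 3 falls below min_score.
print_r(array_column(selectChunks($candidates), 'id'));
// Only routes requested.
print_r(array_column(selectChunks($candidates, ['source_types' => ['route']]), 'id'));
```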

7. Relationship-aware retrieval

The service must support multi-hop queries the agent commonly needs:

"What controller serves route invoices.store and which model and table does it touch?" → resolved by walking route → controller_method → model → table through the relations recorded in metadata_json, before vector search runs.

"Which migrations created or altered table invoices?" → SQL filter on source_type='migration' + metadata affected_tables.

"What did previous runs change about InvoiceController?" → source_type='file_change' filter on path.
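
The first query above amounts to a short graph walk. The sketch below runs it over an in-memory array that stands in for the relations stored in metadata_json; the array layout, the `walkRoute` name, and the sample entries (`InvoiceController@store`, `Invoice`) are illustrative assumptions, not the stored schema.

```php
<?php
declare(strict_types=1);

// Walks route -> controller_method -> model -> table.
// $graph is a hypothetical in-memory view of the metadata_json relations.
function walkRoute(array $graph, string $routeName): ?array
{
    $method = $graph['routes'][$routeName] ?? null;
    if ($method === null) {
        return null; // unknown route: nothing to anchor the walk on
    }

    $models = $graph['controller_methods'][$method] ?? [];
    $tables = [];
    foreach ($models as $model) {
        if (isset($graph['models'][$model])) {
            $tables[] = $graph['models'][$model];
        }
    }

    return [
        'controller_method' => $method,
        'models'            => $models,
        'tables'            => array_values(array_unique($tables)),
    ];
}

// Illustrative sample data.
$graph = [
    'routes'             => ['invoices.store' => 'InvoiceController@store'],
    'controller_methods' => ['InvoiceController@store' => ['Invoice']],
    'models'             => ['Invoice' => 'invoices'],
];

print_r(walkRoute($graph, 'invoices.store'));
```

The symbols found by the walk can then be passed as filters (controller_class, table_name) so that vector search ranks only chunks that are already known to be related.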

8. Debug & observability

Every retrieval emits a rag_search_started event and a rag_search_completed event.

payload_json includes the query, top-k IDs, scores, and which chunks were chosen for the prompt.

Console UI shows expandable chunks (path + line range + snippet).

9. Per-run RAG toggle

Each run has metadata_json.rag_enabled (default true).

When false, the orchestrator skips retrieval and emits an explicit ai_context_loaded event with { rag: false }.

The agent's system prompt acknowledges "RAG is OFF for this run" so it does not hallucinate references.
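
A minimal sketch of the toggle, under stated assumptions: `loadContext` is a hypothetical orchestrator helper, `$retrieve` and `$emit` are callables standing in for RagContextService::retrieve() and the console event emitter, and the payload emitted when RAG is on is a guess, since only the `{ rag: false }` case is specified.

```php
<?php
declare(strict_types=1);

// $runMetadata is the decoded metadata_json of the run.
function loadContext(array $runMetadata, string $query, callable $retrieve, callable $emit): array
{
    $ragEnabled = $runMetadata['rag_enabled'] ?? true; // default true per the spec

    if (!$ragEnabled) {
        // Skip retrieval entirely and make the decision visible in the event log.
        $emit('ai_context_loaded', ['rag' => false]);
        return ['chunks' => [], 'system_note' => 'RAG is OFF for this run'];
    }

    $chunks = $retrieve($query);
    // Assumed payload shape for the enabled case.
    $emit('ai_context_loaded', ['rag' => true, 'chunk_count' => count($chunks)]);
    return ['chunks' => $chunks, 'system_note' => null];
}

$log = [];
$emit = function (string $name, array $payload) use (&$log): void {
    $log[] = [$name, $payload];
};
$retrieve = fn (string $query): array => [['id' => 1, 'path' => 'app/Models/Invoice.php']];

$context = loadContext(['rag_enabled' => false], 'add a due_date column', $retrieve, $emit);
echo $context['system_note'], "\n";
```

Returning the system note alongside the empty chunk list keeps the "RAG is OFF" acknowledgement and the skipped retrieval in one code path, so one cannot happen without the other.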

10. Quality safeguards

Stale detection: nightly job re-hashes every source file and invalidates chunks whose source is gone.

De-duplication: identical content_hash across files is collapsed; metadata lists all paths.

Token budget: the top-k chunks are trimmed to fit roughly 40% of the model's context window. The remaining budget is reserved for the user prompt and tool output.

No blind guessing: if chunks.length === 0 and confidence is below threshold, the agent must ask the user a clarifying question via paused_for_input rather than fabricate.
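
Two of these safeguards, de-duplication and the token budget, are simple enough to sketch. The code assumes each chunk arrives with a precomputed `tokens` count and a `path`, and that trimming stops at the first chunk that does not fit so that ranking order is preserved; both are assumptions, and `collapseDuplicates` and `trimToBudget` are hypothetical names.

```php
<?php
declare(strict_types=1);

// Collapses chunks that share a content_hash and records every path on the survivor.
function collapseDuplicates(array $chunks): array
{
    $byHash = [];
    foreach ($chunks as $chunk) {
        $hash = $chunk['content_hash'];
        if (!isset($byHash[$hash])) {
            $byHash[$hash] = $chunk + ['paths' => []];
        }
        $byHash[$hash]['paths'][] = $chunk['path'];
    }
    return array_values($byHash);
}

// Keeps ranked chunks until the next one would exceed the share of the
// context window reserved for retrieved context (~40% per the spec).
function trimToBudget(array $rankedChunks, int $contextWindowTokens, float $share = 0.40): array
{
    $budget = (int) floor($contextWindowTokens * $share);
    $used = 0;
    $kept = [];
    foreach ($rankedChunks as $chunk) {
        if ($used + $chunk['tokens'] > $budget) {
            break; // assumption: stop here rather than skip ahead to smaller chunks
        }
        $used += $chunk['tokens'];
        $kept[] = $chunk;
    }
    return $kept;
}

$ranked = [
    ['content_hash' => 'aaa', 'path' => 'docs/setup.md',  'tokens' => 1200],
    ['content_hash' => 'aaa', 'path' => 'README.md',      'tokens' => 1200],
    ['content_hash' => 'bbb', 'path' => 'docs/deploy.md', 'tokens' => 1100],
    ['content_hash' => 'ccc', 'path' => 'docs/api.md',    'tokens' => 1000],
];

$unique = collapseDuplicates($ranked);
$fitting = trimToBudget($unique, 8000);
echo count($unique), ' unique, ', count($fitting), " within budget\n";
```

Running de-duplication before trimming matters: duplicates would otherwise consume budget that a distinct chunk could have used.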

11. Service skeleton

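final class RagContextService {
	public function __construct(
		private EmbeddingClient $embeddings,
		private ChunkRepository $chunks,
		private ConsoleEventService $events,
	) {}

	// Retrieval contract from section 6: returns ranked chunks plus debug info.
	public function retrieve(Project $project, string $query, array $opts = []): RagResult { /* … */ }
	// Incremental indexing of one file (section 4 pipeline); unchanged hashes are skipped.
	public function indexFile(Project $project, string $relativePath): void { /* … */ }
	// Full re-scan, as triggered by POST /api/projects/{project}/index-rag.
	public function reindexProject(Project $project): void { /* … */ }
	// Invalidates chunks for a source path, e.g. after file_deleted or stale detection.
	public function invalidate(Project $project, string $sourcePath): void { /* … */ }
}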