I resisted RAG for a while because it felt like an excuse to build a small database and call it “AI.” Then I looked at my own mess: README files, tickets, config snippets, and that one markdown doc called “NOTES-final-final-2.md.”
In my lab, the real promise of RAG is not magic new intelligence. It is less context switching. If I can ask “what did I do last time” and get the right file and the right paragraph, that is a win.
The punchline: retrieval is an ops problem disguised as an AI problem. The model is usually not the hard part. The hard part is ingest, chunking, permissions, and getting the system to fail in a way that does not lie to you.
what I tried to index (and what I avoided)
I started small. In my lab I indexed:
- my homelab docs (markdown),
- service READMEs and deploy notes,
- snippets I actually reuse (bash, systemd unit templates),
- a handful of exported web pages that I reference often.
I avoided indexing:
- huge binary dumps,
- random chat logs,
- anything where I could not define what “right answer” even means.
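Those include/avoid rules can be encoded as a crude allowlist at the `find` level. A sketch, assuming notes live under `./notes`, that junk gets parked in an `archive/` directory, and that anything over 256 KB is a dump rather than a note; the sample files, directory names, and size cutoff are all illustrative:

```shell
# example: a crude allowlist instead of "index everything"
# sample tree so the sketch is runnable; paths and cutoff are assumptions
mkdir -p notes/archive
echo "real note" > notes/a.md
dd if=/dev/zero of=notes/dump.md bs=1024 count=300 2>/dev/null   # too big to be a note
echo "old junk" > notes/archive/old.md

list_candidates() {
  find ./notes -type f \
    \( -name '*.md' -o -name '*.sh' -o -name '*.service' \) \
    ! -path '*/archive/*' \
    -size -256k \
    -print
}

list_candidates
```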
chunking: boring decisions with big consequences
Chunking is where RAG projects go to die. Too small and you lose context. Too big and retrieval gets noisy and expensive. In my lab, the best heuristic was: chunk by structure when possible. Markdown headings, code fences, and paragraphs are your friends.
I also learned to keep some metadata: file path, last modified time (roughly), and a short type label. Otherwise every retrieved chunk looks equally plausible, which is how you get confident nonsense.
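Here is what "chunk by structure" looks like as code: a sketch that splits a markdown note on `#` and `##` headings and prefixes each chunk with a small metadata header. The sample file and the `type: markdown` label are illustrative, not a fixed schema:

```shell
# example: heading-based chunking with a small metadata header (sketch)
# create a tiny sample note so the sketch is runnable
mkdir -p notes
printf '# backups\nrotate weekly\n## restore\ntest restores monthly\n' > notes/backups.md

chunk_md() {
  awk -v path="$1" '
    function emit() {
      if (buf != "") {
        print "---"
        print "path: " path
        print "type: markdown"
        print "---"
        printf "%s", buf
      }
      buf = ""
    }
    /^##?[[:space:]]/ { emit() }   # start a new chunk at "#" or "##" headings
    { buf = buf $0 "\n" }          # otherwise keep accumulating the chunk
    END { emit() }
  ' "$1"
}

chunk_md notes/backups.md
```

Chunks inherit the file path, so a retrieved paragraph can always point back to its source.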
example: a minimal “index this directory” script
This is a simplified sketch of my ingestion loop. The point is not the embedding math. The point is consistent input and consistent metadata.
# example: naive ingestion loop (simplified but runnable sketch)
# assumes an OpenAI-compatible embeddings endpoint
set -euo pipefail

ROOT="./notes"
EMBED_URL="http://127.0.0.1:8081/v1/embeddings"
MODEL="Qwen3-Embedding-0.6B"

find "$ROOT" -type f -name '*.md' -print0 | while IFS= read -r -d '' f; do
  # drop control characters but keep newlines and tabs, then cap the size
  text=$(tr -cd '\11\12\15\40-\176\200-\377' < "$f" | head -c 12000)
  # build the JSON body with python3 so quoting and escaping are handled
  printf '%s' "$text" \
    | python3 -c 'import json,sys; print(json.dumps({"model": sys.argv[1], "input": sys.stdin.read()}))' "$MODEL" \
    | curl -sS "$EMBED_URL" -H 'Content-Type: application/json' -d @- >/dev/null
  printf 'indexed %s\n' "$f"
done
This is not production-grade. It is just enough to build muscle memory. Once I had a working loop, I replaced the messy bits with something less embarrassing.
retrieval quality: I stopped chasing top-k perfection
The first thing I did was tweak top-k, similarity thresholds, and chunk sizes like I was tuning a race car. That was a mistake. In my lab, the easiest quality gain was not tuning, it was curation: removing junk files, renaming things, and adding a bit of structure.
Retrieval is only as good as the data you feed it. If your notes are “TODO: fix later” repeated 200 times, that is what you will retrieve.
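Some of that curation can even be scripted. A crude sketch that flags notes drowning in TODO lines; the sample files and the fifty-percent threshold are arbitrary stand-ins:

```shell
# example: flag notes where TODO lines dominate (crude curation heuristic)
# sample files so the sketch is runnable; the threshold is arbitrary
mkdir -p notes
printf 'TODO: fix later\nTODO: fix later\nTODO: fix later\nreal content\n' > notes/junk.md
printf 'rotate weekly\nkeep 4 copies\n' > notes/good.md

flag_junk() {
  for f in notes/*.md; do
    total=$(grep -c '' "$f")                 # total line count
    todos=$(grep -ci 'todo' "$f" || true)    # grep exits 1 on zero matches
    # flag files where more than half the lines are TODOs
    if [ $((todos * 2)) -gt "$total" ]; then
      echo "junk? $f ($todos/$total TODO lines)"
    fi
  done
}

flag_junk
```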
the biggest guardrail: make the model cite sources
I do not trust an assistant that answers without showing me where it got the answer. In my lab, the rule is: include the file path and a short quote for each claim. If the system cannot retrieve anything relevant, it should say so.
example: a prompt template for grounded answers
# example: system prompt snippet for RAG answers
You are neo’s lab assistant.
Rules:
- Only answer using the provided context chunks.
- For each key point, cite (path) and include a short quoted line.
- If context is missing or conflicting, say what is missing and ask a follow-up question.
- Do not invent commands or file locations.
This does not eliminate hallucination, but it makes it louder. If the model tries to hand-wave, it has to do it in public.
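To show how the template travels with the question, here is a sketch that assembles a grounded chat request in the same OpenAI-compatible style as the embedding script. The chat URL, the inline context chunk, and the `model` name are all assumptions standing in for whatever the retrieval step actually returned:

```shell
# example: assemble a grounded chat request (sketch)
# CHAT_URL, the inline context, and the model name are assumptions
CHAT_URL="http://127.0.0.1:8081/v1/chat/completions"
QUESTION="where is the systemd unit for the backup service?"
CONTEXT="---
path: notes/backups.md
---
the unit lives at /etc/systemd/system/backup.service"   # normally: retrieved chunks

SYSTEM_PROMPT="You are neo's lab assistant. Only answer using the provided context chunks. For each key point, cite (path) and include a short quoted line."

# build the request body with python3 so quoting is handled correctly
BODY=$(python3 -c '
import json, sys
print(json.dumps({
    "model": "local",
    "messages": [
        {"role": "system", "content": sys.argv[1]},
        {"role": "user", "content": "Context:\n" + sys.argv[2] + "\n\nQuestion: " + sys.argv[3]},
    ],
}))
' "$SYSTEM_PROMPT" "$CONTEXT" "$QUESTION")

echo "$BODY"
# then POST it:
#   curl -sS "$CHAT_URL" -H 'Content-Type: application/json' -d "$BODY"
```

The point is that the context rides along in the same message as the question, so the model has nothing else to answer from.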
keeping it fresh: indexing as a routine
The most embarrassing RAG failure in my lab was not hallucination. It was staleness. I edited a note, forgot to re-index, then asked the assistant why it was “ignoring” the update. The assistant was fine. My pipeline was lying.
The fix was boring: make indexing a routine. For small note sets, I can re-index on change. For larger sets, I re-index on a schedule and I track a “last indexed” timestamp. If the index is older than the file, I treat retrieval as suspect.
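The "index older than the file" check is easy to script. A sketch using a hypothetical `.last_indexed` marker file that the index run touches when it finishes:

```shell
# example: find files edited since the last index run (staleness check)
# .last_indexed is a hypothetical marker touched at the end of each index run
mkdir -p notes
touch -t 202001010000 .last_indexed     # pretend the last run was long ago
echo "new step" >> notes/edited.md      # simulate an edit after indexing

# anything newer than the marker has not been re-indexed: treat as suspect
find notes -type f -newer .last_indexed
```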
a tiny evaluation set beats endless tuning
I keep a small list of questions I actually ask: “where is the systemd unit for X,” “how did I configure backup rotation,” “which port did I bind that service to.” When I change chunking or embeddings, I run those questions again. It is not scientific, but it is consistent, and consistency is how you catch regressions.
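That question list can be a two-column file and a loop. A sketch using a grep-based retriever as a stand-in for the real vector search, with made-up note contents, so the harness itself stays runnable:

```shell
# example: tiny eval harness; grep stands in for the real vector retriever
# sample notes and questions are made up so the sketch is runnable
mkdir -p notes
printf 'the backup service unit lives at /etc/systemd/system/backup.service\n' > notes/backups.md
printf 'caddy binds port 8443 behind the firewall\n' > notes/caddy.md

retrieve() {
  # crude stand-in: case-insensitive full-text match, top 3 files
  grep -ril "$1" notes | head -n 3
}

# question<TAB>expected-path pairs, taken from questions I actually ask
printf 'backup service\tnotes/backups.md\nport 8443\tnotes/caddy.md\n' > eval.tsv

run_eval() {
  while IFS=$'\t' read -r q expected; do
    if retrieve "$q" | grep -qx "$expected"; then
      echo "PASS: $q"
    else
      echo "FAIL: $q (expected $expected)"
    fi
  done < eval.tsv
}

run_eval
```

Swapping the `retrieve` function for the real search is the only change needed; the pass/fail list stays the same, which is what makes regressions visible.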
what worked / what broke
what worked
- Indexing small, high-value docs first instead of trying to swallow my entire filesystem.
- Metadata (path + type) made debugging retrieval much easier.
- Citations turned “maybe correct” answers into something I can verify quickly.
what broke
- Garbage in, garbage retrieved: messy notes created messy answers.
- Over-chunking split steps across chunks, so the model missed critical context.
- Assuming freshness: I forgot to re-index after edits and blamed the model.
where I landed
In my lab, RAG is useful when it is humble. It is not a replacement for documentation. It is a navigation layer. The real win is that I find the right note faster, not that a model says pretty words.