RAG Search
Hybrid retrieval-augmented generation running entirely on your hardware. BM25 lexical search meets vector embeddings, optimized automatically by Guaardvark's Autoresearch engine — no cloud, no API costs, no data leaving your network.
Hybrid Retrieval Architecture
Most RAG implementations rely on a single retrieval method — either keyword-based lexical search or semantic vector similarity. Each approach has blind spots. Lexical search excels at finding exact terms and phrases but misses synonyms and paraphrased concepts. Vector search captures semantic meaning but can overlook precise terminology, especially in technical or domain-specific corpora. Guaardvark's local RAG search eliminates this trade-off by combining both strategies into a unified hybrid retrieval pipeline.
BM25 Lexical Search
At the foundation of the retrieval stack is a BM25 scoring engine. BM25 is a proven probabilistic ranking function that weights term frequency, inverse document frequency, and document length normalization. When you query your local document collection, the BM25 layer rapidly identifies chunks that contain your exact search terms, weighted by statistical relevance. This is especially powerful for domain-specific vocabulary — legal case citations, medical codes, API function names — where exact matches matter.
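To make the ranking function concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring — not Guaardvark's actual engine, just the standard formula with the usual `k1` and `b` defaults:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document against the query with BM25.

    docs: list of token lists. Returns one score per document.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    # Document frequency for each query term.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            # Inverse document frequency: rare terms weigh more.
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation plus length normalization.
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores
```

The `b` parameter controls how strongly long documents are penalized; production engines expose both knobs because the right values vary by corpus.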
Vector Embedding Retrieval
Running in parallel, the vector retrieval layer encodes your query into a dense embedding and performs approximate nearest-neighbor search across your document index. This captures the semantic intent behind your question, surfacing relevant passages even when the wording differs from the source material. Guaardvark supports multiple embedding models that run entirely on your local hardware, with automatic selection based on your system capabilities.
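The core operation of this layer can be sketched in a few lines. Real indexes use approximate nearest-neighbor structures (e.g. HNSW) rather than the brute-force cosine scan shown here, but the ranking logic is the same:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    """index: list of (doc_id, embedding) pairs.

    Returns the k document ids closest to the query embedding.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

In practice `query_vec` and the indexed embeddings come from the same locally running embedding model, so semantically similar text lands close together regardless of exact wording.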
Entity Extraction and Knowledge Graph Indexing
Beyond raw text retrieval, Guaardvark performs entity extraction during document ingestion. Named entities — people, organizations, dates, technical terms, and relationships — are indexed into a lightweight knowledge graph structure. This enables relationship-aware queries where the system can traverse entity connections to surface contextually relevant documents that neither lexical nor vector search alone would return. For example, querying about a specific regulation can surface related compliance documents, enforcement actions, and referenced statutes through entity relationship traversal.
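The traversal described above can be sketched as a bounded breadth-first walk over an entity adjacency map. The structure and entity names below are illustrative, not Guaardvark's internal schema:

```python
from collections import deque

def related_entities(graph, start, max_hops=2):
    """Breadth-first traversal over an entity adjacency map.

    graph: dict mapping entity -> set of connected entities.
    Returns entities reachable within max_hops, excluding the start node.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # don't expand beyond the hop budget
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                frontier.append((neighbor, hops + 1))
    return found
```

Starting from a regulation entity, a two-hop traversal like this would pull in directly linked enforcement actions and, through them, the statutes they reference — documents neither retrieval layer would rank highly on the query text alone.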
Reranking Pipeline
Raw retrieval results from BM25 and vector search are merged and passed through a reranking stage. The reranker evaluates each candidate passage against your original query using a cross-encoder model, producing a refined relevance score that accounts for nuanced contextual alignment. This dramatically improves precision in the final results presented to your LLM for answer generation. The entire pipeline — from query to reranked results to generated answer — executes locally with zero external network calls.
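A common way to merge two ranked lists before reranking is reciprocal rank fusion; the source does not specify which fusion method Guaardvark uses, so treat this as one plausible sketch. The cross-encoder is represented by a caller-supplied scoring function — in a real pipeline it would be a locally loaded model:

```python
def rrf_merge(bm25_ranked, vector_ranked, k=60):
    """Reciprocal rank fusion: merge two ranked doc-id lists into one."""
    scores = {}
    for ranking in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking):
            # A doc ranked highly in either list accumulates a large score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def rerank(candidates, query, cross_encoder_score):
    """Re-order fused candidates by cross-encoder relevance.

    cross_encoder_score(query, passage) -> float is supplied by the
    caller; here it stands in for a local cross-encoder model.
    """
    return sorted(candidates,
                  key=lambda p: cross_encoder_score(query, p),
                  reverse=True)
```

Cross-encoders score the query and passage jointly rather than comparing precomputed embeddings, which is why this stage is slower but markedly more precise than first-pass retrieval.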
All retrieval components are powered by LlamaIndex running locally on your machine. There is no dependency on external search APIs, hosted vector databases, or cloud embedding services. Your private RAG pipeline stays private.
RAG Autoresearch
Configuring a RAG pipeline is notoriously finicky. Chunk size, chunk overlap, top-k retrieval count, embedding model selection, reranker thresholds — each parameter affects retrieval quality, and the optimal configuration depends entirely on the nature of your documents. Legal briefs need different chunking than Python source files. Medical research papers behave differently than internal wikis. Manual tuning is tedious and error-prone.
Guaardvark introduces RAG Autoresearch — a self-optimizing retrieval system that is unique to the platform. Instead of requiring you to manually experiment with different configurations, Autoresearch runs experiments behind the scenes on your behalf.
How Autoresearch Works
When you create a new document collection, Autoresearch begins by establishing a baseline configuration using sensible defaults. It then systematically varies parameters — testing different chunk sizes (256, 512, and 1024 tokens, and beyond), adjusting chunk overlap ratios, experimenting with different top-k values for retrieval, and comparing embedding model performance against your specific document corpus. Each experiment produces measurable retrieval quality metrics using a combination of relevance scoring, answer faithfulness, and context precision.
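In its simplest form, this kind of sweep is a grid search over the parameter space. The sketch below assumes a caller-supplied `evaluate` function that returns a single quality score per configuration; the real engine also varies embedding models and blends relevance, faithfulness, and precision into that score:

```python
from itertools import product

def autoresearch_sweep(evaluate,
                       chunk_sizes=(256, 512, 1024),
                       overlaps=(0.1, 0.2),
                       top_ks=(3, 5, 10)):
    """Grid-search retrieval parameters.

    evaluate(config) -> quality score (higher is better).
    Returns the best-scoring configuration and its score.
    """
    best_config, best_score = None, float("-inf")
    for chunk, overlap, k in product(chunk_sizes, overlaps, top_ks):
        config = {"chunk_size": chunk, "overlap": overlap, "top_k": k}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Because each `evaluate` call means re-chunking, re-embedding, and re-querying a sample of the corpus, scheduling these runs during idle periods (as described below) is what keeps the sweep invisible to interactive users.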
The system runs these experiments in the background during idle periods, so your interactive search experience is never impacted. As experiments complete, Autoresearch adopts the configuration that produces the highest quality results for your particular documents. This creates a continuous optimization loop — your RAG pipeline genuinely gets better over time without any manual tuning on your part.
For organizations with diverse document types — a mix of contracts, technical specifications, research papers, and correspondence — Autoresearch can maintain per-collection configurations, ensuring each document set is optimized independently. The result is a self-hosted RAG system that adapts to you, not the other way around.
Document Pipeline
From raw files to queryable knowledge, Guaardvark handles every step of the offline document search AI pipeline. Upload your documents, and the system automatically processes, chunks, embeds, and indexes them for immediate retrieval.
Document Ingestion
Supports PDF, DOCX, TXT, Markdown, and source code files out of the box. The ingestion engine extracts clean text content, preserves document structure and metadata, and handles multi-format collections seamlessly. Drop a folder of mixed file types and Guaardvark processes them all.
Chunking & Embedding
Documents are split into semantically coherent chunks using configurable strategies. Each chunk is embedded using a locally running model that Autoresearch selects and optimizes for your hardware. Embeddings are generated on your CPU or GPU — never sent to an external service.
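The simplest chunking strategy is a sliding window over the token stream; a minimal sketch, not Guaardvark's semantic chunker:

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into overlapping fixed-size chunks.

    Consecutive chunks share `overlap` tokens, so a sentence that
    spans a boundary appears intact in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks
```

`chunk_size` and `overlap` are exactly the parameters Autoresearch tunes per collection, since the best window depends on how much context a single passage needs to stand on its own.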
Index Management
Create, update, and delete document collections through the Guaardvark interface. Add new documents to existing indexes incrementally without re-processing the entire collection. Manage multiple isolated indexes for different projects, departments, or document categories.
Query & Retrieval
Ask natural language questions against your document collections. The hybrid retrieval engine finds the most relevant passages, and your local LLM generates grounded answers with source citations. Chain queries together for multi-step research workflows using AI Agents.
Auto Embedding Selection
One of the most common friction points in self-hosted RAG deployment is choosing and configuring the right embedding model. Different models have different hardware requirements, memory footprints, and performance characteristics. Selecting the wrong model can mean painfully slow ingestion on CPU, out-of-memory errors on smaller GPUs, or suboptimal retrieval quality for your document domain.
Guaardvark eliminates this problem entirely with automatic embedding model selection. During initial setup and whenever your hardware configuration changes, the system performs a detection sweep:
- Hardware Detection — Automatically identifies whether you are running on CPU or GPU, detects ARM64 or x86 architecture, measures available VRAM and system RAM, and checks for CUDA availability and version.
- Model Selection — Based on your detected hardware profile, Guaardvark selects the optimal embedding model from its supported model registry. High-VRAM GPUs get larger, higher-quality models. CPU-only systems get lightweight models optimized for throughput. ARM64 devices get models specifically tested for that architecture.
- GPU-Accelerated Embeddings — When a CUDA-compatible GPU is available, embedding generation runs on the GPU for dramatically faster document ingestion. A collection that takes hours to embed on CPU can complete in minutes with GPU acceleration.
- Zero Configuration — No manual model downloads, no YAML configuration files, no trial-and-error with model compatibility. The system handles everything automatically so you can focus on your documents, not your infrastructure.
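A rough idea of what such a detection sweep looks like, using only the Python standard library — a real probe would also query VRAM and the CUDA version; here CUDA availability is approximated by the presence of the `nvidia-smi` binary:

```python
import platform
import shutil

def detect_hardware_profile():
    """Minimal hardware probe: CPU architecture and CUDA availability.

    Returns a small profile dict a model registry could key off.
    """
    arch = platform.machine().lower()
    return {
        # Normalize the common ARM64 spellings; everything else
        # is treated as x86_64 for this sketch.
        "arch": "arm64" if arch in ("arm64", "aarch64") else "x86_64",
        # Presence of nvidia-smi is a cheap proxy for a CUDA driver.
        "cuda": shutil.which("nvidia-smi") is not None,
    }
```

Mapping a profile like this to an embedding model is then a lookup: ARM64 or CPU-only profiles select lightweight throughput-oriented models, CUDA profiles with ample VRAM select larger ones.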
This auto-selection works in concert with Autoresearch. If Autoresearch determines that a different embedding model produces better retrieval results for your specific documents and your hardware can support it, the system can re-embed your collection with the improved model during background optimization cycles.
Why Self-Hosted RAG
Cloud-based RAG services are convenient, but they introduce risks and costs that are unacceptable for many organizations and use cases. Guaardvark's self-hosted RAG pipeline addresses these concerns directly.
Your Documents Stay on Your Hardware
Every document you ingest, every embedding generated, every query you run — all of it stays on your local machine or network. For legal firms handling privileged client communications, healthcare organizations processing patient records, financial institutions managing sensitive transactions, or government agencies operating in classified environments, this is not a preference — it is a requirement. Guaardvark's private RAG pipeline ensures complete data sovereignty with zero external data transmission.
No Per-Query API Costs
Cloud RAG services charge per embedding, per query, per token of retrieval context, and per generated response. For organizations running hundreds or thousands of searches per day against large document collections, these costs compound rapidly. With Guaardvark, you run unlimited searches against unlimited documents at zero marginal cost. Your only expense is the hardware you already own.
Domain-Specific Customization
Different document types demand different retrieval strategies. Legal documents benefit from larger chunk sizes that preserve full clause context. Source code repositories need syntax-aware chunking that respects function and class boundaries. Medical literature requires entity extraction tuned for clinical terminology. Guaardvark's Autoresearch system automatically discovers these optimal configurations, and you retain full control to override and customize any parameter for your specific domain.
AI Agent Integration
RAG search becomes even more powerful when combined with Guaardvark's AI Agents. Agents can autonomously execute multi-step research workflows — querying your document collections, synthesizing findings across multiple sources, cross-referencing results, and producing structured reports. This transforms static document search into an automated research pipeline that runs entirely within your infrastructure.