Use Case

Raspberry Pi & Edge AI

A complete AI platform that runs natively on Raspberry Pi 5 and ARM64 edge devices. Chat with local language models, search your documents with RAG, deploy autonomous agents, and speak to your AI — all on hardware that fits in your palm and costs less than a pair of headphones.

Full AI Platform on a Pi

Guaardvark was built from the ground up with resource efficiency as a first-class concern, and nowhere is that more apparent than on Raspberry Pi hardware. The entire platform — chat interface, document management, RAG search engine, agent framework, and voice pipeline — runs on a Raspberry Pi 5 with 8GB of RAM. This is not a toy demo or a stripped-down subset of the real product. It is the same Guaardvark that runs on workstations and servers, adapted automatically to the resources available on the device.

At the heart of the Raspberry Pi deployment is Ollama, which provides local language model inference optimized for CPU-only environments. Guaardvark's hardware detection system identifies the ARM64 architecture at startup and configures itself accordingly — selecting lightweight embedding models, adjusting context window sizes, tuning batch processing parameters, and throttling background tasks to stay within the memory budget. The result is a responsive AI platform that runs comfortably on an $80 single-board computer without swap thrashing or out-of-memory crashes.

What makes this practical rather than merely possible is model selection. Small language models like Phi, TinyLlama, Gemma 2B, and Qwen 1.5B deliver surprisingly capable inference on ARM64 CPUs. They handle conversational AI, document summarization, question answering, and agent reasoning with response times that are slower than on a GPU workstation but entirely usable for interactive work. For many edge deployment scenarios — a workshop assistant, a field research station, a classroom learning tool — the tradeoff between cost and capability tips decisively in favor of the Pi.

Every feature that defines Guaardvark is available on Raspberry Pi: AI chat with model switching, RAG search with BM25 and vector retrieval over your uploaded documents, AI agents running the ReACT reasoning loop with the full tool registry, and document management with tagging, collections, and metadata extraction. You can upload PDFs, ask questions about their contents, and get grounded answers sourced from your own files — all without any network connection.

Supported Hardware

Guaardvark targets a broad range of edge and small-form-factor hardware. The platform is architecture-aware and adapts its configuration based on the CPU, available memory, and presence or absence of a GPU. While the Raspberry Pi 5 is the primary reference device for ARM64 edge deployments, Guaardvark runs on any Linux system that meets the minimum requirements.

What Works on Pi

Every core capability of the Guaardvark platform functions on Raspberry Pi hardware. The platform automatically adjusts model sizes, context windows, and concurrency limits to match the available resources. Here are the four primary capabilities and how they perform on edge devices:

AI Chat

Full conversational AI powered by Ollama with small models like Phi, TinyLlama, and Gemma 2B. Supports model switching at runtime, conversation history, and system prompts. Response latency is higher than on GPU systems but comfortable for interactive use.

RAG Search

Document indexing with BM25 keyword search and vector retrieval using ARM-optimized embedding models. Upload PDFs, text files, and documents, then ask natural language questions answered with citations from your own content.
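The keyword half of that hybrid retrieval can be sketched in a few lines. The following is a simplified BM25 scorer, not Guaardvark's actual implementation — it shows how each document's score combines term frequency, document frequency, and a length penalty:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document against the query with classic BM25."""
    N = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(t) for t in tokenized) / N
    # Document frequency: how many documents contain each query term
    df = {t: sum(1 for d in tokenized if t in d) for t in query_terms}
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * num / den
        scores.append(score)
    return scores

docs = ["the pi runs local models", "cloud inference is remote", "local rag on the pi"]
print(bm25_scores(["local", "pi"], docs))
```

In the full pipeline, these keyword scores are combined with vector-similarity scores from the embedding model before the top passages are passed to the language model for citation-grounded answering.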

AI Agents

Autonomous agents running the ReACT reasoning loop with access to the full tool registry. Agents reason about tasks, select tools, execute actions, and iterate — performing multi-step research, file processing, and content generation on the Pi.

Voice Chat

Speech-to-text powered by Whisper.cpp, which runs efficiently on ARM64 with NEON SIMD optimizations. Speak to your AI and receive spoken responses via Piper TTS. The entire voice pipeline operates locally with no cloud transcription services.

Resource Optimization

Running a full AI platform on a device with 8GB of RAM requires deliberate resource management at every layer of the stack. Guaardvark includes a hardware detection and auto-configuration system that profiles the host device at startup and adjusts dozens of parameters to match the available resources. This is not a single “low memory mode” toggle — it is a graduated system that makes fine-grained decisions based on the actual CPU architecture, core count, clock speed, and memory capacity it detects.

Automatic hardware detection runs at container startup and identifies the CPU architecture (ARM64 vs. x86_64), available RAM, number of cores, and whether a GPU is present. Based on this profile, the system selects appropriate defaults for model sizes, embedding dimensions, batch sizes, and concurrency limits. On a Raspberry Pi 5 with 8GB of RAM, the system knows it has roughly 4–5GB available after the OS and background services, and it configures the entire stack to operate within that budget.
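A minimal sketch of that detection step might look like the following. This is an illustrative Linux-only example, not Guaardvark's actual code — the model names, tier labels, and thresholds here are hypothetical placeholders for whatever the real profile tables contain:

```python
import os
import platform

def detect_profile():
    """Probe CPU architecture, core count, and RAM, then pick a config tier."""
    arch = platform.machine()  # e.g. 'aarch64' on a Pi 5, 'x86_64' on a PC
    cores = os.cpu_count() or 1
    page_size = os.sysconf("SC_PAGE_SIZE")
    total_ram_gb = os.sysconf("SC_PHYS_PAGES") * page_size / 1024**3

    if arch in ("aarch64", "arm64") and total_ram_gb <= 8:
        # Constrained edge device: small models, tight limits
        return {
            "tier": "edge",
            "chat_model": "phi3:mini",
            "embed_model": "all-minilm",
            "context_window": 2048,
            "max_concurrent_tasks": 1,
        }
    # Workstation/server: larger models, parallelism up to the core count
    return {
        "tier": "workstation",
        "chat_model": "llama3:8b",
        "embed_model": "nomic-embed-text",
        "context_window": 8192,
        "max_concurrent_tasks": min(cores, 8),
    }

print(detect_profile()["tier"])
```

The key design point is that every downstream component reads from this one profile rather than probing the hardware itself, so a single detection pass at startup keeps the whole stack consistent.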

Embedding model auto-selection is a critical optimization for ARM devices. Full-sized embedding models like those used on GPU workstations would consume too much memory and run too slowly on ARM CPUs. Guaardvark automatically selects compact, ARM-efficient embedding models that produce high-quality vectors for RAG retrieval while keeping memory usage and indexing time within practical bounds.

Memory-conscious design permeates the entire codebase. The platform avoids unnecessary caching layers, pre-allocated buffers, and memory-hungry data structures that would be harmless on a 64GB workstation but fatal on a Pi. Document processing pipelines stream data rather than loading entire files into memory. Search indices are built incrementally. Model loading is lazy — models are pulled into memory only when a request arrives and can be unloaded when idle.
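The streaming approach is straightforward to illustrate. This hypothetical chunker (not Guaardvark's actual pipeline code) reads a document in fixed-size pieces and yields overlapping text chunks for indexing, so peak memory stays bounded regardless of file size:

```python
def stream_chunks(path, chunk_chars=2000, overlap=200):
    """Yield overlapping text chunks without loading the whole file into memory."""
    buf = ""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        while True:
            piece = f.read(8192)  # read in small fixed-size pieces
            if not piece:
                break
            buf += piece
            while len(buf) >= chunk_chars:
                yield buf[:chunk_chars]
                # Keep a tail of `overlap` chars so chunk boundaries share context
                buf = buf[chunk_chars - overlap:]
    if buf:
        yield buf  # final partial chunk

# Demo: chunk a 5,000-character file
with open("/tmp/guaardvark_demo.txt", "w") as f:
    f.write("x" * 5000)
chunks = list(stream_chunks("/tmp/guaardvark_demo.txt", chunk_chars=2000, overlap=200))
```

Because the function is a generator, the embedding stage can consume one chunk at a time and let each buffer be garbage-collected before the next arrives.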

Redis and PostgreSQL tuning for limited RAM environments reduces the default memory footprint of both services. Redis maxmemory is capped, eviction policies are configured for LRU, and PostgreSQL shared buffers and work memory are scaled down to leave headroom for Ollama inference. These are not afterthought tweaks — they are part of the standard configuration profile applied automatically when the hardware detection system identifies a constrained device.
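A constrained-device profile along these lines might look like the following. These values are illustrative, not Guaardvark's shipped defaults:

```
# redis.conf — cap Redis well below the Pi's total RAM
maxmemory 256mb
maxmemory-policy allkeys-lru

# postgresql.conf — shrink shared and per-connection memory
shared_buffers = 128MB
work_mem = 4MB
maintenance_work_mem = 32MB
max_connections = 20
```

The guiding principle is that the databases are support services on a Pi: Ollama's model weights and inference buffers are the largest consumer, so everything else is sized to leave that headroom free.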

Background task throttling prevents resource exhaustion during document indexing, embedding generation, and other batch operations. On a workstation, these tasks can run at full parallelism. On a Pi, the system limits concurrent operations, inserts cooling pauses between CPU-intensive steps, and monitors memory pressure to back off before the system starts swapping. The result is a platform that stays responsive even while processing documents in the background.
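The throttling pattern can be sketched as follows. This is a hypothetical illustration of the technique, not Guaardvark's actual scheduler — it caps concurrency with a semaphore, checks memory pressure via Linux's /proc/meminfo before each job, and inserts a cooling pause between CPU-intensive steps:

```python
import time
import threading

class ThrottledIndexer:
    """Run background jobs with a concurrency cap and cooling pauses."""

    def __init__(self, max_concurrent=1, pause_s=0.5, mem_high_watermark=0.85):
        self.sem = threading.Semaphore(max_concurrent)
        self.pause_s = pause_s
        self.mem_high_watermark = mem_high_watermark

    def memory_pressure(self):
        """Fraction of RAM in use, read from /proc/meminfo (Linux only)."""
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, val = line.split(":", 1)
                info[key] = int(val.split()[0])  # values are in kB
        return 1 - info["MemAvailable"] / info["MemTotal"]

    def run(self, job, *args):
        with self.sem:  # cap the number of concurrent jobs
            while self.memory_pressure() > self.mem_high_watermark:
                time.sleep(self.pause_s)  # back off before the OS starts swapping
            result = job(*args)
            time.sleep(self.pause_s)  # cooling pause between CPU-intensive steps
            return result
```

On a workstation profile the same class would simply be constructed with a higher `max_concurrent` and a zero pause, so one code path serves both ends of the hardware spectrum.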

Edge Deployment Scenarios

The combination of low cost, small size, low power consumption, and full AI capability makes Raspberry Pi deployments uniquely suited to scenarios where traditional server infrastructure is impractical, too expensive, or simply unavailable. Guaardvark on a Pi is not a compromise — it is a purpose-built solution for environments that demand local intelligence without the overhead of rack-mounted hardware.

Workshop or Lab AI Assistant

Place a Raspberry Pi running Guaardvark on a workbench or lab bench and give your team an always-available AI assistant. Upload equipment manuals, safety procedures, and reference documents. Staff can ask questions by voice or keyboard and get answers sourced directly from your documentation — no internet connection needed, no risk of hallucinated safety information from a general-purpose cloud model.

Field Research Stations

Remote field stations often lack reliable internet connectivity. A Pi running Guaardvark gives researchers local AI capabilities for analyzing field notes, searching reference materials, and generating reports. The device draws minimal power and can run from a solar panel or battery pack, making it suitable for deployment in locations where infrastructure is limited or nonexistent.

Classroom AI Learning Environment

Schools and universities can deploy Guaardvark on Raspberry Pis to give students hands-on experience with AI systems without the cost of cloud API subscriptions or the privacy concerns of sending student data to external services. Each student or lab group gets their own AI platform to experiment with — chatting with models, building RAG pipelines, and exploring agent workflows in a safe, contained environment.

Home Automation AI Hub

A Pi running Guaardvark can serve as an intelligent hub for home automation, processing voice commands, answering questions about household documents (manuals, warranties, recipes), and running agent workflows triggered by events from other smart home devices. All processing stays local — your conversations and documents never leave your home network.

Distributed Pi Network via Interconnector

For deployments that exceed the capacity of a single device, Guaardvark's Interconnector feature links multiple Pis into a coordinated network. Each Pi operates independently as a full AI platform, but the Interconnector enables model sharing, document synchronization, and federated search across the fleet. A network of five Pis, each indexing a different document collection, can provide RAG search across the combined corpus while keeping each device's resource usage within comfortable limits.

Building a Pi AI Cluster

A single Raspberry Pi running Guaardvark is a capable edge AI device. A cluster of Pis connected via the Interconnector is something more — a distributed AI infrastructure that can scale document coverage, balance workload, and provide redundancy, all built from commodity hardware costing a fraction of traditional server deployments.

Connect multiple Pis using the Interconnector. The Interconnector is Guaardvark's built-in multi-machine synchronization layer. Each Pi registers itself on the local network and establishes encrypted peer-to-peer connections with other Guaardvark instances. Configuration is minimal — point each instance at the network and they discover each other automatically.

Share models and sync documents across devices. When one Pi in the cluster downloads a new Ollama model, the Interconnector can distribute it to other nodes, eliminating redundant downloads. Documents uploaded to one Pi can be synchronized to others, ensuring that the entire cluster has access to the same knowledge base. Synchronization is incremental and bandwidth-efficient, designed for the limited throughput of edge network connections.
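One common way to make synchronization incremental is content hashing: each node advertises a digest per document, and a peer pushes only what is missing or changed. The sketch below illustrates that general approach, not the Interconnector's actual wire protocol:

```python
import hashlib

def plan_sync(local_docs, remote_hashes):
    """Decide which documents to push: only those missing or changed remotely.

    local_docs:    {doc_id: content bytes} on this node
    remote_hashes: {doc_id: sha256 hex digest} advertised by the peer
    """
    to_push = []
    for doc_id, content in local_docs.items():
        digest = hashlib.sha256(content).hexdigest()
        if remote_hashes.get(doc_id) != digest:
            to_push.append(doc_id)  # peer lacks this version
    return to_push
```

Comparing fixed-size digests instead of file contents keeps the planning phase cheap over the limited throughput of an edge network; only the documents that actually differ consume bandwidth.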

Distributed RAG across the cluster. Rather than replicating every document on every device, a Pi cluster can divide the document corpus across nodes. Each Pi indexes and embeds its assigned subset of documents. When a user queries any node, the Interconnector routes the search across all nodes in the cluster and merges the results. This distributed RAG pattern allows the cluster to handle document collections far larger than any single Pi could manage, while keeping each device's memory and storage usage manageable.
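The merge step at the query node can be sketched as a simple score-based rerank. This is an illustrative example of the pattern, not the Interconnector's actual implementation:

```python
def merge_results(per_node_results, top_k=5):
    """Merge ranked (doc_id, score) lists from several nodes into one ranking."""
    merged = {}
    for node, results in per_node_results.items():
        for doc_id, score in results:
            # Keep the best score seen for each document across nodes
            if doc_id not in merged or score > merged[doc_id][1]:
                merged[doc_id] = (node, score)
    ranked = sorted(merged.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(doc_id, node, score) for doc_id, (node, score) in ranked[:top_k]]

results = merge_results({
    "pi-1": [("doc-a", 0.9), ("doc-b", 0.4)],
    "pi-2": [("doc-c", 0.7)],
}, top_k=2)
```

Note that raw scores are only comparable across nodes when every node uses the same embedding and scoring configuration — which is exactly what the shared hardware profile and model synchronization described above provide.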

Central dashboard monitors the fleet. Guaardvark provides a monitoring view that displays the status of every connected node — CPU utilization, memory pressure, model availability, document counts, and inference queue depth. Administrators can see at a glance which nodes are healthy, which are under load, and which need attention. For deployments with dozens of Pis spread across a facility, this visibility is essential for maintaining reliable service.

ARM64 Native · 8GB RAM · Ollama Optimized

Deploy AI at the Edge

Guaardvark is coming soon. Get notified when it launches, or explore the source on GitHub.

Get Notified · View on GitHub