Want to run your own local LLM, expose it to the internet, and build a slick UI for it?
This post shows you how to:
- Run Ollama locally on a Raspberry Pi 5
- Expose it publicly using Ngrok
- Build a Next.js app with streaming chat UI using the AI SDK
By the end, you'll have a fully functional AI chat interface connected to your own local model server.
🎥 Full Walkthrough (Video)
⚙️ What You Need
- Raspberry Pi 5 (8GB RAM recommended) or any Linux machine
- SSH access to your Pi
- Node.js 18+ and npm
- Ngrok account (free tier works fine)
- Basic knowledge of Next.js and React
Part 1: Setting Up Ollama
1. Install Ollama on the Pi
SSH into your Raspberry Pi and install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
2. Start Ollama (bind to all IPs)
By default, Ollama only binds to localhost. We need it accessible from other machines:
OLLAMA_HOST=0.0.0.0 ollama serve
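If you installed via the script above, Ollama also runs as a systemd service in the background, and an environment variable exported in your shell won't reach it. To make the binding survive reboots, you can set it on the service itself (this mirrors the approach in Ollama's FAQ; adjust if your setup differs):

```bash
# Open a drop-in override for the Ollama service
sudo systemctl edit ollama.service

# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama
```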
3. Pull a Model
Download a lightweight model that runs well on the Pi:
# Open a new terminal session
ollama pull llama3.2:1b # Smaller model for Pi
# or
ollama pull llama3.2:3b # If you have 8GB+ RAM
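Once the pull completes, you can confirm the model is installed:

```bash
ollama list
```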
4. Test Ollama Locally
Verify everything works:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2:1b",
"prompt": "Hello, how are you?",
"stream": false
}'
You should see a JSON response with the AI's reply.
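With "stream": false, Ollama returns a single JSON object. The exact fields vary by version, but expect something roughly like this (response text shortened here):

```json
{
  "model": "llama3.2:1b",
  "created_at": "2025-01-01T12:00:00Z",
  "response": "Hello! I'm doing well, thanks for asking...",
  "done": true
}
```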
Part 2: Expose with Ngrok
5. Install and Configure Ngrok
Install Ngrok:
# Add Ngrok's official GPG key
curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null
# Add Ngrok repository
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | sudo tee /etc/apt/sources.list.d/ngrok.list
# Install
sudo apt update && sudo apt install ngrok
Authenticate and Tunnel:
# Add your auth token (get it from ngrok.com/dashboard)
ngrok config add-authtoken YOUR_AUTH_TOKEN
# Create the tunnel
ngrok http 11434
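Heads up: some Ollama versions reject tunneled requests (typically with a 403) because the forwarded Host header doesn't match what Ollama expects. If you hit that, Ngrok's standard --host-header flag rewrites it:

```bash
ngrok http 11434 --host-header="localhost:11434"
```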
Important: Copy the HTTPS URL (e.g., https://abc123.ngrok-free.app). You'll need this for your frontend.
Part 3: Build the Next.js Frontend
6. Create a Next.js App
npx create-next-app@latest ai-chat-app
cd ai-chat-app
Choose these options:
- ✅ TypeScript
- ✅ ESLint
- ✅ Tailwind CSS
- ✅ App Router
- ✅ Turbopack
7. Install AI SDK
The AI SDK by Vercel makes streaming responses super easy:
npm install ai
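A version note: the useChat import used below (ai/react) ships with AI SDK 3.x. If npm pulls a newer major version where the React hooks live in a separate package, either pin the version or install the companion package and update the import accordingly:

```bash
# Option A: pin to the 3.x line this post's imports match
npm install ai@3

# Option B: on newer majors, the hooks live in @ai-sdk/react
npm install @ai-sdk/react
```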
8. Create the Streaming API Route
Create app/api/generate/route.ts:
export const runtime = "nodejs";
export const maxDuration = 30;
interface Message {
role: "user" | "assistant";
content: string;
}
export async function POST(req: Request) {
const { messages } = await req.json();
// Convert chat messages to a single prompt for Ollama
const prompt =
messages
.map(
(msg: Message) =>
`${msg.role === "user" ? "Human" : "Assistant"}: ${msg.content}`
)
.join("\n") + "\nAssistant:";
// Replace with your actual Ngrok URL from https://dashboard.ngrok.com/
const response = await fetch(
"https://YOUR-NGROK-URL.ngrok-free.app/api/generate",
{
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "llama3.2:1b", // or your chosen model
prompt: prompt,
stream: true,
}),
}
);
if (!response.ok) {
throw new Error(
`Ollama API error: ${response.status} ${response.statusText}`
);
}
// Create a streaming response compatible with AI SDK
const stream = new ReadableStream({
async start(controller) {
const reader = response.body?.getReader();
const decoder = new TextDecoder();
let buffer = "";
      if (!reader) {
        // Without a readable body there is nothing to stream; close cleanly
        // so the client isn't left waiting.
        controller.close();
        return;
      }
try {
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() || "";
for (const line of lines) {
if (line.trim()) {
try {
const data = JSON.parse(line);
if (data.response) {
              // Format for the AI SDK's useChat data stream protocol;
              // JSON.stringify safely escapes quotes, backslashes, and newlines
              const chunk = `0:${JSON.stringify(data.response)}\n`;
controller.enqueue(new TextEncoder().encode(chunk));
}
if (data.done) {
controller.close();
return;
}
} catch (e) {
console.error("Parse error:", line, e);
}
}
}
}
} catch (error) {
controller.error(error);
} finally {
reader.releaseLock();
}
},
});
return new Response(stream, {
headers: {
"Content-Type": "text/event-stream",
"Transfer-Encoding": "chunked",
"Cache-Control": "no-cache",
Connection: "keep-alive",
},
});
}
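Rather than hardcoding the tunnel URL (which changes every time a free-tier Ngrok tunnel restarts), consider reading it from an environment variable. A minimal sketch, assuming a variable named OLLAMA_BASE_URL defined in .env.local (the name is my choice, not a framework convention):

```ts
// In app/api/generate/route.ts, replace the hardcoded URL with:
const baseUrl = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";

const response = await fetch(`${baseUrl}/api/generate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2:1b",
    prompt,
    stream: true,
  }),
});
```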
9. Build the Chat Interface
Replace app/page.tsx with:
"use client";
import { useChat } from "ai/react";
export default function Home() {
const { messages, input, handleInputChange, handleSubmit, isLoading } =
useChat({
api: "/api/generate",
});
return (
<main className="flex flex-col items-center justify-center min-h-screen p-4 bg-gray-50 dark:bg-black text-gray-900 dark:text-gray-100">
<div className="w-full max-w-2xl flex flex-col h-[80vh]">
<h1 className="text-3xl sm:text-4xl font-bold mb-8 text-center">
🤖 Local AI Chat
</h1>
{/* Messages Container */}
<div className="flex-1 overflow-y-auto mb-4 space-y-4">
{messages.length === 0 && (
<div className="text-center text-gray-500 mt-20">
<p className="text-lg mb-2">👋 Welcome to your local AI!</p>
<p>Start chatting with your Raspberry Pi-powered assistant</p>
</div>
)}
          {messages.map((message) => (
            <div
              key={message.id}
className={`p-4 rounded-lg ${
                message.role === "user"
? "bg-blue-100 dark:bg-blue-900 ml-auto"
: "bg-gray-100 dark:bg-gray-800"
} max-w-[80%]`}
>
<div className="font-semibold mb-1 text-sm opacity-70">
{message.role === "user" ? "You" : "🤖 AI"}
</div>
<div className="whitespace-pre-wrap">{message.content}</div>
</div>
))}
{isLoading && (
<div className="bg-gray-100 dark:bg-gray-800 p-4 rounded-lg max-w-[80%]">
<div className="font-semibold mb-1 text-sm opacity-70">🤖 AI</div>
<div className="flex items-center">
<div className="typing-dots">
<span></span>
<span></span>
<span></span>
</div>
</div>
</div>
)}
</div>
{/* Input Form */}
<form onSubmit={handleSubmit} className="flex gap-2">
<input
value={input}
onChange={handleInputChange}
placeholder="Ask your local AI anything..."
className="flex-1 p-3 border border-gray-300 dark:border-gray-700 rounded-lg bg-white dark:bg-gray-800 focus:outline-none focus:ring-2 focus:ring-blue-500"
disabled={isLoading}
/>
<button
type="submit"
disabled={isLoading || !input.trim()}
className="bg-blue-600 hover:bg-blue-700 disabled:bg-gray-400 text-white font-semibold py-3 px-6 rounded-lg shadow transition-colors"
>
{isLoading ? "..." : "Send"}
</button>
</form>
</div>
</main>
);
}
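One nice touch you may want: auto-scrolling to the newest message as tokens stream in. Here's a minimal sketch to drop into the component above (messagesEndRef is a name I've invented for this example):

```tsx
import { useEffect, useRef } from "react";

// Inside the Home component:
const messagesEndRef = useRef<HTMLDivElement>(null);

// Scroll whenever the message list changes, including streamed updates.
useEffect(() => {
  messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages]);

// Render this anchor as the last child of the messages container:
// <div ref={messagesEndRef} />
```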
10. Add Loading Animation (Optional)
Add this CSS to app/globals.css:
.typing-dots {
display: flex;
gap: 4px;
}
.typing-dots span {
width: 6px;
height: 6px;
border-radius: 50%;
background-color: #6b7280;
animation: typing 1.4s infinite;
}
.typing-dots span:nth-child(1) {
animation-delay: 0s;
}
.typing-dots span:nth-child(2) {
animation-delay: 0.2s;
}
.typing-dots span:nth-child(3) {
animation-delay: 0.4s;
}
@keyframes typing {
0%,
60%,
100% {
transform: translateY(0);
}
30% {
transform: translateY(-10px);
}
}
11. Run Your App
npm run dev
Visit http://localhost:3000 and start chatting with your local AI!
🔥 Test It Out
Try these prompts:
- "Explain quantum computing in simple terms"
- "Write a Python function to sort a list"
- "What's the weather like on Mars?"
Your responses will stream in real-time from your Raspberry Pi! 🚀
✅ What You Built
You now have:
- ✅ Local AI server running on your Pi
- ✅ Public access via Ngrok tunnel
- ✅ Beautiful chat interface with streaming responses
- ✅ Real-time AI conversations powered by your own hardware
🧠 Next Steps & Improvements
Performance:
- Use a more powerful model like llama3.2:3b or llama3.1:8b
- Add GPU acceleration if you have a compatible setup
- Implement response caching for common queries (see the sketch after this list)
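For the caching idea, here is a minimal in-memory sketch. The helper names are hypothetical, there is no eviction beyond the TTL, and it only makes sense for complete (non-streamed) responses, but it shows the shape of the idea:

```ts
// Hypothetical cache helpers; entries are keyed by the full prompt string.
const cache = new Map<string, { reply: string; expires: number }>();
const TTL_MS = 5 * 60 * 1000; // keep entries for five minutes

export function getCachedReply(prompt: string): string | undefined {
  const hit = cache.get(prompt);
  if (!hit || Date.now() > hit.expires) {
    cache.delete(prompt); // drop expired entries lazily
    return undefined;
  }
  return hit.reply;
}

export function setCachedReply(prompt: string, reply: string): void {
  cache.set(prompt, { reply, expires: Date.now() + TTL_MS });
}
```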
Production Ready:
- Replace Ngrok with your own domain + reverse proxy (nginx)
- Add authentication and rate limiting
- Set up SSL certificates
- Deploy to a VPS or cloud instance
Features:
- Add conversation history/memory
- Implement different AI personas/modes
- Add file upload and document Q&A
- Create mobile-responsive design improvements
Models to Try:
- codellama for coding assistance
- gemma:2b, Google's efficient model
- phi3:mini, Microsoft's compact model
- Custom fine-tuned models
This setup gives you complete control over your AI infrastructure while keeping costs minimal. Perfect for learning, experimenting, or building your own AI-powered applications!