Build Your Own AI Server with Ollama, Ngrok & Next.js — on a Raspberry Pi

Jun 14, 2025

Want to run your own local LLM, expose it to the internet, and build a slick UI for it?

This post shows you how to:

  • Run a local LLM on a Raspberry Pi with Ollama
  • Expose it to the internet with an Ngrok tunnel
  • Build a streaming chat UI with Next.js and the AI SDK

By the end, you'll have a fully functional AI chat interface connected to your own local model server.


🎥 Full Walkthrough (Video)

📺 Watch the YouTube Video


⚙️ What You Need

  • A Raspberry Pi with SSH access (8GB+ RAM if you want to run the 3B model)
  • A free Ngrok account (you'll need its auth token)
  • Node.js 18+ and npm on your development machine for the Next.js app

Part 1: Setting Up Ollama

1. Install Ollama on the Pi

SSH into your Raspberry Pi and install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

2. Start Ollama (bind to all IPs)

By default, Ollama only binds to localhost. We need it accessible from other machines:

OLLAMA_HOST=0.0.0.0 ollama serve

3. Pull a Model

Download a lightweight model that runs well on a Pi:

# Open a new terminal session
ollama pull llama3.2:1b  # Smaller model for Pi
# or
ollama pull llama3.2:3b  # If you have 8GB+ RAM

4. Test Ollama Locally

Verify everything works:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Hello, how are you?",
  "stream": false
}'

You should see a JSON response with the AI's reply.
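
The same call from Node.js/TypeScript looks like this — a minimal sketch (assuming Node 18+ for the built-in fetch), and roughly what the Next.js API route will do later:

// test-ollama.ts — quick sanity check against the local Ollama API
// Assumes Ollama is running on localhost:11434 and llama3.2:1b is pulled
async function main() {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2:1b",
      prompt: "Hello, how are you?",
      stream: false, // one complete JSON response instead of a stream
    }),
  });

  const data = await res.json();
  console.log(data.response); // the model's reply
}

main().catch(console.error);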


Part 2: Expose with Ngrok

5. Install and Configure Ngrok

Install Ngrok:

# Add Ngrok's official GPG key
curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null

# Add Ngrok repository
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | sudo tee /etc/apt/sources.list.d/ngrok.list

# Install
sudo apt update && sudo apt install ngrok

Authenticate and Tunnel:

# Add your auth token (get it from ngrok.com/dashboard)
ngrok config add-authtoken YOUR_AUTH_TOKEN

# Create the tunnel
ngrok http 11434

Important: Copy the HTTPS URL (e.g., https://abc123.ngrok-free.app). You'll need this for your frontend.
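
Before touching the frontend, you can confirm the tunnel forwards to Ollama from outside your network. A minimal sketch — the URL below is a placeholder, use your own forwarding URL:

// test-tunnel.ts — confirm the Ngrok tunnel reaches Ollama from outside
// Replace the URL with the HTTPS forwarding URL Ngrok printed for you
const NGROK_URL = "https://abc123.ngrok-free.app";

const res = await fetch(`${NGROK_URL}/api/tags`, {
  // Ngrok's free tier can serve an HTML interstitial to browser-like
  // clients; this header asks it to skip that page
  headers: { "ngrok-skip-browser-warning": "true" },
});

console.log(res.status, await res.json()); // should list the models you pulled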


Part 3: Build the Next.js Frontend

6. Create a Next.js App

npx create-next-app@latest ai-chat-app
cd ai-chat-app

Choose these options:

  • ✅ TypeScript
  • ✅ ESLint
  • ✅ Tailwind CSS
  • ✅ App Router
  • ✅ Turbopack

7. Install AI SDK

The AI SDK by Vercel makes streaming responses super easy (note: the code below imports useChat from ai/react, the AI SDK 3.x path; in later major versions the hook lives in @ai-sdk/react):

npm install ai

8. Create the Streaming API Route

Create app/api/generate/route.ts:

export const runtime = "nodejs";
export const maxDuration = 30;

interface Message {
  role: "user" | "assistant";
  content: string;
}

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Convert chat messages to a single prompt for Ollama
  const prompt =
    messages
      .map(
        (msg: Message) =>
          `${msg.role === "user" ? "Human" : "Assistant"}: ${msg.content}`
      )
      .join("\n") + "\nAssistant:";

  // Replace with your actual Ngrok URL from https://dashboard.ngrok.com/
  const response = await fetch(
    "https://YOUR-NGROK-URL.ngrok-free.app/api/generate",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "llama3.2:1b", // or your chosen model
        prompt: prompt,
        stream: true,
      }),
    }
  );

  if (!response.ok) {
    throw new Error(
      `Ollama API error: ${response.status} ${response.statusText}`
    );
  }

  // Create a streaming response compatible with AI SDK
  const stream = new ReadableStream({
    async start(controller) {
      const reader = response.body?.getReader();
      const decoder = new TextDecoder();
      let buffer = "";

      if (!reader) {
        // No body to read from; close the stream so the client doesn't hang
        controller.close();
        return;
      }

      try {
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;

          buffer += decoder.decode(value, { stream: true });
          const lines = buffer.split("\n");
          buffer = lines.pop() || "";

          for (const line of lines) {
            if (line.trim()) {
              try {
                const data = JSON.parse(line);
                if (data.response) {
                  // Format for the AI SDK's data stream protocol (read by useChat);
                  // JSON.stringify escapes quotes, backslashes, and newlines
                  const chunk = `0:${JSON.stringify(data.response)}\n`;
                  controller.enqueue(new TextEncoder().encode(chunk));
                }
                if (data.done) {
                  controller.close();
                  return;
                }
              } catch (e) {
                console.error("Parse error:", line, e);
              }
            }
          }
        }
        // If Ollama ends the stream without sending a done flag, close it here
        controller.close();
      } catch (error) {
        controller.error(error);
      } finally {
        reader.releaseLock();
      }
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Transfer-Encoding": "chunked",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
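
One optional refactor worth making right away: Ngrok's free tier assigns a new URL each time the tunnel restarts, so reading it from an environment variable beats hardcoding it. A sketch — the OLLAMA_BASE_URL name is just an example, not something the SDK or Ollama requires:

// .env.local (not committed to git)
// OLLAMA_BASE_URL=https://abc123.ngrok-free.app

// In app/api/generate/route.ts, swap the hardcoded fetch for:
const baseUrl = process.env.OLLAMA_BASE_URL;
if (!baseUrl) {
  throw new Error("OLLAMA_BASE_URL is not set");
}

const response = await fetch(`${baseUrl}/api/generate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2:1b",
    prompt,
    stream: true,
  }),
});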

9. Build the Chat Interface

Replace app/page.tsx with:

"use client";

import { useChat } from "ai/react";

export default function Home() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({
      api: "/api/generate",
    });

  return (
    <main className="flex flex-col items-center justify-center min-h-screen p-4 bg-gray-50 dark:bg-black text-gray-900 dark:text-gray-100">
      <div className="w-full max-w-2xl flex flex-col h-[80vh]">
        <h1 className="text-3xl sm:text-4xl font-bold mb-8 text-center">
          🤖 Local AI Chat
        </h1>

        {/* Messages Container */}
        <div className="flex-1 overflow-y-auto mb-4 space-y-4">
          {messages.length === 0 && (
            <div className="text-center text-gray-500 mt-20">
              <p className="text-lg mb-2">👋 Welcome to your local AI!</p>
              <p>Start chatting with your Raspberry Pi-powered assistant</p>
            </div>
          )}

          {messages.map((message, i) => (
            <div
              key={i}
              className={`p-4 rounded-lg ${
                message.role === "user"
                  ? "bg-blue-100 dark:bg-blue-900 ml-auto"
                  : "bg-gray-100 dark:bg-gray-800"
              } max-w-[80%]`}
            >
              <div className="font-semibold mb-1 text-sm opacity-70">
                {message.role === "user" ? "You" : "🤖 AI"}
              </div>
              <div className="whitespace-pre-wrap">{message.content}</div>
            </div>
          ))}

          {isLoading && (
            <div className="bg-gray-100 dark:bg-gray-800 p-4 rounded-lg max-w-[80%]">
              <div className="font-semibold mb-1 text-sm opacity-70">🤖 AI</div>
              <div className="flex items-center">
                <div className="typing-dots">
                  <span></span>
                  <span></span>
                  <span></span>
                </div>
              </div>
            </div>
          )}
        </div>

        {/* Input Form */}
        <form onSubmit={handleSubmit} className="flex gap-2">
          <input
            value={input}
            onChange={handleInputChange}
            placeholder="Ask your local AI anything..."
            className="flex-1 p-3 border border-gray-300 dark:border-gray-700 rounded-lg bg-white dark:bg-gray-800 focus:outline-none focus:ring-2 focus:ring-blue-500"
            disabled={isLoading}
          />
          <button
            type="submit"
            disabled={isLoading || !input.trim()}
            className="bg-blue-600 hover:bg-blue-700 disabled:bg-gray-400 text-white font-semibold py-3 px-6 rounded-lg shadow transition-colors"
          >
            {isLoading ? "..." : "Send"}
          </button>
        </form>
      </div>
    </main>
  );
}
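
Not required for the tutorial, but a nice touch: auto-scroll to the newest message while a response streams in. A minimal sketch using a ref rendered at the bottom of the messages container:

// At the top of app/page.tsx, alongside the existing imports:
import { useEffect, useRef } from "react";

// Inside the Home component, after the useChat() call:
const bottomRef = useRef<HTMLDivElement>(null);

useEffect(() => {
  bottomRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages]);

// Then render <div ref={bottomRef} /> as the last child of the
// messages container so there's always something to scroll to.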

10. Add Loading Animation (Optional)

Add this CSS to app/globals.css:

.typing-dots {
  display: flex;
  gap: 4px;
}

.typing-dots span {
  width: 6px;
  height: 6px;
  border-radius: 50%;
  background-color: #6b7280;
  animation: typing 1.4s infinite;
}

.typing-dots span:nth-child(1) {
  animation-delay: 0s;
}
.typing-dots span:nth-child(2) {
  animation-delay: 0.2s;
}
.typing-dots span:nth-child(3) {
  animation-delay: 0.4s;
}

@keyframes typing {
  0%,
  60%,
  100% {
    transform: translateY(0);
  }
  30% {
    transform: translateY(-10px);
  }
}

11. Run Your App

npm run dev

Visit http://localhost:3000 and start chatting with your local AI!


🔥 Test It Out

Try these prompts:

  • "Explain quantum computing in simple terms"
  • "Write a Python function to sort a list"
  • "What's the weather like on Mars?"

Your responses will stream in real-time from your Raspberry Pi! 🚀


✅ What You Built

You now have:

  • Local AI server running on your Pi
  • Public access via Ngrok tunnel
  • Beautiful chat interface with streaming responses
  • Real-time AI conversations powered by your own hardware

🧠 Next Steps & Improvements

Performance:

  • Use a more powerful model like llama3.2:3b or llama3.1:8b
  • Add GPU acceleration if you have a compatible setup
  • Implement response caching for common queries (see the sketch after this list)
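
As a starting point for the caching idea above, here's a deliberately naive in-memory sketch — per-process, unbounded, and only sensible for non-streaming responses, so treat it as an illustration rather than production code:

// cache.ts — naive response cache keyed by the full prompt (illustration only)
const cache = new Map<string, string>();

export async function generateWithCache(
  baseUrl: string,
  prompt: string
): Promise<string> {
  const cached = cache.get(prompt);
  if (cached) return cached;

  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3.2:1b", prompt, stream: false }),
  });

  const data = await res.json();
  cache.set(prompt, data.response);
  return data.response;
}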

Production Ready:

  • Replace Ngrok with your own domain + reverse proxy (nginx)
  • Add authentication and rate limiting
  • Set up SSL certificates
  • Deploy to a VPS or cloud instance

Features:

  • Add conversation history/memory
  • Implement different AI personas/modes
  • Add file upload and document Q&A
  • Create mobile-responsive design improvements

Models to Try:

  • codellama - For coding assistance
  • gemma:2b - Google's efficient model
  • phi3:mini - Microsoft's compact model
  • Custom fine-tuned models

This setup gives you complete control over your AI infrastructure while keeping costs minimal. Perfect for learning, experimenting, or building your own AI-powered applications!

Michael Asiedu