Running Ollama with Requestly – A guided tutorial

Sayanta Banerjee

December 4, 2025


Run LLMs fully locally using Ollama and route API calls to them with Requestly, with no backend code needed. Get private, fast, cloud-like AI responses right from localhost.


Running Large Language Models (LLMs) locally has become essential for secure, private, and low-latency AI development. Cloud LLMs are powerful, but not every project can afford to send sensitive business or user data across the Internet. Local execution avoids this by keeping computation entirely on the device. In this guide, we show how to run an LLM fully locally with Ollama and route API traffic to it with Requestly, without writing any backend code. This setup enables:

Real LLM responses served locally
Cloud-like API behavior
No dependency on OpenAI or hosted endpoints
Toggle between mock and real LLM responses instantly

Browser and frontend testing before backend integration

We used the llama3.2:1b model because it is lightweight, fast on CPU, and ideal for developers working on laptops.

Throughout the tutorial, screenshots and terminal output accompany each step so readers can follow along and verify their own environment.

1. Install & Verify Ollama Locally

Ollama is a minimal runtime that serves LLMs over a REST API. Once installed, the server starts automatically and listens by default on:

    http://localhost:11434

Install Ollama (this install script targets Linux; the walkthrough was run on Ubuntu):

    curl -fsSL https://ollama.ai/install.sh | sh

Pull the model:

    ollama pull llama3.2:1b

Validate that the Ollama server is running:

    ps aux | grep ollama

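If the output includes a line such as:

    /usr/local/bin/ollama serve

the server process is running. A process listing does not prove that the API is answering, so also open

    http://localhost:11434/

in a browser or request it with curl; a running server replies with "Ollama is running".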
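These checks can also be scripted. Below is a minimal sketch, assuming curl is installed and Ollama listens on its default port, of two helper functions that query the server directly: its root URL, and the /api/tags endpoint, which lists locally pulled models. The names ollama_up and ollama_has_model and the OLLAMA_BASE_URL variable are hypothetical, not part of Ollama.

```shell
#!/bin/sh
# Hypothetical helpers for checking a local Ollama server (not part of Ollama).
# Assumes curl is installed; OLLAMA_BASE_URL is an illustrative variable.
OLLAMA_BASE_URL="${OLLAMA_BASE_URL:-http://localhost:11434}"

# Usage: ollama_up [BASE_URL]
# Succeeds only if the server answers on its root URL with a 2xx status.
ollama_up() {
  curl -fsS --max-time 5 "${1:-$OLLAMA_BASE_URL}/" >/dev/null 2>&1
}

# Usage: ollama_has_model MODEL [BASE_URL]
# Succeeds only if the quoted model name appears in the /api/tags listing.
ollama_has_model() {
  curl -fsS --max-time 5 "${2:-$OLLAMA_BASE_URL}/api/tags" 2>/dev/null |
    grep -qF "\"$1\""
}
```

For example, "ollama_up && ollama_has_model llama3.2:1b && echo ready" prints "ready" only when the server answers and the model has been pulled. Matching the name with its surrounding quotes avoids false positives from longer tags that share the same prefix.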

Test local inference via API

    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.2:1b",
      "prompt": "hello"
    }'

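Expected output: Ollama streams the reply back as a series of JSON objects, one per line. Each object carries the next fragment of generated text in its "response" field, and the final object has "done": true along with timing statistics.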
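Reading the generated text out of that stream by eye is tedious. Here is a minimal sketch, assuming curl and jq are installed, that builds the request body with jq (so quotes in the prompt are escaped correctly), posts it to /api/generate, and joins the "response" fragments into plain text. The function names build_payload, join_stream, and ollama_ask and the OLLAMA_BASE_URL variable are hypothetical helpers for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers: send a prompt to Ollama's /api/generate endpoint and
# print only the generated text. Assumes curl and jq are installed.
OLLAMA_BASE_URL="${OLLAMA_BASE_URL:-http://localhost:11434}"

# Usage: build_payload MODEL PROMPT
# Emits a compact JSON body; jq escapes quotes and newlines in the prompt.
build_payload() {
  jq -n -c --arg model "$1" --arg prompt "$2" '{model: $model, prompt: $prompt}'
}

# Reads newline-delimited JSON on stdin and prints each "response" fragment
# back to back; objects without a "response" field contribute nothing.
join_stream() {
  jq -j --unbuffered '.response // empty'
}

# Usage: ollama_ask MODEL PROMPT
ollama_ask() {
  build_payload "$1" "$2" |
    curl -sS -N --data-binary @- "$OLLAMA_BASE_URL/api/generate" |
    join_stream
  echo  # end the output with a newline
}
```

For example, ollama_ask llama3.2:1b "Why is the sky blue?" prints the model's answer as plain text. If you would rather receive one JSON object than a stream, the /api/generate endpoint also accepts "stream": false in the request body.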

2. Routing Local LLM Traffic via Requestly

To send LLM requests from a browser or external clients without modifying any application code, we use Requestly Redirect Rules.

This creates a cloud-like API endpoint while maintaining full local execution.

Create Redirect Rule

Open Requestly
Go to Rules from the left sidebar
Click New Rule and choose Redirect Request

Fill in the configuration:

Field	Value
Rule Type	Redirect Request
IF - Request URL Contains	https://user1763230645199.requestly.tech/generate
Redirect destination	http://localhost:11434/api/generate
Rule Status	Enabled

Here is the final working Redirect Rule configuration.

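With this rule enabled, the full request flow is: client → https://user1763230645199.requestly.tech/generate → Requestly Redirect Rule → Ollama at http://localhost:11434/api/generate → generated text streamed back to the client.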
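To make the rule's behavior concrete, here is a rough model of its matching logic in shell. It is a hypothetical helper (apply_redirect_rule is not part of Requestly and is not how Requestly implements rules) covering only the case configured above: when the request URL contains the source string, the request goes to the destination URL instead; otherwise it passes through unchanged.

```shell
#!/bin/sh
# Hypothetical model of the Redirect Rule above; NOT Requestly's implementation.
RULE_SOURCE="https://user1763230645199.requestly.tech/generate"
RULE_DESTINATION="http://localhost:11434/api/generate"

# Usage: apply_redirect_rule URL
# Prints the URL the request would be sent to after the rule is evaluated.
apply_redirect_rule() {
  case "$1" in
    *"$RULE_SOURCE"*) printf '%s\n' "$RULE_DESTINATION" ;;  # "URL Contains" matched
    *) printf '%s\n' "$1" ;;                                # no match: unchanged
  esac
}
```

Quoting the variable inside the case pattern makes the match literal, so characters such as "." in the source URL are not treated as wildcards. This only models the URL rewrite; the actual interception happens inside Requestly.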

3. Test End-to-End Using Requestly API Client

Now, verify the redirect rule works.

API Client Request

Inside Requestly – API Client:

Method: POST
URL:

    https://user1763230645199.requestly.tech/generate

Body (raw JSON):

    {
      "model": "llama3.2:1b",
      "prompt": "Hello from Requestly"
    }

Click Send.

You should see model tokens streaming back via local inference.

4. Final Working Architecture

Component	Responsibility
Client (Browser / API Client / curl)	Sends prompt request
Requestly	Redirects call to local model
Ollama	Generates the LLM response
Client	Receives streamed text

Architecture Diagram

Why This Matters for Developers

Benefit	What it enables
Local-first inference	No cloud dependency & offline-ready
Zero code required	UI teams can test without waiting for the backend
Privacy / Compliance	Data stays on the device
Instant environment switching	Toggle between mock and real model responses
Faster iteration	No API key, no rate limiting

Conclusion

Running LLMs locally doesn’t require complex infrastructure or backend changes.
In our setup, Ollama handled the model execution entirely on the machine, while Requestly served as the routing layer, allowing us to forward API calls to the local inference endpoint without modifying any application code.

We validated the entire flow using only curl and the Requestly API Client, with a single Redirect Rule ensuring that every test request reached the llama3.2:1b model running on localhost. The integration worked smoothly end-to-end and demonstrated a practical workflow for secure, offline-ready AI development.

Any developer following these exact steps on Ubuntu can achieve the same result in one continuous run. This gives full control over where inference happens and makes it easy to adopt a local-first approach to modern LLM applications.

Written by

Sayanta Banerjee

An Electronics and Communication Engineer with a vision to advance technology with research and development
