Local Browser Agent

Name: browser-agent
Rating: 5 (19 reviews)

Autonomous browser agent running entirely on Apple Silicon. No cloud APIs, no Claude Code overhead, no MCP layer. Direct MLX inference + Chrome DevTools Protocol.

Architecture

User prompt → Local LLM (MLX) → Chrome DevTools Protocol → Brave Browser
                   ↑                       ↓
             ~2–5s per step        DOM.pierce + DOM.focus
                                   + Input.insertText

Default model: Gemma 4 31B Instruct abliterated (4-bit quantized) via MLX on Apple Silicon Alternative models: any MLX-compatible model — Qwen 3.5 122B (biggest), Llama 3.3 70B (smartest), or anything else — swap via the MLX_MODEL env var Browser: Brave with remote debugging on port 9222 Protocol: CDP WebSocket — no MCP, no proxy, direct connection

Key Innovation: Cross-Origin Iframe + Shadow DOM Commenting

Most news sites (Yahoo, etc.) use third-party comment widgets (OpenWeb/SpotIM) that load inside:

A cross-origin iframe (JavaScript can't access it)
A Shadow DOM (normal querySelector can't find elements)
A ProseMirror rich text editor (innerHTML doesn't work)

Standard browser automation tools (Playwright, Selenium, MCP) fail at all three layers.

Our solution uses CDP primitives that bypass all of these:

DOM.getDocument(depth: -1, pierce: true)    # Exposes everything across iframes + Shadow DOM
DOM.performSearch(".ProseMirror")            # Finds the editor in any context
DOM.focus(nodeId)                            # Focuses it regardless of origin
Input.insertText(text)                       # Types into the focused element

This works because CDP operates at the browser level, not the page level. Same-origin policy doesn't apply.

Setup

Prerequisites

macOS with Apple Silicon (M-series), 32 GB+ unified memory recommended
Brave Browser (or Chrome) with remote debugging
Python 3.12+ with MLX

Install

# MLX server backend (handles local inference)
pip install mlx mlx-lm websockets

MLX Server

The agent talks to a local MLX inference server that speaks Anthropic's Messages API. The server ships with the companion repo claude-code-local — set that up first. Once installed, the server lives at ~/.local/mlx-native-server/server.py and is auto-started by the desktop launcher.

Launcher

Desktop launcher: double-click Gemma 4 Browser.command (from the claude-code-local repo's launchers/Browser Agent.command). The launcher will:

Start the MLX server with Gemma 4 31B if it isn't already running
Start Brave with --remote-debugging-port=9222 if it isn't already running
Ensure at least one page tab exists
Hand off to the Python agent

Usage

Interactive Mode (recommended)

python agent.py
# Prompts: "What should I do?"
# Type tasks, get results, stays open for the next task
# Type "quit" to exit
# Errors in one task no longer kill the whole session — you'll just get a
# message and a fresh prompt

One-Shot Mode

python agent.py "Find an article about Iran on Yahoo and make a comment"

Swap Models

# Override the default model with any MLX-compatible LLM
MLX_MODEL="mlx-community/Qwen2.5-72B-Instruct-4bit" python agent.py

Example Tasks

Comment on a news article

Find an article about Iran on Yahoo and make a comment. Don't post it, just leave it in draft.

The agent will:

Navigate to Yahoo News
Find an Iran article via JavaScript (instant, no model needed)
Click the article
Read the article content (first 6 paragraphs)
Generate a relevant 2–3 sentence comment using the model
Open the Comments section
Find the comment widget (cross-origin iframe + Shadow DOM)
Type the comment via DOM.pierce + DOM.focus + Input.insertText
Scroll so you can see the comment
NOT click Send — leaves it for your review

With specific comment text

Go to Yahoo, find an Iran article. comment: The diplomatic situation demands more transparency from all parties involved.

How It Works

Fast Path (comment tasks)

When the task mentions "comment" plus a topic keyword (iran, trump, etc.):

JavaScript finds the article — no model needed, instant
Model generates the comment — reads article paragraphs, writes 2–3 sentences
CDP types the comment — pierces through iframes and Shadow DOM

General Path (other tasks)

The model controls the browser via JSON tool calls:

navigate(url) — go to a page
snapshot() — get accessibility tree with element UIDs
click(uid) — click an element
type_text(uid, text) — type into an element
scroll(direction) — scroll up/down
js(code) — run arbitrary JavaScript
done(message) — task complete

Built-in loop detection: if the same UID gets clicked more than twice in a row, the agent presses Escape (to dismiss any lightbox/overlay) and forces a fresh snapshot so the model can try a different approach.

Error Recovery

Any exception during a task (MLX timeout, CDP websocket drop, malformed model output, etc.) is caught by the main loop — you'll see the error printed and return to the prompt rather than the whole agent crashing.

Performance (Gemma 4 31B on M-series, warm disk cache)

Metric	Value
Navigate + snapshot	~4s
Article finding (JS)	<1s
Comment generation	~8s
Comment typing (pierce + type)	~3s
Total for comment task	~20–30s

Files

agent.py — The browser agent (single file, ~470 lines)
~/.local/mlx-native-server/server.py — MLX inference server with Anthropic API + tool parsing (ships with claude-code-local)
launchers/Browser Agent.command — Desktop launcher (ships with claude-code-local, surfaces as Gemma 4 Browser.command on the Desktop)

Built With

MLX — Apple's ML framework for Apple Silicon
Gemma 4 31B — instruction-tuned, abliterated and 4-bit quantized
Chrome DevTools Protocol — direct browser control via WebSocket
No cloud APIs, no subscriptions, no data leaving your machine

💬 Community

Builders running this stack hang out in the NiceDreamzApps Discord — quiet, builder-tone, no bots. Share what you're scraping, what's breaking, what local model worked for which site.

👉 discord.gg/ZdSqgAxUW

browser-agent

An AI that can actually use the web, running entirely on your Mac.

What makes browser-agent special

Real browser

Handles everything

Local brain

Scriptable

Is this for you?

Getting started in minutes

Clone and install

Open Chrome with DevTools

Tell it what to do

Ready to try browser-agent?

Need this for a law firm, healthcare org, or anywhere documents can't leave the machine?

More from Divine Tribe

The complete README

Local Browser Agent

Architecture

Key Innovation: Cross-Origin Iframe + Shadow DOM Commenting

Setup

Prerequisites

Install

MLX Server

Launcher

Usage

Interactive Mode (recommended)

One-Shot Mode

Swap Models

Example Tasks

Comment on a news article

With specific comment text

How It Works

Fast Path (comment tasks)

General Path (other tasks)

Error Recovery

Performance (Gemma 4 31B on M-series, warm disk cache)

Files

Built With

💬 Community

browser-agent

An AI that can actually use the web, running entirely on your Mac.

What makes browser-agent special

Real browser

Handles everything

Local brain

Scriptable

Is this for you?

Getting started in minutes

Clone and install

Open Chrome with DevTools

Tell it what to do

Ready to try browser-agent?

Need this for a law firm, healthcare org, or anywhere documents can't leave the machine?

More from Divine Tribe

claude-code-local

NarrateClaude

studio-record

claude-screen-to-phone

The complete README

Local Browser Agent

Architecture

Key Innovation: Cross-Origin Iframe + Shadow DOM Commenting

Setup

Prerequisites

Install

MLX Server

Launcher

Usage

Interactive Mode (recommended)

One-Shot Mode

Swap Models

Example Tasks

Comment on a news article

With specific comment text

How It Works

Fast Path (comment tasks)

General Path (other tasks)

Error Recovery

Performance (Gemma 4 31B on M-series, warm disk cache)

Files

Built With

💬 Community