
Give your LLM eyes.
screenshot | llm — now a reality.
Vision|Pipe is a lightweight open source utility that captures your screen and pipes it, together with your voice, text, or drawn annotations and rich contextual metadata, directly into any LLM.
Built for developers who think in pipes.
The Loop You’re Stuck In
You’re working with an AI and need to show it what’s on your screen. You describe it in words. It misunderstands. You describe again. Repeat.
Every time you type “the button in the top right of the modal” instead of just pointing at it, you’re paying a productivity tax that compounds across every debugging session, every code review, every UI bug report you file.
The gap between what you see and what your AI understands is costing you hours.
Vision|Pipe Skips the Description
Capture the screen. Annotate however feels natural — speak it, type it, or draw it. Paste the full context — image, annotation, and metadata — into your LLM in one action.
No uploads. No integrations. No UI sprawl.
Just the Unix philosophy applied to AI vision: do one thing, do it well, compose it with everything else.
Every other tool
Capture
↓
Upload image
↓
Switch to LLM
↓
Type context
↓
Submit
Vision|Pipe
Capture + Comment
↓
Paste
↓
Submit
Five Steps. One Keystroke.
Screenshot or GIF goes here
Press your hotkey
One keystroke activates the capture overlay. No menus, no clicks. Default is Cmd+Shift+C (Mac) / Ctrl+Shift+C (Windows) — configurable to whatever you prefer.
Screenshot or GIF goes here
Select a region
Drag to capture any area of your screen. Full screen or surgical precision.
Screenshot or GIF goes here
Annotate your intent
Speak it, type it, or draw it. Voice, text, and markup — all at the moment of capture.
Screenshot or GIF goes here
Hit Enter
Screenshot + annotation + rich metadata are bundled into one clipboard payload.
Screenshot or GIF goes here
Paste into any LLM
GPT-4, Claude, Gemini, Codex — any AI that accepts images. Your LLM gets it right on the first try.
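The single clipboard payload from step four can be pictured roughly like this. This is only a sketch: every type and field name below (`CapturePayload`, `Annotation`, and so on) is hypothetical, not Vision|Pipe's actual format.

```rust
// Hypothetical sketch of the bundled clipboard payload.
// All struct and field names are illustrative, not Vision|Pipe's real types.

#[derive(Debug)]
struct Annotation {
    voice_transcript: Option<String>, // on-device Whisper transcript, if any
    text_note: Option<String>,        // typed comment, if any
    has_markup: bool,                 // true when the image carries drawn markup
}

#[derive(Debug)]
struct CapturePayload {
    image_png: Vec<u8>,     // the screenshot bytes
    annotation: Annotation, // voice, text, and drawing context
    app_name: String,       // frontmost application at capture time
    window_title: String,   // title of the captured window
}

fn bundle_payload() -> CapturePayload {
    CapturePayload {
        image_png: vec![0x89, 0x50, 0x4E, 0x47], // PNG magic bytes as a stand-in
        annotation: Annotation {
            voice_transcript: None,
            text_note: Some("Why is this button misaligned in dark mode?".to_string()),
            has_markup: true,
        },
        app_name: "Safari".to_string(),
        window_title: "Checkout".to_string(),
    }
}

fn main() {
    let payload = bundle_payload();
    println!(
        "{} image bytes from {} ({}), note: {:?}",
        payload.image_png.len(),
        payload.app_name,
        payload.window_title,
        payload.annotation.text_note
    );
}
```

The point of the single-struct shape is the one-action workflow: image, annotation, and context travel as one unit, so a paste delivers everything at once.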
Captures What You Mean,
Not Just What You See
Every other screenshot tool captures pixels. Vision|Pipe captures intent.
Speak It
Record a voice note alongside your screenshot. Vision|Pipe transcribes it automatically using on-device Whisper and bundles the transcript with the image.
"This dropdown is rendering below the viewport on Safari — why?"
Type It
Add a written comment at the exact moment of capture. Your intent travels with the image as a single payload.
Why is this button misaligned in dark mode?
Draw It
Circle the problem. Highlight the element. Draw an arrow. A lightweight markup layer so your LLM knows exactly what to focus on.
All three, combined. Voice, text, and drawing can be used simultaneously. The full context is bundled into one clipboard payload. Paste once — your AI has everything.
Your LLM Gets the Full Picture
Vision|Pipe doesn’t just send a screenshot. It sends the complete context of where the capture came from and what it shows, automatically appended to every clipboard payload.
Spatial & Display
Window & Application
Browser Context
System
Captured via macOS Accessibility API and Windows UI Automation. No browser extension required.
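For illustration, the appended context for the four categories above might read something like this. All values and field names here are made up for the example; they are not Vision|Pipe's actual output.

```
Spatial & Display:    2560x1440 display @2x, capture region (412, 88) 640x480
Window & Application: Safari, window "Checkout"
Browser Context:      https://example.com/checkout
System:               macOS 14.5
```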
Every Other Tool Was Built for Humans
Vision|Pipe was built for your AI.
| Tool | Built For | LLM-Native | Annotate at Capture | Rich Metadata |
|---|---|---|---|---|
| Playwright | Programmatic browser automation | Partial | — | — |
| Zight / CleanShot X | Sharing with humans | — | Post-capture only | — |
| Snagit | Documentation & tutorials | — | Post-capture only | — |
| macOS Screenshot | General capture | — | — | — |
| Vision\|Pipe | Piping visual context into LLMs | ✓ | Voice, text, drawing | ✓ |
“If Playwright gives your test suite vision, Vision|Pipe gives you vision.”
Built the Right Way
Tauri
Lightweight and secure — not Electron. Minimal memory footprint.
Rust
Systems-level metadata capture, performance, and reliability.
Whisper
On-device transcription — no audio leaves your machine.
Built in the Open
Vision|Pipe is source-available and community-driven. The code is visible, forkable, and we welcome pull requests.
# Fork the repo and clone your fork
git checkout -b feature/your-feature
# ...make your changes...
git commit -am 'Add your feature'
git push origin feature/your-feature
# Open a Pull Request
Questions? Open an issue or reach out on X @Vision_Pipe.
Stop Describing. Start Showing.
Free for personal use. Open for contributions. Built for developers.
Windows support coming soon
Everything It Does. Nothing It Doesn’t.
Lightweight
Minimal CPU and memory footprint — Tauri, not Electron
Fast
Capture and copy in milliseconds
Multi-modal
Voice, text, and drawing annotation in one tool
Auto-transcription
Voice notes converted to text on-device via Whisper
Rich metadata
Spatial, window, browser, and system context bundled automatically
Open source
See exactly what you're running
Cross-platform
Mac today, Windows coming soon
Keyboard-first
One hotkey does everything
LLM-agnostic
Works with any AI that accepts images
No accounts
No API keys, no logins, no cloud dependency
Clipboard-native
Composes naturally with every LLM UI on the planet