OpenVision

Apple Vision Pro-style spatial computing, running entirely in your browser. No headset, no app, no install. Just a camera.

Live: openvision.vercel.app (connect to Vercel)

What is OpenVision?

OpenVision simulates the core interaction paradigms of Apple Vision Pro using only a webcam and a browser. Eye tracking, hand tracking, pinch gestures, spatial panels. All running on-device via WebGL and WebAssembly. Nothing ever leaves your browser.

Why it matters

Apple Vision Pro demos require a $3,500 headset and an Apple Store appointment. OpenVision lets anyone experience spatial computing from any device with a camera: Windows, Mac, Chromebook, anything.

Features

VisionWeb (`/vision`)

Eye tracking powered by WebGazer.js (Brown University).

9-point calibration (matches official WebGazer methodology: 5 clicks per dot, 45 total samples)
Real-time gaze cursor with Kalman filter smoothing
Face detection feedback during calibration (green = detected, red = not)
Dwell-to-click: look at any target for 1.2 seconds to activate
Spatial panel system: panels float in 3D space, draggable, closeable
Pinch gesture integration (index + thumb close = click)
Hand tracking via MediaPipe (scroll, drag, interact)
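The dwell-to-click behavior listed above can be sketched as a small timer that restarts whenever the gaze target changes. This is a minimal illustration, not the project's implementation: `DwellTimer`, `update`, and `progress` are hypothetical names, and only the 1.2 second threshold comes from the feature list.

```typescript
// Hypothetical dwell timer. Names and structure are illustrative only.
const DWELL_MS = 1200; // 1.2 s, the dwell time stated in the feature list

class DwellTimer {
  private targetId: string | null = null;
  private startedAt = 0;
  private fired = false;

  // Call once per frame with the id of the target under the gaze cursor
  // (or null). Returns the target id exactly once when the dwell completes.
  update(targetId: string | null, nowMs: number): string | null {
    if (targetId !== this.targetId) {
      // Gaze moved to another target (or off all targets): restart the clock.
      this.targetId = targetId;
      this.startedAt = nowMs;
      this.fired = false;
      return null;
    }
    if (targetId !== null && !this.fired && nowMs - this.startedAt >= DWELL_MS) {
      this.fired = true; // fire only once per continuous dwell
      return targetId;
    }
    return null;
  }

  // Progress in [0, 1], e.g. for drawing a dwell ring around the cursor.
  progress(nowMs: number): number {
    if (this.targetId === null) return 0;
    return Math.min(1, (nowMs - this.startedAt) / DWELL_MS);
  }
}
```

Firing once per continuous dwell avoids repeated activations while the user keeps looking at the same target; looking away and back starts a new dwell.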

HandsWeb (`/hands`)

Hand tracking powered by MediaPipe Hands (Google): 21 landmarks per hand at 30fps+.

5 interactive modes:

Mode	What it does
Scroll	Pinch and move your hand up or down to scroll a real content feed. Trackpad still works too.
Particles	Every fingertip emits glowing particles with gravity. Z-depth controls energy.
Draw	Point with index finger to paint smooth bezier curves. Open palm to clear.
Bubbles	Iridescent bubbles float up. Pop them with your fingertips.
Portal	Hands glow through a pulsing void. Fingertips trail upward particles.

Real-time gesture recognition: Fist, Open, Point, Peace, Thumbs Up, Pinch, and more. Labels appear live on wrists.
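One common way to recognize gestures like these from the 21 landmarks is to test which fingers are extended. The sketch below covers only four of the gestures and ignores the thumb; it is an assumption about the general approach, not the project's `classifyGesture`. `classifySketch` and the `Gesture` type are hypothetical names. The landmark indices (0 = wrist, 8/12/16/20 = fingertips, 6/10/14/18 = PIP joints) follow the MediaPipe Hands numbering.

```typescript
type Landmark = { x: number; y: number; z: number };
type Gesture = "Fist" | "Open" | "Point" | "Peace" | "Unknown";

// [tip, PIP joint] index pairs for index, middle, ring, and pinky fingers.
const FINGERS: ReadonlyArray<readonly [number, number]> = [
  [8, 6],
  [12, 10],
  [16, 14],
  [20, 18],
];
const WRIST = 0;

const dist = (a: Landmark, b: Landmark): number =>
  Math.hypot(a.x - b.x, a.y - b.y);

// A finger counts as extended when its tip is farther from the wrist than
// its PIP joint. The thumb is ignored to keep the sketch short.
function classifySketch(hand: Landmark[]): Gesture {
  if (hand.length !== 21) return "Unknown";
  const wrist = hand[WRIST];
  const key = FINGERS.map(([tip, pip]) =>
    dist(hand[tip], wrist) > dist(hand[pip], wrist) ? "1" : "0"
  ).join("");
  switch (key) {
    case "0000":
      return "Fist";
    case "1111":
      return "Open";
    case "1000":
      return "Point";
    case "1100":
      return "Peace";
    default:
      return "Unknown";
  }
}
```

Comparing distances to the wrist rather than raw y-coordinates keeps the check independent of how the hand is rotated in the frame.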

OpenVision as a toolkit

The spatial primitives that power this site are importable. See lib/openvision/README.md for full usage.

import {
  PinchDetector, // core: thumb + index pinch state machine
  classifyGesture, // core: 9-class gesture classifier
  useHandTracking, // react: MediaPipe Hands wrapped as a hook
  useGazeTracking, // react: WebGazer wrapped as a hook
  usePinchScroll, // react: bind pinch dragging to a scrollable element
  useDwellClick, // react: stare to click
  GazeCursor, // react: floating cursor + dwell ring
  SpatialPanel, // react: draggable glass panel with gaze focus
} from "@/lib/openvision";

core/ has zero React and zero deps. react/ wraps it for Next.js / React 19 apps.

Tech Stack

Layer	Technology
Framework	Next.js 16 (App Router)
Language	TypeScript 5
Eye Tracking	WebGazer.js 2.1.0 (Brown University)
Hand Tracking	MediaPipe Hands v0.4 (Google)
Pinch Detection	Custom `PinchDetector` class (EMA-smoothed, hysteresis thresholds)
Styling	Tailwind CSS v4
Deployment	Vercel

Architecture

WebGazer Integration

WebGazer.js uses TF.js TFFacemesh for face landmark detection, then trains a ridge regression model mapping face features to screen coordinates. Key implementation details:

saveDataAcrossSessions(false): prevents stale data from prior sessions corrupting the model
setRegression('ridge'): correct API name in 2.1.0 (not 'ridgeReg')
applyKalmanFilter(true): built-in smoothing, better than manual EMA
Auto click-recording stays ON during calibration: each click adds a training sample, which is how WebGazer is designed to learn
removeMouseEventListeners() after calibration: the only correct way to stop online learning in 2.1.0
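The call order described above might look like the following sketch. `WebGazerLike` is a hand-written structural type covering only the calls used here (it is an assumption for illustration, not an official typing), and `startGaze` and `finishCalibration` are hypothetical helper names. The WebGazer methods themselves are the ones named in the list, plus `setGazeListener` and `begin`.

```typescript
type GazePoint = { x: number; y: number };
type GazeListener = (data: GazePoint | null, elapsedMs: number) => void;

// Assumed minimal shape of the WebGazer object for this sketch.
interface WebGazerLike {
  setRegression(name: string): unknown;
  saveDataAcrossSessions(flag: boolean): unknown;
  applyKalmanFilter(flag: boolean): unknown;
  setGazeListener(listener: GazeListener): unknown;
  begin(): Promise<unknown>;
  removeMouseEventListeners(): unknown;
}

// Configure WebGazer before starting it, so the first predictions already
// use the ridge model and Kalman smoothing.
async function startGaze(
  wg: WebGazerLike,
  onGaze: (p: GazePoint) => void
): Promise<void> {
  wg.saveDataAcrossSessions(false); // do not reuse samples from old sessions
  wg.setRegression("ridge");
  wg.applyKalmanFilter(true);
  wg.setGazeListener((data) => {
    if (data) onGaze(data); // data is null when no prediction is available
  });
  await wg.begin(); // requests camera access and starts tracking
}

// Call once all calibration clicks are collected: from here on, clicks and
// mouse movement no longer feed the regression model.
function finishCalibration(wg: WebGazerLike): void {
  wg.removeMouseEventListeners();
}
```

Passing the WebGazer object in as a parameter, rather than reading a global, keeps the helpers easy to exercise with a stub.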

PinchDetector (`hooks/usePinch.ts`)

Custom class with:

Normalized thumb-index distance ratio (relative to palm scale, so it works at any distance from camera)
Separate enter/exit thresholds (hysteresis prevents jitter at the boundary)
State machine: idle → pinching → holding | dragging → released → idle
EMA smoothing on center position (alpha = 0.4)
Drag deadzone (0.012 normalized units) to prevent accidental drags on pinch
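The pieces above fit together roughly as in this sketch. It is not the code in `hooks/usePinch.ts`: `PinchSketch` is a hypothetical class, the enter/exit ratios are assumed values chosen for illustration, and using the wrist-to-middle-knuckle distance as the palm scale is also an assumption. Only the EMA alpha (0.4), the deadzone (0.012), and the state names come from the description.

```typescript
type Pt = { x: number; y: number };
type PinchState = "idle" | "pinching" | "holding" | "dragging" | "released";

const ENTER_RATIO = 0.25; // assumed: pinch starts below this ratio
const EXIT_RATIO = 0.4; // assumed: pinch ends above this ratio (> ENTER_RATIO)
const EMA_ALPHA = 0.4; // from the description above
const DRAG_DEADZONE = 0.012; // from the description above

const dist = (a: Pt, b: Pt): number => Math.hypot(a.x - b.x, a.y - b.y);

class PinchSketch {
  state: PinchState = "idle";
  center: Pt = { x: 0, y: 0 };
  private origin: Pt = { x: 0, y: 0 };

  // Call once per frame with normalized landmarks: thumb tip (4),
  // index tip (8), wrist (0), middle-finger knuckle (9).
  update(thumbTip: Pt, indexTip: Pt, wrist: Pt, middleMcp: Pt): PinchState {
    const palm = Math.max(dist(wrist, middleMcp), 1e-6); // avoid divide by zero
    const ratio = dist(thumbTip, indexTip) / palm;
    const mid = {
      x: (thumbTip.x + indexTip.x) / 2,
      y: (thumbTip.y + indexTip.y) / 2,
    };
    switch (this.state) {
      case "released":
        this.state = "idle"; // released lasts a single frame
        break;
      case "idle":
        if (ratio < ENTER_RATIO) {
          this.state = "pinching";
          this.center = mid;
          this.origin = mid;
        }
        break;
      default: // pinching, holding, or dragging
        if (ratio > EXIT_RATIO) {
          this.state = "released";
          break;
        }
        // EMA smoothing of the pinch center.
        this.center = {
          x: this.center.x + EMA_ALPHA * (mid.x - this.center.x),
          y: this.center.y + EMA_ALPHA * (mid.y - this.center.y),
        };
        if (this.state !== "dragging") {
          const moved = dist(this.center, this.origin) > DRAG_DEADZONE;
          this.state = moved ? "dragging" : "holding";
        }
    }
    return this.state;
  }
}
```

Because the exit ratio is larger than the enter ratio, a distance hovering between the two keeps whatever state the detector is already in, which is what suppresses flicker at the boundary.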

MediaPipe Hands

MediaPipe Hands is loaded via a CDN script tag. For WASM-based libraries this has been more reliable than an ES module import, and it avoids a version mismatch between the JS loader and the WASM binary served by the CDN. Results fire at the camera frame rate (~30fps). Hand landmarks are normalized to 0-1 relative to the video frame.
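Since landmarks arrive normalized to the video frame, drawing them on a canvas or hit-testing against page elements needs a conversion to pixels. A minimal sketch, where `toPixels` is a hypothetical helper and the mirroring option is an assumption (useful when the camera preview is shown as a selfie view):

```typescript
type NormalizedLandmark = { x: number; y: number; z: number };

// Convert a normalized (0-1) landmark to pixel coordinates in a target
// area of the given size. With mirror = true the x axis is flipped so that
// moving the hand right moves the point right in a mirrored preview.
function toPixels(
  lm: NormalizedLandmark,
  width: number,
  height: number,
  mirror = true
): { x: number; y: number } {
  const nx = mirror ? 1 - lm.x : lm.x;
  return { x: nx * width, y: lm.y * height };
}
```

For example, the index fingertip is landmark 8 in the MediaPipe Hands numbering, so `toPixels(hand[8], canvas.width, canvas.height)` would give its canvas position.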

Project Structure

openvision/
├── app/
│   ├── page.tsx              # Landing: links to /vision and /hands
│   ├── vision/page.tsx       # Lazy-loads VisionWeb (avoids SSR browser API crash)
│   ├── hands/page.tsx        # Lazy-loads HandsWeb
│   └── layout.tsx
├── components/
│   ├── VisionWeb.tsx         # Eye tracking + spatial panel interface
│   ├── HandsWeb.tsx          # Hand tracking sandbox (5 modes)
│   └── SpatialPanel.tsx      # Draggable spatial panel primitive
└── hooks/
    └── usePinch.ts           # PinchDetector class

Getting Started

npm install
npm run dev

Open http://localhost:3000.

Requires: HTTPS or localhost (browser camera API requires secure context). On production, Vercel provides HTTPS automatically.

Calibration Guide

For best eye tracking accuracy:

Sit 50-70cm from your screen (arm's length)
Keep your head still during calibration
Watch the face preview in the bottom-right corner. The box must be green (face detected)
Look directly at each dot before clicking. Keep your eyes on the dot, not the cursor.
Click each dot 5 times without moving your head

Poor lighting and head movement during calibration are the primary causes of inaccuracy.
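The 9-dot, 5-clicks-per-dot scheme can be tracked with a simple counter that reports when all 45 samples are in. This is an illustrative sketch only: `CalibrationProgress` and its members are hypothetical names, not part of the project.

```typescript
const DOT_COUNT = 9;
const CLICKS_PER_DOT = 5;

class CalibrationProgress {
  private clicks: number[] = new Array<number>(DOT_COUNT).fill(0);

  // Record one click on a dot (0-8). Returns false when the index is
  // invalid or the dot already has its 5 clicks.
  record(dot: number): boolean {
    if (!Number.isInteger(dot) || dot < 0 || dot >= DOT_COUNT) return false;
    if (this.clicks[dot] >= CLICKS_PER_DOT) return false;
    this.clicks[dot] += 1;
    return true;
  }

  isDotDone(dot: number): boolean {
    return this.clicks[dot] === CLICKS_PER_DOT;
  }

  // Total samples collected so far (45 when calibration is complete).
  get samples(): number {
    return this.clicks.reduce((sum, n) => sum + n, 0);
  }

  get complete(): boolean {
    return this.samples === DOT_COUNT * CLICKS_PER_DOT;
  }
}
```

Once `complete` is true, the UI can leave calibration mode and stop feeding clicks to the gaze model.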

Roadmap

Next modes planned for HandsWeb:

WebGL Fluid: Port PavelDoGreat/WebGL-Fluid-Simulation. Fingertips = fluid injectors.
SDF Metaballs: Fullscreen GLSL shader. Fingertips merge like liquid.
Theremin: Tone.js audio synthesis. Hand height = pitch, pinch = volume.
Cloth Simulation: Verlet grid. Hand grabs and tears fabric.
3D Particle Morph: Three.js. 60k+ GPU particles morphing between shapes.
Falling Sand: Cellular automata. Pour sand, water, fire, lava with gestures.

Privacy

No data ever leaves your browser. Camera feed is processed entirely on-device via WebAssembly. No server, no storage, no analytics.

All glory to God! ✝️❤️
