Second Sight
Team led by an EA Technical Director (UBC), with uOttawa MEng AI/full-stack (React, Node, Kubernetes) and Java/Docker backend developers.
Project Description
Core Functionality & Technical Complexity
Second Sight is a multimodal AI assistant designed as a “Reality OS” for blind and visually impaired users. It turns a smartphone browser into an agent that can see (camera plus Claude vision), know (Tavily web search), and remember (persistent user memory). ElevenLabs Conversational AI is the central orchestrator: users speak naturally, and the agent decides which tool to invoke, whether that is analyzing the surroundings, searching for information, or saving and recalling memories.
The prototype demonstrates stable real-time interaction with four client-side tools orchestrated through voice commands. The technical complexity lies in WebRTC camera capture (getUserMedia), base64 encoding of captured frames for the vision model, authenticated memory persistence in Clerk’s user metadata (which removes the need for an external database), and graceful fallback handling when an API call fails.
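The base64 image step can be illustrated with a minimal sketch. It assumes the frame arrives as a data URL (for example from `canvas.toDataURL`) and converts it into the base64 image block that Anthropic's Messages API accepts. The function name `dataUrlToImageBlock` is hypothetical and not taken from the project's code.

```typescript
// Shape of a base64 image content block in Anthropic's Messages API.
interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
}

// Split a data URL such as "data:image/jpeg;base64,AAAA" into its media type
// and base64 payload. Returns null when the input is not a base64 image data URL.
// (Hypothetical helper, for illustration only.)
function dataUrlToImageBlock(dataUrl: string): ImageBlock | null {
  const match = /^data:(image\/[a-z0-9.+-]+);base64,([A-Za-z0-9+/]+={0,2})$/i.exec(dataUrl);
  if (!match) return null;
  const [, mediaType = "", data = ""] = match;
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data },
  };
}

// In the browser, the data URL would come from a canvas holding a video frame:
//   canvas.getContext("2d")?.drawImage(video, 0, 0);
//   const dataUrl = canvas.toDataURL("image/jpeg", 0.8);
const example = dataUrlToImageBlock("data:image/jpeg;base64,/9j/4AAQ");
console.log(example?.source.media_type); // logs "image/jpeg"
```

Returning null for malformed input lets the calling tool answer with a spoken fallback rather than sending an invalid request to the vision model.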
Innovation & Creativity
Unlike a typical chatbot, Second Sight is an agentic interface: it does not just answer questions, it perceives the world and takes action. The innovation lies in combining accessibility-first design with multimodal AI. A blind user can point their phone at a medicine bottle and ask “What is this?” or say “Remember my doctor’s appointment is Tuesday” without touching the screen. This hands-free, eyes-free interaction model is the core of the design.
Real-World Impact
An estimated 285 million people worldwide live with visual impairment (WHO, 2010 estimate). Second Sight gives them independent access to visual information, real-time knowledge, and personal memory assistance, capabilities that previously required human assistance or expensive specialized hardware.
Theme Alignment: Browsers, Voices, Clouds, and Tools as Cohesive Agents
Second Sight embodies the hackathon theme by unifying:
Browser: The entire experience runs in-browser as a Next.js SPA, with no native app required. Camera, audio, and UI all use modern web APIs.
Voice: ElevenLabs Conversational AI provides the voice interface. Users speak; the agent listens, thinks, and responds naturally.
Cloud: Anthropic Claude (vision), Tavily (search), and Clerk (auth + memory) power the backend intelligence via serverless API routes.
Tools: Four client-side tools (getVisualContext, webSearch, saveMemory, readMemory) are orchestrated by the ElevenLabs agent, demonstrating true tool-use AI.
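A rough sketch of how the four client-side tools and the graceful fallback handling mentioned earlier might fit together. It assumes the voice SDK accepts an object of named async functions as its client tools. The handler bodies are hypothetical stubs, and `withFallback` is an illustrative helper rather than part of any library.

```typescript
// A client tool receives parameters from the agent and returns text to speak.
type ClientTool = (params: Record<string, unknown>) => Promise<string>;

// Wrap a tool so that a thrown error or rejected promise becomes a spoken
// fallback message instead of breaking the conversation. (Hypothetical helper.)
function withFallback(tool: ClientTool, fallback: string): ClientTool {
  return async (params) => {
    try {
      return await tool(params);
    } catch {
      return fallback;
    }
  };
}

// Stub handlers for illustration; real ones would call the vision, search,
// and memory API routes.
const clientTools = {
  getVisualContext: withFallback(
    async () => "A desk with a laptop.",
    "I could not see anything just now.",
  ),
  webSearch: withFallback(
    async (p) => `Results for ${String(p["query"])}`,
    "Search is unavailable right now.",
  ),
  saveMemory: withFallback(async () => "Saved.", "I could not save that."),
  readMemory: withFallback(
    async () => {
      throw new Error("simulated outage"); // force the fallback path
    },
    "I could not reach your memories.",
  ),
};

clientTools.readMemory({}).then((reply) => console.log(reply));
// logs "I could not reach your memories."
```

Because every tool always resolves to a string, the agent has something to say even when a backend call fails.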
Technologies Used
Voice AI: ElevenLabs Conversational AI, @11labs/react
Vision AI: Anthropic Claude (claude-haiku-4-5)
Search: Tavily API
Auth & Memory: Clerk (@clerk/nextjs, user metadata for persistence)
Frontend: Next.js 16 (App Router), React 19, TypeScript, Tailwind CSS
UI/UX: Framer Motion, Lucide React, Glassmorphism design
Browser APIs: WebRTC (getUserMedia), Canvas API, Web Audio
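The Auth & Memory entry above keeps memories in Clerk user metadata rather than a database. A minimal sketch of what that data handling could look like, assuming memories live as an array under one metadata key: the `Memory` shape, the `MAX_MEMORIES` cap, and both function names are illustrative assumptions, and the actual write to Clerk is left out.

```typescript
// Hypothetical shape for memories kept under a single metadata key.
interface Memory {
  text: string;
  savedAt: string; // ISO 8601 timestamp
}
interface MemoryMetadata {
  memories?: Memory[];
}

// Assumed cap so the metadata object stays small; the value is illustrative.
const MAX_MEMORIES = 50;

// Return a new metadata object with the memory appended, dropping the oldest
// entries once the cap is exceeded. The input object is not mutated.
function addMemory(meta: MemoryMetadata, text: string, now: Date = new Date()): MemoryMetadata {
  const next = [...(meta.memories ?? []), { text, savedAt: now.toISOString() }];
  return { ...meta, memories: next.slice(-MAX_MEMORIES) };
}

// Join saved memories into one string the voice agent can read back.
function recallMemories(meta: MemoryMetadata): string {
  const items = meta.memories ?? [];
  return items.length === 0
    ? "You have no saved memories."
    : items.map((m) => m.text).join(". ");
}

const meta = addMemory({}, "Doctor's appointment is Tuesday");
console.log(recallMemories(meta)); // logs "Doctor's appointment is Tuesday"
```

In a server route, the object returned by `addMemory` would then be written back to the signed-in user's metadata through Clerk's backend SDK.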
Additional Links
SecondSight GitHub repository