# Unicorn AI Studio

> AI-Powered Podcast Generation, Voice Synthesis & Transcription Platform

## Overview

Unicorn AI Studio is an AI audio platform that combines text-to-speech, podcast generation, live transcription, and meeting recording. It is built for creators, podcasters, content producers, and businesses that need professional-quality voice synthesis and transcription.

## Key Features

### 1. Multi-Speaker Podcast Generation
- **AI Script Generation**: Automated script creation using Gemma 4 LLM
- **Multiple TTS Models**: Quickhorn (0.5B, fast/local), Silvermane 1.5B, Silvermane Large, Silvermane 7B
- **Up to 4 Speakers**: Natural multi-speaker conversations with voice variety
- **Singing Capability**: AI voices can sing when prompted with ♪ notation
- **18+ Voice Options**: Multiple languages including English, Spanish, Chinese, Arabic, Hindi, Tamil
- **Advanced Controls**: CFG scale, inference steps, temperature, top-P sampling
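
To illustrate how the advanced controls and the four-speaker limit fit together, here is a minimal sketch of a settings object with basic validation. The class name, field names, default values, and accepted ranges are all assumptions made for illustration; they are not taken from the Unicorn AI Studio codebase.

```python
from dataclasses import dataclass
from typing import List

MAX_SPEAKERS = 4  # limit stated in the feature list above


@dataclass
class GenerationSettings:
    """Hypothetical container for the advanced controls (names and defaults assumed)."""

    speakers: List[str]
    cfg_scale: float = 1.5      # assumed default
    inference_steps: int = 10   # assumed default
    temperature: float = 0.9    # assumed default
    top_p: float = 0.95         # assumed default

    def validate(self) -> None:
        """Raise ValueError when a setting falls outside the assumed ranges."""
        if not 1 <= len(self.speakers) <= MAX_SPEAKERS:
            raise ValueError(
                f"expected 1 to {MAX_SPEAKERS} speakers, got {len(self.speakers)}"
            )
        if self.cfg_scale <= 0:
            raise ValueError("cfg_scale must be positive")
        if self.inference_steps < 1:
            raise ValueError("inference_steps must be at least 1")
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in the interval (0, 1]")


settings = GenerationSettings(speakers=["Host", "Guest"])
settings.validate()  # passes: two speakers, default controls
```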

### 2. Live Transcription
- **Real-Time Speech-to-Text**: Instant transcription using faster-whisper
- **Push-to-Talk Interface**: Space bar or click-to-record
- **Local Processing**: Privacy-first, runs on your device
- **Cloud Fallback**: HuggingFace Whisper large-v3-turbo for backup
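
The local-first behavior with a cloud backup can be pictured as a simple try-then-fall-back wrapper. This is only a sketch: `transcribe_with_fallback` and the two callables are hypothetical stand-ins, and the real application may decide when to fall back differently (for example on a timeout rather than an exception).

```python
from typing import Callable, Tuple

# A transcriber takes raw audio bytes and returns text (hypothetical interface).
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes, local: Transcriber, cloud: Transcriber
) -> Tuple[str, str]:
    """Try the local engine first; call the cloud engine only if the local one raises.

    Returns the transcript and the name of the engine that produced it.
    """
    try:
        return local(audio), "local"
    except Exception:
        # Audio leaves the device only on this path.
        return cloud(audio), "cloud"


def broken_local(audio: bytes) -> str:
    raise RuntimeError("local model not loaded")


def stub_cloud(audio: bytes) -> str:
    return "hello from the cloud"


text, engine = transcribe_with_fallback(b"\x00\x01", broken_local, stub_cloud)
```

Returning the engine name alongside the text lets the interface show the user when audio was processed off-device.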

### 3. Call & Meeting Recording
- **Multi-Format Support**: WAV, MP3, M4A, FLAC, OGG
- **Accurate Transcription**: Powered by OpenAI Whisper models
- **AI Summarization**: Automatic extraction of key points, action items, decisions
- **Export Options**: Download transcripts (.txt, .json) and summaries (.md)

### 4. Audio Library
- **Automatic Storage**: All generated content saved securely
- **In-Browser Playback**: Listen without downloading
- **Organization**: Filter by type (podcast, TTS, recording)
- **Download & Delete**: Full control over your audio files
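
Since the backend stores data in SQLite, filtering the library by type could look roughly like the query below. The table name, column names, and sample rows are hypothetical; the actual schema is not documented here.

```python
import sqlite3

# In-memory database with an assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audio_items (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        kind  TEXT NOT NULL CHECK (kind IN ('podcast', 'tts', 'recording'))
    )
    """
)
conn.executemany(
    "INSERT INTO audio_items (title, kind) VALUES (?, ?)",
    [
        ("Episode 1", "podcast"),
        ("Welcome message", "tts"),
        ("Episode 2", "podcast"),
        ("Weekly sync", "recording"),
    ],
)


def list_by_kind(db: sqlite3.Connection, kind: str) -> list:
    """Return titles of one content type, oldest first, using a parameterized query."""
    rows = db.execute(
        "SELECT title FROM audio_items WHERE kind = ? ORDER BY id", (kind,)
    ).fetchall()
    return [row[0] for row in rows]


podcasts = list_by_kind(conn, "podcast")
```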

## Technology Stack

### AI Models
- **TTS**: Unicorn AI Studio model family (Quickhorn 0.5B, Silvermane 1.5B, Silvermane Large, Silvermane 7B)
- **ASR**: faster-whisper (local), OpenAI Whisper (cloud fallback)
- **LLM**: Ollama Gemma 4 (e4b local, 31b-cloud fallback)

### Infrastructure
- **Backend**: Python FastAPI, SQLite, asyncio
- **Frontend**: Vanilla JavaScript, WebSocket, modern CSS
- **Cloud**: HuggingFace Spaces (A10G GPU) for Pro TTS
- **Deployment**: Works on macOS (MPS), Windows (CUDA), Linux (CPU/CUDA)
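
The platform-to-accelerator mapping in the deployment line can be sketched as a small selection function. The function name and its arguments are assumptions; in a PyTorch-based setup the two availability flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, but this document does not state which framework the application uses.

```python
import platform


def pick_device(system: str, cuda_available: bool, mps_available: bool) -> str:
    """Choose a compute device following the deployment matrix above.

    `system` is the value returned by platform.system():
    "Darwin" (macOS), "Windows", or "Linux".
    """
    if system in ("Windows", "Linux") and cuda_available:
        return "cuda"
    if system == "Darwin" and mps_available:
        return "mps"
    return "cpu"  # safe default on every platform


# Example call for the current machine, with both accelerators assumed absent.
device = pick_device(platform.system(), cuda_available=False, mps_available=False)
```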

## Pricing Tiers

### Free - $0/month
- 10 minutes podcast generation
- 20 minutes transcription
- 5 AI script generations
- Quickhorn (0.5B) model only
- Perfect for trying the platform

### Bring Your Own Key (BYOK) - $9/month
- 30 minutes podcast generation
- 60 minutes transcription
- 20 AI script generations
- Use your own API keys (HuggingFace, OpenAI, Ollama)
- Cost control and flexibility

### Starter - $14.99/month
- 60 minutes podcast generation
- 120 minutes transcription
- 50 AI script generations
- Quickhorn (0.5B) model
- Ideal for regular creators

### Pro - $29/month
- 300 minutes podcast generation
- 500 minutes transcription
- 200 AI script generations
- Access to Silvermane 1.5B, Silvermane Large, Silvermane 7B
- Priority processing
- Best for professionals and businesses

**Note**: Model size affects usage. Silvermane Large consumes 1.5x tokens and Silvermane 7B consumes 2x tokens. Each additional speaker increases processing cost by roughly 20%.
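
As a rough worked example of the note above, the sketch below estimates how much quota a generation consumes. It rests on several assumptions that this document does not confirm: that the multipliers apply to generated minutes, that Quickhorn and Silvermane 1.5B count at 1.0x, and that the model multiplier and the per-speaker surcharge combine multiplicatively. The model keys and the function name are hypothetical.

```python
MODEL_MULTIPLIER = {
    "quickhorn-0.5b": 1.0,    # assumed baseline
    "silvermane-1.5b": 1.0,   # assumed baseline (not stated in the note)
    "silvermane-large": 1.5,  # from the note
    "silvermane-7b": 2.0,     # from the note
}
SPEAKER_SURCHARGE = 0.20  # roughly 20% per additional speaker, from the note


def estimated_usage(audio_minutes: float, model: str, speakers: int) -> float:
    """Estimate quota consumed, in minute-equivalents, under the stated assumptions."""
    if not 1 <= speakers <= 4:
        raise ValueError("speakers must be between 1 and 4")
    speaker_factor = 1 + SPEAKER_SURCHARGE * (speakers - 1)
    return audio_minutes * MODEL_MULTIPLIER[model] * speaker_factor


# 10 minutes on Silvermane 7B with 3 speakers: 10 * 2.0 * 1.4
usage = estimated_usage(10, "silvermane-7b", 3)
```

Under these assumptions, a 10-minute, three-speaker episode on Silvermane 7B would count as about 28 minutes of quota, while the same episode on Quickhorn would count as about 14.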

## Use Cases

### Content Creators
- Generate podcast episodes without recording studios
- Create audiobooks and audio stories
- Produce multilingual content with consistent voices
- Test scripts before final recording

### Businesses
- Create training materials and presentations
- Generate product demos and explainer videos
- Transcribe and summarize meetings automatically while keeping data on device
- Produce marketing content at scale

### Developers
- API integration for voice synthesis
- Automated content generation pipelines
- Voice assistant prototyping
- Accessibility features for applications

### Educators
- Create educational podcast series
- Generate language learning materials
- Transcribe lectures and presentations
- Produce accessible course content

## Technical Requirements

### System Requirements
- **macOS**: macOS 12+ (Monterey), Apple Silicon (M1 or later) or Intel with 8GB+ RAM
- **Windows**: Windows 10/11, NVIDIA GPU (CUDA 11.8+) or CPU with 16GB+ RAM
- **Linux**: Ubuntu 20.04+, CUDA-capable GPU or 16GB+ RAM

### Browser Support
- Chrome/Edge 90+
- Firefox 88+
- Safari 14+
- Modern browsers with WebSocket and Web Audio API support

## API & Integration

### RESTful API Endpoints
- `/api/generate` - Podcast generation
- `/api/transcribe` - Audio transcription
- `/api/summarize` - Meeting summarization
- `/api/script` - AI script generation
- `/ws/transcribe` - WebSocket live transcription
- `/ws/stream-tts` - WebSocket streaming TTS
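
A request to the podcast generation endpoint might be assembled as in the sketch below. Only the `/api/generate` path comes from the list above. The base URL, the `POST` method, the JSON field names, and the bearer-token header are assumptions for illustration, so check the actual API documentation before relying on them.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumed address of a locally running backend


def build_generate_request(script: str, model: str, token: str) -> Request:
    """Build (but do not send) a JSON request for the /api/generate endpoint."""
    body = json.dumps({"script": script, "model": model}).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/api/generate",
        data=body,
        method="POST",  # assumed HTTP method
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


req = build_generate_request(
    script="Speaker 1: Welcome to the show.",
    model="quickhorn-0.5b",  # hypothetical model identifier
    token="example-token",
)
# Sending would use urllib.request.urlopen(req) against a running server.
```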

### Authentication
- JWT-based authentication
- API key support for BYOK tier
- Encrypted storage for user credentials
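
With JWT-based authentication, a client often checks whether its token is about to expire before making a call. The sketch below reads the standard `exp` claim from a token's payload segment. It does not verify the signature, which only the server can do with its secret, and the helper names and 30-second leeway are illustrative choices rather than anything specified by Unicorn AI Studio.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (seconds since epoch) without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims.get("exp")


def is_expired(token: str, now: Optional[float] = None, leeway: int = 30) -> bool:
    """Treat the token as expired `leeway` seconds early to avoid mid-request expiry."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim present
    current = time.time() if now is None else now
    return current >= exp - leeway
```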

## Privacy & Security

- **Local Processing**: Free tier runs entirely on your device
- **Encrypted API Keys**: User-provided keys encrypted at rest
- **No Training Data**: Your content is never used for model training
- **Secure Storage**: SQLite with bcrypt password hashing
- **HTTPS Required**: SSL/TLS encryption for all communications

## Getting Started

1. **Download**: Get Unicorn AI Studio for macOS or Windows
2. **Install**: Run the installer and follow the setup wizard
3. **Sign Up**: Create a free account (no credit card required)
4. **Generate**: Start creating podcasts or transcribing audio immediately
5. **Upgrade**: Move to Pro when you need cloud models and higher limits

## Links

- **Website**: https://www.unicornstudio.ca
- **Documentation**: https://www.unicornstudio.ca/llms.txt
- **Support**: info@coreledger.ca
- **GitHub**: https://github.com/KELVI23/Unicorn-ai-studios

## Company

Powered by Coreledger Technologies Inc. - Building the future of AI-powered audio production.

---

*Last Updated: April 2026*
*Version: 1.0.0*