Ready to unleash your creative potential? StoryLLM is a revolutionary desktop application that empowers you to generate unlimited AI-powered videos and images using the resources of your own computer.
If you're tired of the limitations, credit systems, and privacy concerns associated with cloud-based services, you've come to the right place. With StoryLLM, you are in complete control of your creative process! From narrative story videos to stunning slideshows, eye-catching images to social media content – create it all without per-use fees, right on your own hardware.
This documentation is designed to guide you through everything from installing StoryLLM using our automated scripts to mastering its most advanced features. Let's get started and create something amazing together!
System Requirements
To ensure the best performance and a smooth experience with StoryLLM, your system should meet the following requirements.
Important Note: Optimal performance and compatibility are achieved with Python 3.10 and NVIDIA CUDA 11.8. The automated setup aims to install these.
Operating System: Windows 10/11 (64-bit) or Linux (Modern 64-bit, e.g., Ubuntu 20.04+).
Graphics Card (GPU):
Strongly Recommended: NVIDIA GPU with CUDA 11.8 support.
CPU Only: Possible, but image generation will be extremely slow.
AMD/Intel GPUs: Not directly supported for hardware acceleration; will use CPU.
Python Version (Handled by Setup): Python 3.10.x.
Processor (CPU): Modern multi-core CPU (Intel i5/Ryzen 5 or better recommended).
RAM (Memory): Minimum 8GB, **16GB+ Recommended**.
Disk Space: 1-2GB for app + significant space for models (can be 5-20GB+ each) + project files. SSD Recommended.
Other Software (Handled by Setup): Conda (Miniconda recommended), FFmpeg.
Internet Connection: Required for setup, activation, model downloads, updates, cloud APIs, YouTube uploads. Offline use possible for local generation after setup.
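If you're unsure whether your NVIDIA GPU and driver are ready, a quick terminal check (assuming NVIDIA drivers are already installed) is:
# Lists detected NVIDIA GPUs, the driver version, and the maximum CUDA version the driver supports
nvidia-smi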
Installation (Automated Setup via Conda)
This guide details the recommended setup process using the automated scripts provided in your download. This method handles dependency installation automatically using Conda.
Before You Start
Platforms: Tested on Windows 10/11 & common Linux distros (e.g., Ubuntu).
GPU: Image generation is **very slow** without a CUDA-enabled NVIDIA GPU (8GB+ VRAM recommended).
Audio: Audio generation (TTS) is generally fast, even without a powerful GPU. Tested stable for 120+ hours.
FFmpeg Included: The setup script installs FFmpeg via Conda.
Follow these steps exactly. No manual library installation needed.
0. Install Miniconda (If you don't have Conda)
Conda manages software environments. Download and install Miniconda for your OS (Windows/Linux). Follow the installer steps.
Crucial: After installation, **close and reopen** your command line tool (Terminal on Linux, use 'Anaconda Prompt' on Windows).
1. Download Your Project Files
Go to your StoryLLM Dashboard. Find an **active license** and click its "Download" button. Save the `.zip` file.
2. Extract the Downloaded Archive
Right-click the `.zip` file -> "Extract All..." (or use 7-Zip/unzip). Extract it to a simple path (e.g., `C:\StoryLLM` or `~/StoryLLM`). This creates a `StoryLLM` folder.
3. Run the Automated Setup Script
This installs all required software. Navigate **inside** the extracted `StoryLLM` folder.
Windows: Find `setup_windows.bat`. **Double-click** it. Approve any security prompts. A command window will show progress.
Linux: Open a **Terminal** inside the `StoryLLM` folder. Run:
bash setup_linux_macos.sh
The script automatically installs Python 3.10, PyTorch (with CUDA 11.8 if possible), FFmpeg, and other libraries into a dedicated `storyllm_env` environment.
Installation can take several minutes. Wait for it to complete.
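Once the script finishes, you can optionally sanity-check the new environment from the same command line. This is a minimal sketch assuming the `storyllm_env` name used by the script; the exact PyTorch version printed may differ:
conda activate storyllm_env
# Should print the PyTorch version and True if CUDA is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Should print FFmpeg version info
ffmpeg -version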
4. Configure API Keys (Especially Hugging Face Token)
Crucial Step: You need a Hugging Face token to download most image generation models.
Go into the `StoryLLM` folder, then into the `json` subfolder.
Open `config.json` with a text editor (Notepad, VS Code, etc.).
Find the `huggingface` key within the `api_keys` section and replace the placeholder with your token:
"api_keys": {
    ...
    "huggingface": "YOUR_HUGGINGFACE_TOKEN_HERE"
}
5. Launch StoryLLM
Windows: Go back to the main `StoryLLM` folder. **Double-click** `launch_storyllm.bat`.
Linux: In the **Terminal** (still inside `StoryLLM` folder), run:
bash launch_storyllm.sh
The application will verify your license and start. A URL like `http://127.0.0.1:7860` will appear in the console. Open this URL in your browser.
6. Updating StoryLLM
Download the new `.zip` (Step 1), extract overwriting old files (Step 2), and run the **Setup Script** again (Step 3). Your `config.json` should be preserved if you don't manually delete the `json` folder, but backing it up first is wise. Then Launch (Step 5).
What is StoryLLM?
StoryLLM is a desktop application for Windows and Linux that enables users to create AI-powered videos (narrated stories, slideshows) and images by leveraging their own computer hardware.
Its core philosophy is to provide users with full control and privacy over their content creation. While most AI video tools operate in the cloud, often with per-use fees or credit systems, StoryLLM performs the main AI tasks (especially image generation and optional local TTS/text generation) directly on your machine.
Key Advantages:
Unlimited Local Generation: No limits on the number or duration of videos, images, or local TTS generations created using your own hardware. StoryLLM software doesn't impose restrictions like "You have 10 video credits left."
Data Privacy: When using local models, your text inputs, prompts, and generated assets do not leave your computer.
Cost Control: Beyond the software license (subscription), there are no additional costs for local generation (excluding optional cloud API usage).
Offline Capability: After setup and model downloads, core generation functions using local models can work without an internet connection.
Flexibility: Freedom to choose which AI models to use (local Ollama vs. cloud OpenAI/Gemini; local Kokoro TTS vs. cloud ElevenLabs).
Hardware Utilization: Fully utilizes the power of your computer, especially your NVIDIA GPU.
StoryLLM offers content creators, marketers, educators, and hobbyists the ability to produce high-quality AI content on their own terms.
Local vs. Cloud Processing
StoryLLM offers a hybrid approach, combining both local (running on your computer) and cloud-based (via external APIs) AI capabilities. You choose which options to use based on your needs and configuration.
Local Processing (Default & Recommended)
Operations happen entirely on your machine. Requires no internet after setup/download. Your data stays private.
Text/Prompt Generation: Uses local Ollama models (if installed and configured).
Image Generation: Uses local Stable Diffusion models on your GPU (or CPU).
Text-to-Speech (TTS): Uses the local Kokoro engine.
Video Assembly: Uses FFmpeg locally.
Pros: Privacy, unlimited use, offline capability, no extra generation costs.
Cons: Requires capable hardware (GPU), Ollama setup is separate, local model variety might be less than cloud.
Cloud Processing (Optional APIs)
Requires internet and your API keys in `config.json`. Subject to provider limits/pricing.
Text/Prompt Generation: OpenAI API (GPT models) or Google Gemini API.
Text-to-Speech (TTS): ElevenLabs API (more voices/languages).
YouTube Upload & Metadata: Uploads video and can use OpenAI/Gemini for metadata/translation.
Pros: Access to advanced models (GPT-4), wider TTS options (ElevenLabs).
Cons: Requires internet, API keys, potential extra costs from providers, data sent to cloud provider.
Configure your preferred services via the `config.json` file or application settings.
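As a rough illustration of a local-first setup (placeholder values only; the full file is described under Configuration below), every cloud key can stay blank except the required Hugging Face token:
{
  "api_keys": {
    "openai": "",
    "elevenlabs": "",
    "gemini": "",
    "huggingface": "hf_your_token_here"
  },
  "api_links": {
    "ollama_chat": "http://localhost:11434/api/chat"
  }
}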
Unlimited Local Generation
StoryLLM allows unlimited local generation. This means no software-imposed limits on videos, images, or local TTS created using your computer's resources (Stable Diffusion, Kokoro TTS, local Ollama). Your hardware capabilities and storage are the only limits.
Note: This applies only to local operations. Using optional cloud APIs (OpenAI, Gemini, ElevenLabs) is subject to the pricing and limits of those external services.
How to Use StoryLLM
StoryLLM offers different modes for content creation:
Using the Story Generator
Turn written narratives into dynamically synced videos.
Input Text: Paste your story/script.
Select Voice: Choose Kokoro (local) or ElevenLabs (cloud API).
Select LLM for Prompts: Choose Ollama (local), OpenAI (cloud API), or Gemini (cloud API).
Configure Image Settings: Select Image Model (Stable Diffusion), Resolution, Steps, CFG, Seed, etc. Ensure GPU/CUDA is selected if available.
(Optional) Add Background Music: Select an audio file and set volume.
Generate: The app processes text->audio->prompts->images->video.
Using the Slide Generator
Create slideshows with consistent timing per image.
Input Narration Text: Enter the full voiceover script.
Select Voice: Choose Kokoro or ElevenLabs.
Enter Image Prompt: Provide ONE main prompt for all images.
Set Image Count & Duration: Specify number of images and seconds per image.
Configure Image Settings: Select Model, Resolution, Steps, CFG. Seed often increments automatically for variety.
(Optional) Add Background Music.
Generate: Creates one audio file, generates images based on the single prompt (varying seed), and combines them with a fixed duration per image (a conceptual FFmpeg sketch follows these steps).
Review & Edit: Use the Segment Editor to regenerate specific images if needed.
Export/Upload.
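Conceptually, the fixed-duration assembly resembles a plain FFmpeg image-sequence encode (filenames and the 5-seconds-per-image rate are illustrative, not StoryLLM's actual command):
# 1/5 fps input = each image shown for 5 seconds; -shortest ends the video at the shorter stream
ffmpeg -framerate 1/5 -i image_%03d.png -i narration.wav -c:v libx264 -pix_fmt yuv420p -shortest slideshow.mp4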
Using the Image Generator
Generate standalone still images.
Enter Prompt(s): Describe the desired image(s). Add negative prompts.
Set Parameters: Number of images, Model, Resolution, Steps, CFG, Seed mode.
Generate: Creates the specified number of images and saves them (usually PNG).
Editing Your Creations (Segment Editor)
Refine individual image/audio segments after initial generation in Story/Slide modes.
Features:
Browse segments visually.
Preview individual image and audio.
View generation parameters (prompt, seed, model, etc.) used for each image.
Edit the Prompt for a specific segment's image.
Regenerate Individual Images using the edited (or original) prompt, a new seed, and optionally a different model.
Rebuild Video quickly using the updated segments.
This allows fine-tuning without regenerating the entire project.
Configuration & API Integrations
StoryLLM uses configuration files to manage settings like API keys and model choices.
Configuration File (`config.json`) Explained
The core settings for StoryLLM are stored in config.json, located in the StoryLLM/json/ folder after extraction. You can edit this file with a text editor or use the "My Configuration" page in the user dashboard.
Here's a breakdown of its structure and purpose:
1. `api_keys` Section
Contains your secret keys for external cloud services. **Keep these confidential!**
"openai": "" - Your key from OpenAI. Needed only if you select OpenAI for text/translation tasks in the app. Leave blank otherwise.
"elevenlabs": "" - Your key from ElevenLabs. Needed only if you select ElevenLabs as the Audio Engine. Leave blank otherwise.
"gemini": "" - Your key from Google AI Studio. Needed only if you select Gemini for text/translation tasks. Leave blank otherwise.
"huggingface": ""Mandatory - Your Hugging Face Hub token. **This is essential.**
Why? Required to download most image generation models (Stable Diffusion). Without it, image generation will fail with errors.
How to get one: Log in at huggingface.co and open the Access Tokens page in your account settings.
Click "New token", name it (e.g., "StoryLLM"), choose Role: **`read`**.
Generate & **copy** the `hf_...` token.
Paste it between the quotes for `"huggingface"` in your `config.json`.
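If you want to confirm the token works before launching, the Hugging Face CLI (shipped with the `huggingface_hub` library, which the setup installs for model downloads; this may vary by version) can log in and report the account:
conda activate storyllm_env
# hf_your_token_here is a placeholder - use your real token
huggingface-cli login --token hf_your_token_here
huggingface-cli whoami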
2. `api_links` Section
URLs for connecting to local services.
"ollama_chat": "http://localhost:11434/api/chat" - Endpoint for your local Ollama instance. Only needed if you run Ollama locally and select it in the app. The default is usually correct.
3. `llm_models` Section
Default models used by the selected API provider for specific tasks.
openai / ollama / gemini objects: Group settings by provider.
"chat_default": "..." - Default model for generating image prompts. Ensure the Ollama model name matches one you've pulled (e.g., `ollama run gemma2:9b`).
"translation_default": "..." - Default model for translating YouTube metadata.
Note: If a default model is invalid or unavailable (e.g., not pulled in Ollama), the corresponding feature may fail for that provider.
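For illustration only (model names are examples, not shipped defaults; check your own `config.json`), the section might look like:
"llm_models": {
  "ollama": { "chat_default": "gemma2:9b", "translation_default": "gemma2:9b" },
  "openai": { "chat_default": "gpt-4o-mini", "translation_default": "gpt-4o-mini" },
  "gemini": { "chat_default": "gemini-1.5-flash", "translation_default": "gemini-1.5-flash" }
}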
Remember: After manually editing `config.json`, save the file and restart StoryLLM for changes to take effect. Ensure the file remains valid JSON.
Using API Integrations
Leverage powerful cloud services by integrating your API keys via the config.json file or the application's settings.
OpenAI Integration
Purpose: Used for generating high-quality image prompts (Story Generator), generating YouTube titles/descriptions/tags, and translating metadata.
Requires: An OpenAI API key in `config.json`.
Models: Access models like `gpt-4o-mini`, `gpt-4`, etc.
YouTube Integration
Purpose: Upload finished videos directly to YouTube from within StoryLLM and manage their metadata.
Requires: Connecting your YouTube account(s) via OAuth within StoryLLM; optionally configured LLM APIs for metadata tasks.
Note: Uses official YouTube Data API v3, subject to their quotas.
AI Models & Performance
Model Configuration (`models.json`)
StoryLLM manages available Stable Diffusion models via models.json (or similar). It maps names to Hugging Face repo IDs or local paths and provides notes (especially VRAM needs).
GPU Video RAM (VRAM) is critical for image generation models.
~4-8GB: Best for SD 1.5 models (e.g., Realistic Vision) at lower resolutions (512x512). SDXL likely too slow/OOM.
~8-16GB+: Needed for SDXL models (e.g., Juggernaut XL) and higher resolutions (1024x1024). 8GB is minimum practical for SDXL, 12GB+ better.
More than 16GB: Recommended for the largest models (e.g., SD 3 Large) or complex workflows.
Running models requiring more VRAM than available causes Out-of-Memory (OOM) errors. Reduce resolution or use a less demanding model.
Optimizing Performance
**Best:** Use a powerful NVIDIA GPU (RTX 30xx/40xx) with sufficient VRAM.
**Drivers/CUDA:** Use recommended NVIDIA drivers & CUDA 11.8.
**Python:** Use Python 3.10.x.
**RAM:** 16GB+ system RAM recommended.
**Disk:** Use an SSD.
**Settings:** Lower resolution or inference steps speed up generation.
**Precision:** Use FP16 model variants if available (less VRAM, often faster).
**Resources:** Close other heavy applications during generation.
Using Custom Models (Stable Diffusion)
Use your own `.ckpt` or `.safetensors` models.
Download model file.
Place it in StoryLLM's designated Stable Diffusion model directory.
Edit the model configuration file (e.g., `models.json`) to add an entry for your model (name, path, notes including base type SD1.5/SDXL and VRAM estimate); see the example below.
Restart StoryLLM. Select your model in the UI.
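As a hypothetical example (field names are illustrative; mirror the existing entries in your `models.json` rather than this sketch):
"My Custom SDXL Model": {
  "path": "models/my_custom_model.safetensors",
  "base": "SDXL",
  "notes": "SDXL checkpoint, ~10GB VRAM recommended"
}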
Check the license of any custom models you download regarding commercial use rights.
Prompting Guide
Prompt Configuration (`prompts.json`)
Stores common rules and example prompt templates used by LLMs for image description generation (Story Generator). You can customize or add your own templates here.
{
"common_rules": "\\n1. Be written in English...",
"example_prompts": {
"Art Styles & Mediums": [
{ "title": "🎌 Anime (Generic)", "content": "Create ONLY anime style... Each prompt must:{common_rules}..." }
// ... more
]
}
}
Using Prompt Templates (Story Generator)
Select a template (e.g., "Anime", "Cinematic") to guide the LLM in generating image prompts that match a specific style for each story segment. Creates consistency. "AI Generated" uses a generic approach.
Advanced Controls (Steps, CFG, Seed)
Inference Steps: Number of refinement steps (e.g., 20-30). More steps = more detail, slower. Fewer = faster, less refined.
Guidance Scale (CFG): How strictly to follow the prompt (e.g., 7). Higher = stricter adherence, Lower = more creative freedom.
Seed: Controls randomness. Use `Fixed` for reproducibility, `Random` for variety, `Incrementing/Decrementing` for sequences.
Other Features
Supported Languages (TTS)
Kokoro (Local): English (US/UK), Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese, etc.
ElevenLabs (Cloud API): Wider range including Turkish, German, Russian, Korean, Polish, Arabic, etc.
YouTube metadata can also be translated via LLM APIs.
Background Music
Add `.mp3` or `.wav` background music in Story/Slide modes. Adjust volume relative to narration.
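Internally this is a standard audio mix. As a rough FFmpeg equivalent (filenames and the 0.2 volume factor are illustrative, not StoryLLM's actual command), narration and music can be combined like so:
# Lower the music to 20% volume, then mix it under the narration
ffmpeg -i narration.wav -i music.mp3 -filter_complex "[1:a]volume=0.2[bg];[0:a][bg]amix=inputs=2:duration=first[mix]" -map "[mix]" mixed_audio.wav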
Internet Requirements
Needed for: Setup, activation, license checks, updates, model downloads, cloud API use, YouTube uploads.
NOT Needed for: Core generation using already downloaded local models (Stable Diffusion, Kokoro, Ollama) once activated.
Data Privacy
Local Processing: Your inputs and outputs stay on your computer.
Cloud APIs: Data sent to OpenAI/Google/ElevenLabs is subject to their policies.
YouTube: Uploads go to Google/YouTube.
Licensing: Minimal data (key, status) is sent for validation. See our Privacy Policy.
Licensing & Support
Licensing & Updates
Active subscription needed for full use & updates. License typically for one computer. See License Agreement and Refund Policy.
Support & Contact
Check this documentation & FAQ first. Email support available for active subscribers at [email protected]. Priority support for higher tiers.
Frequently Asked Questions (FAQ)
General Questions & Features
**What is StoryLLM?**
Desktop app (Win/Linux) for unlimited local AI video/image generation using your hardware. Focuses on control & privacy. See What is StoryLLM?.
**What are the system requirements?**
See System Requirements. Key: Win/Linux, NVIDIA GPU (8GB+ VRAM recommended, CUDA 11.8), Python 3.10, 16GB+ RAM, SSD.
**Is generation really unlimited?**
Yes, for local models (Stable Diffusion, Kokoro, Ollama). Your hardware is the limit. Optional cloud APIs (OpenAI, Gemini, ElevenLabs) have their own external limits/costs. See Unlimited Generation.
**What is the difference between Story, Slide, and Image modes?**
Story: Unique image per audio segment, synced duration. Slide: Multiple images from one prompt, fixed duration per image. Image: Standalone still images.
**Can I add background music?**
Yes, in Story/Slide modes. You can upload an audio file and control its volume. See Background Music.
**What kind of content can I create?**
Narrated stories, explainers, tutorials, product slideshows, marketing content, social media videos, visual galleries, concept art, etc.
**How does the YouTube integration work?**
Connect your YouTube account via OAuth. Upload videos directly, optionally use LLM APIs (OpenAI/Gemini) to generate/translate metadata, set privacy, and schedule uploads. See YouTube Integration.
**Are there tutorials or learning resources?**
This documentation is the main guide. Video tutorials and more resources are planned for our website.
**How does StoryLLM differ from cloud-based services?**
Key differences: Local processing (privacy/offline), true unlimited local generation, uses your hardware, model flexibility (local/cloud), subscription cost vs. per-item fees.
**How do I cancel my subscription?**
Cancel anytime via your account on our website. Cancellation stops future billing. Access continues until your paid period ends. See Refund Policy.
**What happens if my subscription expires?**
The software may enter limited mode or cease full function. An active subscription is needed for full features/updates. Your local files remain. See Licensing & Updates.
**Is there a free trial?**
Currently, no free trial is offered. Review documentation, features, and requirements to evaluate. Due to the digital nature, refunds are limited (Refund Policy).
Technical Questions & Usage
**Which AI models does StoryLLM use?**
Text: Local Ollama (Llama 3, Gemma, etc.) OR Cloud OpenAI/Gemini. Image: Local Stable Diffusion (SD1.5, SDXL, custom). Voice: Local Kokoro OR Cloud ElevenLabs. See Models & APIs.
**Can I fix a single bad image without regenerating everything?**
Yes, use the Segment Editor to edit the prompt for a specific image, regenerate only that image, and then rebuild the video.
**Which languages are supported for voiceover?**
Local Kokoro: English, Japanese, Chinese, Spanish, French, etc. Cloud ElevenLabs API: Wider range including Turkish, German, Russian, etc. (requires key/plan). See Languages.
**Can I use the generated content commercially?**
Depends on the AI model/service license you use. Check the license for specific Stable Diffusion models (especially custom ones) or the terms of service for OpenAI/Gemini/ElevenLabs if using their APIs. StoryLLM itself doesn't restrict usage beyond those underlying licenses.
**How does the license work?**
Active subscription gives usage rights (typically 1 PC) & access to updates. Periodic online checks validate the license. See Licensing & Updates.
**Do I need a constant internet connection?**
No. Needed for setup, activation, model downloads, updates, cloud APIs, YouTube uploads. Core local generation works offline after setup/downloads. See Internet Req.
**How long does generation take?**
Highly depends on your GPU (NVIDIA recommended), VRAM, resolution, model complexity. Seconds per image with a good GPU, minutes per image on CPU. TTS/assembly is faster. See Optimizing.
**What output formats are produced?**
Videos: MP4 (H.264/AAC). Images: PNG. See Output Formats.
**Where does my data go?**
Local Models: Data stays on your PC. Cloud APIs: Data sent to provider (OpenAI/Google/ElevenLabs) per their policy. YouTube: Data sent to Google/YouTube. Minimal license data sent to us. See Data Privacy.
**Can I use custom Stable Diffusion models?**
Yes, `.ckpt` and `.safetensors` are supported. Place them in the designated model folder and update the model config file (e.g., `models.json`) to make them selectable. See Custom Models.
**Will StoryLLM slow down my computer?**
During active generation (especially images), yes, it uses significant GPU/CPU resources. Performance impact is minimal when idle. Avoid running other heavy tasks during generation. See Optimizing.
**Can I use my license on multiple computers?**
Typically one primary computer per license. Transfer may require deactivation or contacting support ([email protected]). See Licensing and your License Agreement.