Welcome to StoryLLM!

Ready to unleash your creative potential? StoryLLM is a revolutionary desktop application that empowers you to generate unlimited AI-powered videos and images using the resources of your own computer.

If you're tired of the limitations, credit systems, and privacy concerns associated with cloud-based services, you've come to the right place. With StoryLLM, you are in complete control of your creative process! From narrative story videos to stunning slideshows, eye-catching images to social media content – create it all without per-use fees, right on your own hardware.

This documentation is designed to guide you through everything from installing StoryLLM using our automated scripts to mastering its most advanced features. Let's get started and create something amazing together!

System Requirements

To ensure the best performance and a smooth experience with StoryLLM, your system should meet the following requirements.

  • Operating System: Windows 10/11 (64-bit) or Linux (Modern 64-bit, e.g., Ubuntu 20.04+).
  • Graphics Card (GPU):
    • Strongly Recommended: NVIDIA GPU with CUDA 11.8 support.
    • VRAM: Minimum 6GB (for basic models/resolution), **8GB+ Recommended** (for SDXL, higher resolutions), 12GB+ Ideal.
    • CPU Only: Possible, but image generation will be extremely slow.
    • AMD/Intel GPUs: Not directly supported for hardware acceleration; will use CPU.
  • Python Version (Handled by Setup): Python 3.10.x.
  • Processor (CPU): Modern multi-core CPU (Intel i5/Ryzen 5 or better recommended).
  • RAM (Memory): Minimum 8GB, **16GB+ Recommended**.
  • Disk Space: 1-2GB for app + significant space for models (can be 5-20GB+ each) + project files. SSD Recommended.
  • Other Software (Handled by Setup): Conda (Miniconda recommended), FFmpeg.
  • Internet Connection: Required for setup, activation, model downloads, updates, cloud APIs, YouTube uploads. Offline use possible for local generation after setup.

Installation (Automated Setup via Conda)

This guide details the recommended setup process using the automated scripts provided in your download. This method handles dependency installation automatically using Conda.

Follow these steps exactly. No manual library installation needed.

0. Install Miniconda (If you don't have Conda)

Conda manages software environments. Download and install Miniconda for your OS (Windows/Linux). Follow the installer steps.

Crucial: After installation, **close and reopen** your command line tool (Terminal on Linux, use 'Anaconda Prompt' on Windows).

1. Download Your Project Files

Go to your StoryLLM Dashboard. Find an **active license** and click its "Download" button. Save the `.zip` file.

2. Extract the Downloaded Archive

Right-click the `.zip` file -> "Extract All..." (or use 7-Zip/unzip). Extract it to a simple path (e.g., `C:\StoryLLM` or `~/StoryLLM`). This creates a `StoryLLM` folder.

3. Run the Automated Setup Script

This installs all required software. Navigate **inside** the extracted `StoryLLM` folder.

Windows: Find `setup_windows.bat`. **Double-click** it. Approve any security prompts. A command window will show progress.

Linux: Open a **Terminal** inside the `StoryLLM` folder. Run:

bash setup_linux_macos.sh

The script automatically installs Python 3.10, PyTorch (with CUDA 11.8 if possible), FFmpeg, and other libraries into a dedicated `storyllm_env` environment.

Installation can take several minutes. Wait for it to complete.
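
Once it finishes, you can optionally verify the environment. Activate it with `conda activate storyllm_env`, then run a short Python check (an optional sketch, not part of StoryLLM; the script name is arbitrary):

# check_env.py (name is arbitrary) - quick sanity check for the environment
import shutil
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())   # should be True on NVIDIA GPUs
print("CUDA build:     ", torch.version.cuda)          # expect 11.8 per the setup script
print("FFmpeg found:   ", shutil.which("ffmpeg") is not None)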

4. Configure API Keys (Especially Hugging Face Token)

Crucial Step: You need a Hugging Face token to download most image generation models.

  1. Go into the `StoryLLM` folder, then into the `json` subfolder.
  2. Open `config.json` with a text editor (Notepad, VS Code, etc.).
  3. Find the `huggingface` key within the `api_keys` section:
    "api_keys": {
        ...
        "huggingface": "YOUR_HUGGINGFACE_TOKEN_HERE" <-- Replace this
    }
  4. Get your Hugging Face Token:
    • Go to: huggingface.co/settings/tokens (Log in/Sign up).
    • Click "New token". Name it (e.g., "StoryLLM_Access").
    • Choose Role: **`read`**.
    • Click "Generate".
    • **Copy the token immediately** (it starts with `hf_...`).
  5. Paste the token: Replace `"YOUR_HUGGINGFACE_TOKEN_HERE"` in `config.json` with your copied token (keep the quotes "...").
  6. (Optional) Other Keys: If you use OpenAI, ElevenLabs, or Gemini, paste their respective keys, replacing the placeholders.
  7. Save Changes: Save and close the `config.json` file.

See the `config.json` Explanation below for more details on other settings.
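
If you want to double-check your edit before launching, a small script like this can validate the token entry (a sketch, not part of StoryLLM; it assumes you run it from the `StoryLLM` folder and that the key layout matches this guide):

# validate_config.py (hypothetical helper) - checks the Hugging Face token entry
import json
from pathlib import Path

config = json.loads(Path("json/config.json").read_text(encoding="utf-8"))
token = config.get("api_keys", {}).get("huggingface", "")

if not token or token == "YOUR_HUGGINGFACE_TOKEN_HERE":
    print("No Hugging Face token set - most model downloads will fail.")
elif not token.startswith("hf_"):
    print("Token does not start with 'hf_' - double-check the value you pasted.")
else:
    print("Hugging Face token entry looks good.")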

5. Launch StoryLLM

After setup and configuration:

Windows: Go back to the main `StoryLLM` folder. **Double-click** `launch_storyllm.bat`.

Linux: In the **Terminal** (still inside `StoryLLM` folder), run:

bash launch_storyllm.sh

The application will verify your license and start. A URL like `http://127.0.0.1:7860` will appear in the console. Open this URL in your browser.

6. Updating StoryLLM

Download the new `.zip` (Step 1), extract it over the old files, overwriting when prompted (Step 2), and run the **Setup Script** again (Step 3). Your `config.json` should be preserved as long as you don't manually delete the `json` folder, but backing it up first is wise. Then launch as usual (Step 5).
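
If you prefer to script the backup, a couple of lines of Python will do it (a sketch; run it from inside the `StoryLLM` folder, and note the backup filename is arbitrary):

# backup_config.py (hypothetical helper) - run from inside the StoryLLM folder
import shutil
from datetime import datetime

stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
backup_name = f"config.backup-{stamp}.json"
shutil.copy2("json/config.json", backup_name)   # copy2 preserves file timestamps
print("Saved", backup_name)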

What is StoryLLM?

StoryLLM is a desktop application for Windows and Linux that enables users to create AI-powered videos (narrated stories, slideshows) and images by leveraging their own computer hardware.

Its core philosophy is to provide users with full control and privacy over their content creation. While most AI video tools operate in the cloud, often with per-use fees or credit systems, StoryLLM performs the main AI tasks (especially image generation and optional local TTS/text generation) directly on your machine.

Key Advantages:

  • Unlimited Local Generation: No limits on the number or duration of videos, images, or local TTS generations created using your own hardware. StoryLLM software doesn't impose restrictions like "You have 10 video credits left."
  • Data Privacy: When using local models, your text inputs, prompts, and generated assets do not leave your computer.
  • Cost Control: Beyond the software license (subscription), there are no additional costs for local generation (excluding optional cloud API usage).
  • Offline Capability: After setup and model downloads, core generation functions using local models can work without an internet connection.
  • Flexibility: Freedom to choose which AI models to use (local Ollama vs. cloud OpenAI/Gemini; local Kokoro TTS vs. cloud ElevenLabs).
  • Hardware Utilization: Fully utilizes the power of your computer, especially your NVIDIA GPU.

StoryLLM offers content creators, marketers, educators, and hobbyists the ability to produce high-quality AI content on their own terms.

Local vs. Cloud Processing

StoryLLM offers a hybrid approach, combining both local (running on your computer) and cloud-based (via external APIs) AI capabilities. You choose which options to use based on your needs and configuration.

Local Processing (Default & Recommended)

Operations happen entirely on your machine. Requires no internet after setup/download. Your data stays private.

  • Image Generation: Uses Stable Diffusion models via Diffusers (NVIDIA GPU accelerated; see the sketch at the end of this subsection).
  • Text-to-Speech (TTS): Uses Kokoro TTS (local voice synthesis).
  • Text/Prompt Generation: Uses local Ollama models (if installed and configured).
  • Video Assembly: Uses FFmpeg locally.

Pros: Privacy, unlimited use, offline capability, no extra generation costs.

Cons: Requires capable hardware (GPU); Ollama setup is separate; the local model selection may be narrower than what cloud providers offer.
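
For context, "image generation via Diffusers" refers to the Hugging Face Diffusers library. StoryLLM drives it for you; the standalone sketch below only illustrates the same idea and is not StoryLLM's actual code (it uses the Realistic Vision repo ID shown later in `models.json`):

# Illustrative only - StoryLLM manages this internally.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V6.0_B1_noVAE",  # an SD1.5 repo ID (see models.json)
    torch_dtype=torch.float16,                  # fp16 saves VRAM on NVIDIA GPUs
).to("cuda")

image = pipe(
    "a cozy cabin in a snowy forest, photorealistic",
    num_inference_steps=25,
    guidance_scale=7.0,
).images[0]
image.save("test.png")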

Cloud Processing (Optional APIs)

Requires internet and your API keys in `config.json`. Subject to provider limits/pricing.

  • Text/Prompt Generation: OpenAI API (GPT models) or Google Gemini API.
  • Text-to-Speech (TTS): ElevenLabs API (more voices/languages).
  • YouTube Upload & Metadata: Uploads video and can use OpenAI/Gemini for metadata/translation.

Pros: Access to advanced models (GPT-4), wider TTS options (ElevenLabs).

Cons: Requires internet, API keys, potential extra costs from providers, data sent to cloud provider.

Configure your preferred services via the `config.json` file or application settings.

Unlimited Local Generation

StoryLLM allows unlimited local generation. This means no software-imposed limits on videos, images, or local TTS created using your computer's resources (Stable Diffusion, Kokoro TTS, local Ollama). Your hardware capabilities and storage are the only limits.

How to Use StoryLLM

StoryLLM offers different modes for content creation:

Using the Story Generator

Turn written narratives into dynamically synced videos.

  1. Input Text: Paste your story/script.
  2. Select Voice: Choose Kokoro (local) or ElevenLabs (cloud API).
  3. Select LLM for Prompts: Choose Ollama (local), OpenAI (cloud API), or Gemini (cloud API).
  4. Configure Image Settings: Select Image Model (Stable Diffusion), Resolution, Steps, CFG, Seed, etc. Ensure GPU/CUDA is selected if available.
  5. (Optional) Add Background Music: Select an audio file and set volume.
  6. Generate: The app processes text->audio->prompts->images->video.
  7. Review & Edit: Use the Segment Editor to refine images.
  8. Export/Upload: Save MP4 or use YouTube Integration.

Using the Slide Generator

Create slideshows with consistent timing per image.

  1. Input Narration Text: Enter the full voiceover script.
  2. Select Voice: Choose Kokoro or ElevenLabs.
  3. Enter Image Prompt: Provide ONE main prompt for all images.
  4. Set Image Count & Duration: Specify number of images and seconds per image.
  5. Configure Image Settings: Select Model, Resolution, Steps, CFG. Seed often increments automatically for variety.
  6. (Optional) Add Background Music.
  7. Generate: Creates one audio file, generates images based on the single prompt (varying seed), combines with fixed duration per image.
  8. Review & Edit: Use the Segment Editor to regenerate specific images if needed.
  9. Export/Upload.

Using the Image Generator

Generate standalone still images.

  1. Enter Prompt(s): Describe the desired image(s). Add negative prompts.
  2. Set Parameters: Number of images, Model, Resolution, Steps, CFG, Seed mode.
  3. Generate: Creates the specified number of images and saves them (usually PNG).

Editing Your Creations (Segment Editor)

Refine individual image/audio segments after initial generation in Story/Slide modes.

Features:

  • Browse segments visually.
  • Preview individual image and audio.
  • View generation parameters (prompt, seed, model, etc.) used for each image.
  • Edit the Prompt for a specific segment's image.
  • Regenerate Individual Images using the edited (or original) prompt, a new seed, and optionally a different model.
  • Rebuild Video quickly using the updated segments.

This allows fine-tuning without regenerating the entire project.

Configuration & API Integrations

StoryLLM uses configuration files to manage settings like API keys and model choices.

Configuration File (`config.json`) Explained

The core settings for StoryLLM are stored in `config.json`, located in the `StoryLLM/json/` folder after extraction. You can edit this file with a text editor or use the "My Configuration" page in the user dashboard.

Here's a breakdown of its structure and purpose:

1. `api_keys` Section

Contains your secret keys for external cloud services. **Keep these confidential!**

  • "openai": "" - Your key from OpenAI. Needed only if you select OpenAI for text/translation tasks in the app. Leave blank otherwise.
  • "elevenlabs": "" - Your key from ElevenLabs. Needed only if you select ElevenLabs as the Audio Engine. Leave blank otherwise.
  • "gemini": "" - Your key from Google AI Studio. Needed only if you select Gemini for text/translation tasks. Leave blank otherwise.
  • "huggingface": "" Mandatory - Your Hugging Face Hub token. **This is essential.**
    • Why? Required to download most image generation models (Stable Diffusion). Without it, image generation will fail with errors.
    • How to Get:
      1. Go to huggingface.co/settings/tokens (Log in/Sign up).
      2. Click "New token", name it (e.g., "StoryLLM"), choose Role: **`read`**.
      3. Generate & **Copy** the `hf_...` token.
      4. Paste it between the quotes for `"huggingface"` in your `config.json`.
2. `api_links` Section

URLs for connecting to local services.

  • "ollama_chat": "http://localhost:11434/api/chat" - Endpoint for your local Ollama instance. Only needed if you run Ollama locally and select it in the app. The default is usually correct.
3. `llm_models` Section

Default models used by the selected API provider for specific tasks.

  • openai / ollama / gemini objects: Group settings by provider.
  • "chat_default": "..." - Default model for generating image prompts. Ensure the Ollama model name matches one you've pulled (e.g., `ollama run gemma2:9b`).
  • "translation_default": "..." - Default model for translating YouTube metadata.
  • Note: If a default model is invalid or unavailable (e.g., not pulled in Ollama), the corresponding feature may fail for that provider.
Remember: After manually editing `config.json`, save the file and restart StoryLLM for changes to take effect. Ensure the file remains valid JSON.
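
Putting the three sections together, a filled-in `config.json` might look like this (an illustrative sketch with placeholder values; your copy may contain additional settings):

{
  "api_keys": {
    "openai": "",
    "elevenlabs": "",
    "gemini": "",
    "huggingface": "hf_xxxxxxxxxxxxxxxxxxxx"
  },
  "api_links": {
    "ollama_chat": "http://localhost:11434/api/chat"
  },
  "llm_models": {
    "openai": { "chat_default": "gpt-4o-mini", "translation_default": "gpt-4o-mini" },
    "ollama": { "chat_default": "gemma2:9b", "translation_default": "gemma2:9b" },
    "gemini": { "chat_default": "gemini-1.5-flash", "translation_default": "gemini-1.5-flash" }
  }
}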

Using API Integrations

Leverage powerful cloud services by integrating your API keys via the config.json file or the application's settings.

OpenAI Integration

  • Purpose: Used for generating high-quality image prompts (Story Generator), generating YouTube titles/descriptions/tags, and translating metadata.
  • Requires: An OpenAI API key in `config.json`.
  • Models: Access models like `gpt-4o-mini`, `gpt-4`, etc.
  • Get Key: Visit https://platform.openai.com/api-keys
  • Note: Subject to OpenAI's pricing and policies.

Google Gemini Integration

  • Purpose: Alternative for generating image prompts, YouTube metadata, and translations.
  • Requires: A Google Cloud project with Gemini API enabled & API key in `config.json`.
  • Models: Access models like `gemini-1.5-flash`, `gemini-1.5-pro`.
  • Get Key: Visit https://ai.google.dev/.
  • Note: Subject to Google Cloud's pricing and policies.

ElevenLabs TTS Integration

  • Purpose: High-quality, multi-lingual TTS voices.
  • Requires: An ElevenLabs API key in `config.json`.
  • Get Key: Visit https://elevenlabs.io/
  • Note: Based on ElevenLabs character quotas and subscription plans.

Ollama Integration (Local LLM)

  • Purpose: Use locally running LLMs (Llama 3, Gemma, Mistral, etc.) for prompts/metadata, keeping data private.
  • Requires: Ollama installed and running locally; desired models pulled via `ollama run model_name`; correct URL in `config.json`.
  • Note: Performance depends on your hardware. Runs offline.

YouTube Integration

  • Purpose: Upload videos, generate/translate metadata, schedule uploads.
  • Requires: Connecting your YouTube account(s) via OAuth within StoryLLM; optionally configured LLM APIs for metadata tasks.
  • Note: Uses official YouTube Data API v3, subject to their quotas.

AI Models & Performance

Model Configuration (`models.json`)

StoryLLM manages available Stable Diffusion models via `models.json` (or similar). It maps names to Hugging Face repo IDs or local paths and provides notes (especially VRAM needs).

{
  "Juggernaut XL V9": { "repo_id": "RunDiffusion/Juggernaut-XL-V9", "notes": "SDXL. VRAM: Med/High", "variant": "fp16" },
  "Realistic Vision V6": { "repo_id": "SG161222/Realistic_Vision_V6.0_B1_noVAE", "notes": "SD1.5 Photoreal. VRAM: Low/Med" },
  // ... more
}

Understanding VRAM Requirements

GPU Video RAM (VRAM) is critical for image generation models.

  • ~4-8GB: Best for SD 1.5 models (e.g., Realistic Vision) at lower resolutions (512x512). SDXL likely too slow/OOM.
  • ~8-16GB+: Needed for SDXL models (e.g., Juggernaut XL) and higher resolutions (1024x1024). 8GB is minimum practical for SDXL, 12GB+ better.
  • ~16GB and up: Recommended for the largest models (e.g., SD 3 Large) or complex workflows.
Running models requiring more VRAM than available causes Out-of-Memory (OOM) errors. Reduce resolution or use a less demanding model.
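
If you experiment with models outside StoryLLM, the Diffusers library offers memory-saving options that illustrate the same trade-offs (a sketch, not StoryLLM's internals; StoryLLM may expose equivalent settings in its UI):

# Illustrative Diffusers memory options for SDXL on limited VRAM.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "RunDiffusion/Juggernaut-XL-V9",   # SDXL repo ID from the example above
    torch_dtype=torch.float16,         # fp16: roughly half the VRAM of fp32
    variant="fp16",
)
pipe.enable_attention_slicing()        # lowers peak VRAM at a small speed cost
pipe.enable_model_cpu_offload()        # keeps idle submodules in system RAM (needs `accelerate`)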

Optimizing Performance

  • **Best:** Use a powerful NVIDIA GPU (RTX 30xx/40xx) with sufficient VRAM.
  • **Drivers/CUDA:** Use recommended NVIDIA drivers & CUDA 11.8.
  • **Python:** Use Python 3.10.x.
  • **RAM:** 16GB+ system RAM recommended.
  • **Disk:** Use an SSD.
  • **Settings:** Lowering resolution or inference steps speeds up generation.
  • **Precision:** Use FP16 model variants if available (less VRAM, often faster).
  • **Resources:** Close other heavy applications during generation.

Using Custom Models (Stable Diffusion)

Use your own .ckpt or .safetensors models.

  1. Download model file.
  2. Place it in StoryLLM's designated Stable Diffusion model directory.
  3. Edit the model configuration file (e.g., `models.json`) to add an entry for your model (name, path, notes including base type SD1.5/SDXL and VRAM estimate).
  4. Restart StoryLLM. Select your model in the UI.
Check the license of any custom models you download regarding commercial use rights.
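
For example, a new `models.json` entry might look like the following (the `path` key is an assumption; mirror whatever key your file's existing entries use for local files):

{
  "My Custom Model": { "path": "models/my_custom_model.safetensors", "notes": "SD1.5 custom. VRAM: Low/Med" }
}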

Prompting Guide

Prompt Configuration (`prompts.json`)

Stores common rules and example prompt templates used by LLMs for image description generation (Story Generator). You can customize or add your own templates here.

{
  "common_rules": "\\n1. Be written in English...",
  "example_prompts": {
    "Art Styles & Mediums": [
      { "title": "🎌 Anime (Generic)", "content": "Create ONLY anime style... Each prompt must:{common_rules}..." }
      // ... more
    ]
  }
}

Using Prompt Templates (Story Generator)

Select a template (e.g., "Anime", "Cinematic") to guide the LLM in generating image prompts that match a specific style for each story segment. Creates consistency. "AI Generated" uses a generic approach.

Advanced Controls (Steps, CFG, Seed)

  • Inference Steps: Number of refinement steps (e.g., 20-30). More steps = more detail, slower. Fewer = faster, less refined.
  • Guidance Scale (CFG): How strictly to follow the prompt (e.g., 7). Higher = stricter adherence, Lower = more creative freedom.
  • Seed: Controls randomness. Use `Fixed` for reproducibility, `Random` for variety, `Incrementing/Decrementing` for sequences.
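
For intuition, here is how the three controls above map onto a typical Diffusers call (illustrative; the parameter names are Diffusers', not necessarily StoryLLM's internals):

# Illustrative mapping of Steps, CFG, and Seed onto a Diffusers call.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V6.0_B1_noVAE", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(1234)  # Fixed seed: same prompt -> same image
image = pipe(
    "a lighthouse at dawn",
    num_inference_steps=25,  # Inference Steps
    guidance_scale=7.0,      # Guidance Scale (CFG)
    generator=generator,     # Seed control
).images[0]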

Other Features

Supported Languages (TTS)

Kokoro (Local): English (US/UK), Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese, etc.

ElevenLabs (Cloud API): Wider range including Turkish, German, Russian, Korean, Polish, Arabic, etc.

YouTube metadata can also be translated via LLM APIs.

Background Music

Add `.mp3` or `.wav` background music in Story/Slide modes. Adjust volume relative to narration.

Output Formats

Videos: MP4 (H.264 video, AAC audio). Images: PNG.

Technical Details

Internet Requirements

Needed for: Setup, activation, license checks, updates, model downloads, cloud API use, YouTube uploads.

NOT Needed for: Core generation using already downloaded local models (Stable Diffusion, Kokoro, Ollama) once activated.

Data Privacy

Local Processing: Your inputs and outputs stay on your computer. Cloud APIs: Data sent to OpenAI/Google/ElevenLabs is subject to their policies. YouTube: Uploads go to Google/YouTube. Licensing: Minimal data (key, status) sent for validation. See our Privacy Policy.

Licensing & Support

Licensing & Updates

Active subscription needed for full use & updates. License typically for one computer. See License Agreement and Refund Policy.

Support & Contact

Check this documentation & FAQ first. Email support available for active subscribers at [email protected]. Priority support for higher tiers.

Frequently Asked Questions (FAQ)

General Questions & Features

What is StoryLLM?
A desktop app (Win/Linux) for unlimited local AI video/image generation using your hardware, with a focus on control & privacy. See What is StoryLLM?.

What are the system requirements?
See System Requirements. Key points: Win/Linux, NVIDIA GPU (8GB+ VRAM recommended, CUDA 11.8), Python 3.10, 16GB+ RAM, SSD.

Is generation really unlimited?
Yes, for local models (Stable Diffusion, Kokoro, Ollama). Your hardware is the limit. Optional cloud APIs (OpenAI, Gemini, ElevenLabs) have their own external limits/costs. See Unlimited Generation.

What is the difference between the Story, Slide, and Image modes?
Story: unique image per audio segment, synced duration. Slide: multiple images from one prompt, fixed duration per image. Image: standalone still images.

Can I add background music?
Yes, in Story/Slide modes. You can upload an audio file and control its volume. See Background Music.

What kinds of content can I create?
Narrated stories, explainers, tutorials, product slideshows, marketing content, social media videos, visual galleries, concept art, etc.

How does the YouTube integration work?
Connect your YouTube account via OAuth. Upload videos directly, optionally use LLM APIs (OpenAI/Gemini) to generate/translate metadata, set privacy, and schedule uploads. See YouTube Integration.

Are there tutorials or other learning resources?
This documentation is the main guide. Video tutorials and more resources are planned for our website.

How does StoryLLM differ from cloud-based tools?
Key differences: local processing (privacy/offline), true unlimited local generation, use of your own hardware, model flexibility (local/cloud), and a subscription cost instead of per-item fees.

How do I get support?
Active subscribers get email support ([email protected]). Higher tiers get priority support. See Support & Contact.

How do I cancel my subscription?
Cancel anytime via your account on our website. Cancellation stops future billing; access continues until your paid period ends. See Refund Policy.

What happens if my subscription expires?
The software may enter limited mode or cease full function. An active subscription is needed for full features/updates. Your local files remain. See Licensing & Updates.

Is there a free trial?
Currently, no free trial is offered. Review the documentation, features, and requirements to evaluate. Due to the digital nature of the product, refunds are limited (Refund Policy).

Technical Questions & Usage

Which AI models does StoryLLM use?
Text: local Ollama (Llama 3, Gemma, etc.) or cloud OpenAI/Gemini. Image: local Stable Diffusion (SD1.5, SDXL, custom). Voice: local Kokoro or cloud ElevenLabs. See Models & APIs.

Can I change a single image after generation?
Yes, use the Segment Editor to edit the prompt for a specific image, regenerate only that image, and then rebuild the video.

Which languages are supported for voiceovers?
Local Kokoro: English, Japanese, Chinese, Spanish, French, etc. Cloud ElevenLabs API: a wider range including Turkish, German, Russian, etc. (requires key/plan). See Languages.

Can I use the generated content commercially?
It depends on the AI model/service license you use. Check the license for specific Stable Diffusion models (especially custom ones) or the terms of service for OpenAI/Gemini/ElevenLabs if using their APIs. StoryLLM itself doesn't restrict usage beyond those underlying licenses.

How does licensing work?
An active subscription gives usage rights (typically 1 PC) & access to updates. Periodic online checks validate the license. See Licensing & Updates.

Do I need a constant internet connection?
No. Internet is needed for setup, activation, model downloads, updates, cloud APIs, and YouTube uploads. Core local generation works offline after setup/downloads. See Internet Req.

How long does generation take?
It depends heavily on your GPU (NVIDIA recommended), VRAM, resolution, and model complexity: seconds per image with a good GPU, minutes per image on CPU. TTS/assembly is faster. See Optimizing.

What output formats are produced?
Videos: MP4 (H.264/AAC). Images: PNG. See Output Formats.

Where does my data go?
Local models: data stays on your PC. Cloud APIs: data sent to the provider (OpenAI/Google/ElevenLabs) per their policy. YouTube: data sent to Google/YouTube. Minimal license data is sent to us. See Data Privacy.

Can I use my own Stable Diffusion models?
Yes, `.ckpt` and `.safetensors` are supported. Place them in the designated model folder and update the model config file (e.g., `models.json`) to make them selectable. See Custom Models.

Will StoryLLM slow down my computer?
During active generation (especially images), yes, it uses significant GPU/CPU resources. Performance impact is minimal when idle. Avoid running other heavy tasks during generation. See Optimizing.

Can I use my license on multiple computers?
Typically one primary computer per license. Transfer may require deactivation or contacting support ([email protected]). See Licensing and your License Agreement.
© StoryLLM. All Rights Reserved.