WebGL Lip Sync Guide

CrystalLipSync fully supports WebGL builds, with both real-time audio analysis and pre-baked playback. No plugins, no JavaScript bridges.


Overview

Unity's AudioSource.GetSpectrumData() is unavailable on WebGL. CrystalLipSync provides two alternatives:

| Approach | How It Works | CPU Cost | Setup |
| --- | --- | --- | --- |
| Real-time | OnAudioFilterRead + managed C# FFT | ~0.1 ms/frame per character | None ... automatic on WebGL |
| Baked | Offline analysis → ScriptableObject → runtime playback | < 0.01 ms/frame per character | Bake in editor, assign asset |

Both approaches work out of the box. You can even mix them ... baked for known voice-over, real-time as a fallback for dynamic audio.


Real-Time Lip Sync on WebGL

How It Works

When running on WebGL, CrystalLipSyncController automatically:

  1. Detects the WebGL platform at startup

  2. Attaches a CrystalLipSyncAudioCapture component to the AudioSource's GameObject

  3. Routes analysis through managed FFT instead of GetSpectrumData

No setup required. Your existing scene configuration works without changes.
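The automatic setup is roughly equivalent to the sketch below. CrystalLipSyncAudioCapture is the real component; the class, field, and comment details here are illustrative, not the actual source:

```csharp
using UnityEngine;

// Illustrative sketch of the controller's automatic WebGL setup.
public class WebGLCaptureBootstrap : MonoBehaviour
{
    public AudioSource source;
    public bool forceAudioCapture;   // mirrors the "Force Audio Capture" toggle

    void Start()
    {
        // 1. Detect the WebGL platform at startup.
        if (Application.platform != RuntimePlatform.WebGLPlayer && !forceAudioCapture)
            return;

        // 2. Attach the capture component to the AudioSource's GameObject.
        if (source.GetComponent<CrystalLipSyncAudioCapture>() == null)
            source.gameObject.AddComponent<CrystalLipSyncAudioCapture>();

        // 3. The controller then reads spectra from the capture component's
        //    managed FFT instead of calling AudioSource.GetSpectrumData.
    }
}
```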

Architecture

Data flow:

  1. Audio Thread: OnAudioFilterRead fires ~every 20ms with raw PCM samples. The capture component downmixes to mono and writes to a lock-protected ring buffer. Audio passes through unmodified.

  2. Main Thread: The controller calls FillBuffers() which copies samples, applies a Blackman-Harris window, runs a 2N-point Cooley-Tukey FFT, and extracts magnitude bins.

  3. Analysis: The spectrum feeds into the same pipeline used on desktop ... identical band energies, spectral centroid, and viseme classification.
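The audio-thread half of this flow can be sketched as follows. The real component is CrystalLipSyncAudioCapture; this illustrative version shows only the mono downmix and the lock-protected ring-buffer write:

```csharp
using UnityEngine;

// Sketch of the audio-thread capture step. Buffer size and field names are
// illustrative; the real component is CrystalLipSyncAudioCapture.
public class MonoRingBufferCapture : MonoBehaviour
{
    readonly float[] ring = new float[8192];   // 8192 floats ≈ 32 KB
    readonly object gate = new object();
    int writePos;

    // Fires on the audio thread roughly every 20 ms with interleaved PCM.
    void OnAudioFilterRead(float[] data, int channels)
    {
        lock (gate)
        {
            for (int i = 0; i < data.Length; i += channels)
            {
                // Downmix interleaved channels to mono.
                float sum = 0f;
                for (int c = 0; c < channels; c++) sum += data[i + c];
                ring[writePos] = sum / channels;
                writePos = (writePos + 1) % ring.Length;
            }
        }
        // 'data' is left untouched, so the audio passes through unmodified.
    }
}
```

The main-thread controller later copies out of this ring buffer under the same lock, which is the `FillBuffers()` step described above.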

Testing in the Editor

Enable Force Audio Capture in the CrystalLipSyncController Inspector to use the WebGL path on desktop ... useful for verifying behavior without building.

Why Not a JavaScript Bridge (.jslib)?

Some plugins use a .jslib to tap into the browser's Web Audio API. CrystalLipSync avoids this because:

  • Fragile AudioSource binding ... Unity doesn't expose which Web Audio nodes correspond to which AudioSource

  • Browser inconsistencies ... AnalyserNode returns dB-scaled magnitudes with different windowing than Unity

  • Maintenance burden ... raw JavaScript with no type safety or C# debugger support

  • OnAudioFilterRead already works ... the data is already available on the managed side

Performance

| Metric | Value |
| --- | --- |
| FFT size (default) | 1024 spectrum bins → 2048-point FFT |
| Operations per frame | ~22,500 multiply-adds |
| Ring buffer memory | ~32 KB |
| Per-frame allocations | Zero |
| Overhead | ~0.1 ms per frame |
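The operation count follows directly from the N·log2 N cost of a radix-2 FFT. A quick sanity check in self-contained C#:

```csharp
using System;

class FftCostCheck
{
    static void Main()
    {
        int fftSize = 2048;                    // 2N-point FFT for 1024 spectrum bins
        int stages = (int)Math.Log2(fftSize);  // 11 butterfly stages for radix-2
        int multiplyAdds = fftSize * stages;   // ~one multiply-add per point per stage
        Console.WriteLine(multiplyAdds);       // prints 22528, i.e. ~22,500
    }
}
```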


Baked Lip Sync on WebGL

Why Bake?

Baking is an optional optimization ... real-time works out of the box. It is best for:

  • Even lower CPU cost ... ~10× cheaper than real-time FFT

  • Deterministic results ... identical lip sync on every device, every run

  • Scalability ... dozens of characters speaking with negligible overhead

  • Pre-recorded voice-over ... if audio is known at build time, why analyze it every frame?

If your audio is dynamic (microphone, procedural, user-uploaded), stick with real-time.

Step 1 - Prepare Audio Clips

Set each AudioClip's Load Type to Decompress On Load or Compressed In Memory.

⚠️ Streaming clips cannot be baked ... the editor needs the full waveform.
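If you have many clips to fix up, an editor helper can switch the import setting in bulk. This is a sketch using standard Unity editor APIs; the menu path is illustrative:

```csharp
#if UNITY_EDITOR
using UnityEditor;
using UnityEngine;

// Editor helper: switch selected AudioClips' Load Type so they can be baked.
// Select one or more AudioClips in the Project window, then run the menu item.
public static class BakePrep
{
    [MenuItem("Tools/Crystal LipSync/Set Load Type For Baking")]
    static void SetLoadType()
    {
        foreach (var clip in Selection.GetFiltered<AudioClip>(SelectionMode.Assets))
        {
            string path = AssetDatabase.GetAssetPath(clip);
            var importer = (AudioImporter)AssetImporter.GetAtPath(path);
            var settings = importer.defaultSampleSettings;
            settings.loadType = AudioClipLoadType.DecompressOnLoad; // anything but Streaming
            importer.defaultSampleSettings = settings;
            importer.SaveAndReimport();
        }
    }
}
#endif
```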

Step 2 - Bake

Open Tools → Crystal LipSync → Bake Lip Sync.

Single clip: Drag an AudioClip → click Bake Single Clip → choose save location.

Batch: Expand Batch Bake → add clips → click Bake All (Batch) → choose destination folder.

Match your bake settings to your controller (FFT Size, Sensitivity, Threshold, Smoothing) or assign the same Profile to both.

Step 3 - Scene Setup

Add these to your character:

| # | Component | Purpose |
| --- | --- | --- |
| 1 | CrystalLipSyncController | Holds VisemeWeights[] |
| 2 | CrystalBakedLipSync | Reads baked data, writes to controller |
| 3 | CrystalLipSyncBlendshapeTarget or JawBoneTarget | Drives the mesh/bone |
| 4 | AudioSource | Plays the voice clip |

Baked-Only Setup (Recommended for WebGL)
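A script-based sketch of this wiring ... normally you add these components in the Inspector; the baked-data asset reference here is typed loosely because the concrete type name is not shown in this guide:

```csharp
using UnityEngine;

// Baked-only wiring sketch. Component names match the table above.
public class BakedOnlySetup : MonoBehaviour
{
    public AudioClip voiceClip;
    public ScriptableObject bakedData;   // the .asset produced in Step 2

    void Awake()
    {
        var source = gameObject.AddComponent<AudioSource>();
        source.clip = voiceClip;
        source.playOnAwake = false;      // wait for the browser's user gesture

        gameObject.AddComponent<CrystalLipSyncController>();        // holds VisemeWeights[]
        gameObject.AddComponent<CrystalLipSyncBlendshapeTarget>();  // drives the mesh
        var baked = gameObject.AddComponent<CrystalBakedLipSync>(); // reads baked data
        // Assign the baked .asset to the CrystalBakedLipSync component
        // in the Inspector (or via its API).
    }
}
```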

Hybrid Setup (Baked + Real-Time Fallback)

When baked data is available it takes priority (via [DefaultExecutionOrder(100)]). When no baked data exists, real-time FFT fills in.

Step 4 - Play

Swapping clips dynamically:
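A hedged sketch of a runtime swap ... the `BakedData` and `Play()` member names on CrystalBakedLipSync are placeholders; check the component's actual API:

```csharp
using UnityEngine;

public class VoiceLinePlayer : MonoBehaviour
{
    public AudioSource source;
    public CrystalBakedLipSync baked;

    // Swap to a new voice line at runtime.
    public void Speak(AudioClip clip, ScriptableObject bakedClipData)
    {
        source.Stop();
        baked.BakedData = bakedClipData;  // placeholder name: point at the new baked asset
        source.clip = clip;
        source.Play();
        baked.Play();                     // placeholder name: restart baked playback in sync
    }
}
```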

Dialogue System Integration

Both GC2 and PixelCrushers integrations support baked lip sync automatically via a Baked Clip Lookup table (maps AudioClip → baked data). See the Baked Lip Sync guide for setup details.

Organizing Baked Assets

Re-Baking

Save to the same path to update in-place ... all scene/prefab references remain intact.


WebGL Build Checklist

  1. File → Build Settings → WebGL - build as normal

  2. No special settings required for lip sync

  3. Baked .asset files are included automatically (they are referenced by scene components)

Browser Audio Requirement

Browsers require user interaction before playing audio. Ensure your game has a "Start" or "Click to Play" screen before any AudioSource playback. This is a browser requirement, not a CrystalLipSync limitation.
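A minimal gate that satisfies the browser's autoplay policy, using Unity UI (button and AudioSource references assigned in the Inspector):

```csharp
using UnityEngine;
using UnityEngine.UI;

// "Click to Play" gate: browsers unlock audio only after a user gesture,
// so voice playback (and therefore lip sync) starts from this button handler.
public class ClickToPlayGate : MonoBehaviour
{
    public Button startButton;
    public AudioSource voice;

    void Awake()
    {
        startButton.onClick.AddListener(() =>
        {
            startButton.gameObject.SetActive(false);
            voice.Play();   // safe now: the click satisfies the autoplay policy
        });
    }
}
```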


Compatibility

| Feature | WebGL Support |
| --- | --- |
| Audio lip sync (real-time) | ✅ |
| Baked lip sync | ✅ |
| Text lip sync | ✅ (no FFT needed) |
| Microphone lip sync | ❌ (browser security) |
| Blendshape targets | ✅ |
| Jaw bone targets | ✅ |
| Profiles & moods | ✅ |
| All FFT sizes (256-4096) | ✅ |
| GC2 Dialogue integration | ✅ |
| PixelCrushers Dialogue integration | ✅ |


Troubleshooting

Mouth doesn't move

| Check | Fix |
| --- | --- |
| AudioSource not assigned | Assign it to the controller (real-time) or baked component |
| Audio blocked by browser | Add a user interaction screen before playback |
| Baked Clip Data missing | Assign the baked .asset to the baked component |
| Auto Play disabled | Enable the toggle, or call Play() manually |
| Force Audio Capture not checked | Enable it in the Editor to test the WebGL path |

Timing feels off (baked)

  • Adjust Time Offset on CrystalBakedLipSync (try -0.05 to 0.05)

  • WebGL audio scheduling can introduce small latency ... a negative offset compensates

Lip sync quality differs between Editor and WebGL

The managed FFT produces slightly different magnitudes than Unity's native FFT. Viseme classification uses relative band energy ratios, so results should be nearly identical. If you notice differences:

  • Adjust Sensitivity slightly (±1-2)

  • Tweak Volume Threshold since WebGL audio levels can differ

Bake fails or produces silent data

  • AudioClip Load Type must not be Streaming

  • Verify the clip contains audio (check waveform in Inspector)

  • Very quiet clips may fall below the Volume Threshold ... lower it and re-bake


FAQ

Q: Do I need to add CrystalLipSyncAudioCapture manually? A: No. The controller creates and manages it automatically on WebGL.

Q: Does this work with Addressables / Asset Bundles? A: Yes. As long as the AudioClip plays through an AudioSource, both real-time and baked paths work.

Q: Should I use real-time or baked for WebGL? A: Both work. Real-time requires zero setup. Baking is ~10× cheaper in CPU and gives deterministic results. For voice-over heavy games, baking is recommended.

Q: Can I use both on the same character? A: Yes. Baked takes priority when active; real-time fills gaps when no baked data exists.

Q: What about mobile WebGL? A: Works on mobile browsers that support Web Audio API (all modern mobile browsers). The same user-interaction requirement applies.
