CrystalLipSync

Real-time, audio-driven lip sync and eye blink for Unity. CrystalLipSync analyzes an AudioSource every frame using FFT spectral analysis and maps the result to 15 viseme blendshapes on any SkinnedMeshRenderer. It works standalone or as a fully integrated Game Creator 2 solution.


Quick Start

  1. Open Tools > CrystalLipSync > Setup Wizard (recommended), or add components manually:

    • Add an AudioSource to your character root.

    • Add a CrystalLipSyncController to the same GameObject and assign the AudioSource.

    • Add a CrystalLipSyncBlendshapeTarget to the SkinnedMeshRenderer that has your viseme blendshapes.

    • Click Auto-Map in the BlendshapeTarget inspector to auto-assign blendshapes.

  2. Play an AudioClip on the assigned AudioSource. The mouth will animate automatically.

No voice-over audio? Enable the Add Text Lip Sync option in the Setup Wizard, or add the CrystalTextLipSync component manually. Call PlayText("Hello world") from script or use the GC2 Dialogue integration for automatic text lip sync.


Setup Wizard

The Setup Wizard provides one-click character provisioning.

Open it via: Tools > CrystalLipSync > Setup Wizard

What it does

When you drag a character GameObject into the wizard and press Setup Character, it:

  1. AudioSource ... Adds one to the character root if none exists (3D spatial, playOnAwake = false).

  2. CrystalLipSyncController ... Adds one and wires it to the AudioSource.

  3. CrystalLipSyncBlendshapeTarget ... Scans every child SkinnedMeshRenderer, scores each for viseme blendshapes, picks the best one, adds the target component, and auto-maps all detected visemes.

  4. CrystalEyeBlink (optional, enabled by default) ... Adds the eye blink component and auto-detects blink blendshapes.

  5. CrystalTextLipSync (optional) ... Adds the text-driven lip sync component and wires it to the controller. Enable this when your dialogue has no voice-over audio.

  6. CrystalMicrophoneLipSync (optional) ... Adds the microphone lip sync component and wires it to the controller. Enable this for VR, voice chat, or live microphone scenarios.

The entire operation is a single Undo action ... press Ctrl+Z to revert everything at once.

Result summary

After setup, the wizard displays:

  • How many components were added.

  • How many viseme blendshapes were auto-mapped.

  • Per-component status (added vs. already exists).

Tip: The wizard skips components that already exist, so it's safe to run again if you add a mesh later.


Core Components

CrystalLipSyncController

Add Component Menu: CrystalLipSync / Lip Sync Controller

The central analysis engine. It reads spectrum data from an AudioSource every frame and produces an array of 15 smoothed viseme weights.

Inspector fields (defaults in parentheses):

  • Audio Source: The AudioSource to analyze. Can live on any GameObject. (no default)

  • FFT Size: FFT window size (256...4096). Higher = better frequency resolution, slower response. (default: 1024)

  • Volume Threshold: RMS below this value → mouth stays closed. (default: 0.005)

  • Sensitivity: Volume-to-viseme gain. Increase for quiet audio. (default: 5)

  • Smoothing Attack: How fast visemes open (higher = snappier). (default: 30)

  • Smoothing Release: How fast visemes close (higher = quicker release). (default: 15)

  • Profile: Optional CrystalLipSyncProfile ScriptableObject to override settings. (no default)

  • Mood: Current emotional mood (Neutral, Happy, Angry, Sad). (default: Neutral)

  • Show Debug Logs: Log dominant viseme, volume, and centroid every frame. (default: false)
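From script, the controller's output can be read each frame. The sketch below assumes VisemeWeights is a float array of 15 smoothed weights (as described above, index 0 = SIL); the dominant-viseme helper is our own illustration, not package API.

```csharp
using UnityEngine;

// Sketch: inspect the controller's 15 smoothed viseme weights at runtime.
// VisemeWeights is the member referenced elsewhere in this guide; the
// argmax loop below is illustrative glue code.
public class VisemeDebugReader : MonoBehaviour
{
    [SerializeField] private CrystalLipSyncController controller;

    private void Update()
    {
        float[] weights = controller.VisemeWeights; // 15 entries, index 0 = SIL

        // Find the dominant viseme this frame (our own helper logic).
        int dominant = 0;
        for (int i = 1; i < weights.Length; i++)
            if (weights[i] > weights[dominant]) dominant = i;

        Debug.Log($"Dominant viseme index: {dominant} ({weights[dominant]:F2})");
    }
}
```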

CrystalLipSyncBlendshapeTarget

Add Component Menu: CrystalLipSync / Blendshape Target

Requires: SkinnedMeshRenderer on the same GameObject.

Reads VisemeWeights from a Controller and drives blendshapes in LateUpdate (runs after the Animator, so it wins over animation clips that write to the same blendshapes).

Multiple targets can reference the same controller ... useful for characters with separate face, beard, or tongue meshes.

Inspector fields (defaults in parentheses):

  • Controller: The CrystalLipSyncController providing weights. (no default)

  • Use Mood Mappings: When enabled, exposes per-mood mapping sets. (default: false)

  • Neutral / Happy / Angry / Sad Mappings: Per-viseme blendshape index + weight multiplier. (default: all -1, unmapped)

  • Global Weight: Master multiplier for all blendshapes (0...200%). (default: 100)

  • Max Blendshape Value: Clamped ceiling per blendshape. (default: 100)

When the component is disabled, all mapped blendshapes are reset to zero so the character doesn't freeze in a mouth pose.

CrystalLipSyncProfile

Create via: Assets > Create > CrystalLipSync > Lip Sync Profile

A ScriptableObject that stores analysis settings and per-viseme multipliers. Assign it to a Controller's Profile field to share settings between characters of similar voice type.

Fields:

  • Volume Threshold: Overrides the controller's threshold.

  • Sensitivity: Overrides the controller's sensitivity.

  • Smoothing Attack / Release: Overrides the controller's smoothing.

  • Per-Viseme Multipliers: Array of 15 floats. Values > 1 amplify, < 1 suppress individual visemes.

When a profile is assigned, the controller uses only the profile's settings (ignoring its own local values).

CrystalEyeBlink

Add Component Menu: CrystalLipSync / Eye Blink

A standalone MonoBehaviour for natural, randomized eye blinking. Works independently of Game Creator 2.

Auto-detects eyeBlink_L, Fcl_EYE_Close, Blink, and similar blendshape naming patterns across VRM, ARKit, and custom rigs.

Inspector fields (defaults in parentheses):

  • Blink Interval: Average seconds between blinks. (default: 4)

  • Interval Randomness: ± seconds of variation. (default: 1.5)

  • Close Speed: Seconds to fully close the eyelid. (default: 0.08)

  • Open Speed: Seconds to fully open. (default: 0.12)

  • Closed Hold Time: Seconds the eye stays fully shut. (default: 0.05)

  • Double Blink: Chance (0...1) of a second rapid blink. (default: 0.15)

  • Half Blink: Chance (0...1) of a partial blink (50...80%). (default: 0.1)

  • Max Weight: Maximum blendshape value when closed. (default: 100)

  • Target Mesh: Leave empty to auto-detect, or assign manually. (default: auto)

  • Blink Left / Right / Both: Blendshape indices. The inspector shows dropdown selectors when a mesh is assigned. (default: auto-detect)

Both vs. Left/Right priority: If a combined "both" blendshape is mapped, CrystalEyeBlink drives only that blendshape. Left and Right are used as a fallback only when "both" is unmapped (-1). This prevents double-driving when all three are detected.


CrystalTextLipSync

Add Component Menu: CrystalLipSync / Text Lip Sync

Drives lip sync blendshapes from text instead of audio. Converts a string into a timed viseme sequence and plays it back with smooth blending between mouth shapes.

Designed for dialogue systems where no voice-over audio is available.

Inspector fields (defaults in parentheses):

  • Blendshape Target: The CrystalLipSyncBlendshapeTarget to drive. Auto-detected if empty. (default: auto)

  • Controller: Optional. If assigned, writes to the controller's VisemeWeights array so all connected targets pick them up. (default: auto)

  • Smoothing Attack: How quickly visemes blend in (higher = snappier). (default: 25)

  • Smoothing Release: How quickly visemes blend out (higher = faster release). (default: 18)

  • Intensity: Global weight multiplier for text-driven visemes (0...1). (default: 0.85)

Audio takes priority: When the controller's AudioSource is actively playing, text-driven weights are ignored. This means you can have both audio and text lip sync on the same character ... audio lip sync automatically wins when voice-over is present.


CrystalMicrophoneLipSync

Add Component Menu: CrystalLipSync / Microphone Lip Sync

Captures real-time microphone audio and feeds it into a CrystalLipSyncController for live lip sync. The component opens the microphone, creates a looping AudioClip, assigns it to the controller's AudioSource, and keeps the playback position synced with the microphone write head.

Designed for VR avatars, social VR, voice chat, live presentations, and any scenario where the player speaks into a microphone.

Inspector fields (defaults in parentheses):

  • Controller: The CrystalLipSyncController to feed mic audio into. Auto-detected if empty. (default: auto)

  • Microphone Device: Name of the microphone to use. Leave empty for the system default. (default: system default)

  • Sample Rate: Recording sample rate in Hz. 44100 is standard; 22050 saves memory. (default: 44100)

  • Buffer Length (sec): Duration of the internal looping audio buffer. 1...2 seconds is sufficient. (default: 1)

  • Mute Playback: Mutes the AudioSource output so the user doesn't hear their own voice through speakers. The FFT analysis still works because GetSpectrumData reads raw source data. (default: true)

  • Auto Start: Start capturing automatically when the component is enabled. (default: true)

How it works: The microphone audio is piped into the same AudioSource that the CrystalLipSyncController analyzes. The existing FFT spectral analysis handles everything ... no additional analysis engine or ML model is needed. The same 6-band frequency analysis, spectral centroid, and high-freq ratio computation that drives audio lip sync also drives microphone lip sync.

AudioSource state preservation: When capture starts, the component saves the AudioSource's current clip, volume, loop, and mute settings. When capture stops, these are restored so pre-recorded audio playback continues to work normally.

Latency management: The component monitors the drift between the microphone write position and AudioSource playback position each frame. If drift exceeds half the buffer, it resyncs to 50ms behind the mic head to prevent audible pops while keeping latency minimal.
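If you want to see what this piping looks like in plain Unity code (for example to understand what the component does under the hood), the rough pattern is sketched below. Only Unity's own Microphone and AudioSource APIs are real here; the class, constants, and exact resync math are illustrative, with the ~50 ms figure mirroring the behavior described above.

```csharp
using UnityEngine;

// Sketch of the mic-to-AudioSource piping described above, built only on
// Unity's Microphone API. CrystalMicrophoneLipSync wraps this (plus state
// preservation and drift resync) for you.
public class MicLoopSketch : MonoBehaviour
{
    [SerializeField] private AudioSource source;   // the controller's AudioSource
    private const int SampleRate = 44100;
    private const int BufferSeconds = 1;

    private void OnEnable()
    {
        // null device name = system default microphone; looping 1-second clip.
        source.clip = Microphone.Start(null, true, BufferSeconds, SampleRate);
        source.loop = true;
        source.mute = true; // analysis still works; player doesn't hear themselves

        // Wait until the mic has written at least one sample before playing.
        while (Microphone.GetPosition(null) <= 0) { }
        source.Play();
    }

    private void Update()
    {
        // Resync if playback drifts more than half the buffer behind the mic head.
        int micPos = Microphone.GetPosition(null);
        int samples = source.clip.samples;
        int drift = (micPos - source.timeSamples + samples) % samples;
        if (drift > samples / 2)
            source.timeSamples = (micPos - SampleRate / 20 + samples) % samples; // ~50 ms behind
    }

    private void OnDisable() => Microphone.End(null);
}
```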


The 15-Viseme System

CrystalLipSync uses the standard 15-viseme set common in speech animation:

Index   Code   Phonemes    Description
0       SIL    (none)      Silence / rest position
1       PP     p, b, m     Bilabial plosive
2       FF     f, v        Labiodental fricative
3       TH     th          Dental fricative
4       DD     d, t, n     Alveolar
5       KK     k, g        Velar
6       CH     ch, j, sh   Postalveolar
7       SS     s, z        Alveolar fricative
8       NN     n, ng       Nasal
9       RR     r           Alveolar approximant
10      AA     a, ah       Open vowel
11      E      e, eh       Mid front vowel
12      I      i, ee       Close front vowel
13      O      o, oh       Mid back vowel
14      U      u, oo       Close back vowel

The analyzer computes 6 frequency band energies (80 Hz ... 12 kHz), spectral centroid, and an RMS volume level each frame, then uses these features to estimate the weight for every viseme simultaneously.
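A rough sketch of this kind of feature extraction is shown below. Only AudioSource.GetSpectrumData and AudioSettings are real Unity API; the band edges and window choice are illustrative, since the analyzer's actual internals are not published.

```csharp
using UnityEngine;

// Illustrative feature extraction in the spirit of the analyzer: FFT spectrum,
// 6 band energies (80 Hz ... 12 kHz), and spectral centroid. Band edges here
// are examples only, not the package's actual values.
public class SpectralFeaturesSketch : MonoBehaviour
{
    [SerializeField] private AudioSource source;
    private readonly float[] spectrum = new float[1024]; // matches default FFT Size
    private static readonly float[] bandEdgesHz = { 80, 250, 600, 1500, 3500, 7000, 12000 };

    private void Update()
    {
        source.GetSpectrumData(spectrum, 0, FFTWindow.BlackmanHarris);

        float binHz = AudioSettings.outputSampleRate / 2f / spectrum.Length;
        var bands = new float[bandEdgesHz.Length - 1];
        float weightedSum = 0f, total = 0f;

        for (int i = 0; i < spectrum.Length; i++)
        {
            float hz = i * binHz;
            weightedSum += hz * spectrum[i];
            total += spectrum[i];
            for (int b = 0; b < bands.Length; b++)
                if (hz >= bandEdgesHz[b] && hz < bandEdgesHz[b + 1]) { bands[b] += spectrum[i]; break; }
        }

        // Spectral centroid = energy-weighted mean frequency ("brightness").
        float centroid = total > 0f ? weightedSum / total : 0f;
        // bands[] + centroid + RMS volume would then be mapped to viseme weights.
    }
}
```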


Auto-Mapping

The auto-mapper (CrystalLipSyncAutoMapper) uses a multi-tier scoring system to match blendshape names to visemes:

  1. Tokenization ... Splits names by _, ., -, spaces, and camelCase boundaries. Expands abbreviations (mth → mouth, fcl → facial, etc.).

  2. Scoring ... Each viseme has a set of match rules with canonical codes, aliases, and keywords. Tokens are scored against these rules.

  3. Negative patterns ... Tokens like smile for SIL, or eye for viseme E, are penalized to avoid false matches.

  4. Greedy bipartite assignment ... Candidates are sorted by score (highest first). Each blendshape and viseme is assigned at most once, preventing conflicts.

Supported naming conventions include:

  • VRChat: vrc.v_sil, vrc.v_aa, vrc.v_oh, etc.

  • ARKit: mouthOpen, mouthPucker, mouthSmile, etc.

  • VRM / UniVRM: Fcl_MTH_A, Fcl_MTH_O, Fcl_MTH_Close, etc.

  • Custom: mouth_AA, viseme_PP, lipsync_e, etc.

The auto-mapper also provides FindBestVisemeMesh(), which scores every SkinnedMeshRenderer in a hierarchy and returns the one most likely to contain viseme blendshapes.
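The greedy bipartite step (step 4 above) can be sketched generically as follows. This is an illustration of the technique, not the shipped implementation; all names here are our own.

```csharp
using System.Collections.Generic;
using System.Linq;

// Generic sketch of greedy bipartite assignment: take candidate
// (blendshape, viseme, score) triples, sort by score descending, and
// assign each blendshape and each viseme at most once.
public readonly struct Candidate
{
    public readonly int BlendshapeIndex;
    public readonly int VisemeIndex;
    public readonly float Score;
    public Candidate(int b, int v, float s) { BlendshapeIndex = b; VisemeIndex = v; Score = s; }
}

public static class GreedyAssigner
{
    public static Dictionary<int, int> Assign(IEnumerable<Candidate> candidates)
    {
        var visemeToBlendshape = new Dictionary<int, int>();
        var usedBlendshapes = new HashSet<int>();

        foreach (var c in candidates.OrderByDescending(c => c.Score))
        {
            if (visemeToBlendshape.ContainsKey(c.VisemeIndex)) continue; // viseme already mapped
            if (!usedBlendshapes.Add(c.BlendshapeIndex)) continue;       // blendshape already used
            visemeToBlendshape[c.VisemeIndex] = c.BlendshapeIndex;
        }
        return visemeToBlendshape;
    }
}
```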


Mood System

CrystalLipSync supports 4 emotional moods:

  • Neutral: Default mapping. Always used when mood mappings are disabled.

  • Happy: Wider mouth shapes, smile-inflected visemes.

  • Angry: Tighter, more compressed mouth shapes.

  • Sad: Droopier, more subtle mouth movements.

How it works

  1. Enable Use Mood Mappings on the CrystalLipSyncBlendshapeTarget.

  2. Configure separate blendshape mappings for each mood tab.

  3. Change the mood on the Controller via controller.SetMood(LipSyncMood.Happy) or through the GC2 Set Lip Sync Mood instruction.

The target automatically reads the controller's current mood each frame and uses the corresponding mapping set.
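From script, switching mood is a one-liner around the SetMood call shown in step 3; the wrapper class below is illustrative glue.

```csharp
using UnityEngine;

// Sketch: drive mood changes from UI buttons or dialogue events.
// SetMood(LipSyncMood) is the call documented above; the rest is glue.
public class MoodSwitcher : MonoBehaviour
{
    [SerializeField] private CrystalLipSyncController controller;

    // The blendshape target reads the controller's mood each frame
    // and swaps mapping sets automatically.
    public void OnAngryLineStarts() => controller.SetMood(LipSyncMood.Angry);
    public void OnLineEnds() => controller.SetMood(LipSyncMood.Neutral);
}
```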


Text-Driven Lip Sync

For characters without voice-over audio, CrystalLipSync can animate the mouth from dialogue text alone. The system converts each character and common digraphs (th, sh, ch, ee, oo, etc.) into the corresponding viseme and plays the sequence in sync with your typewriter speed.

How It Works

  1. CrystalTextToViseme converts input text into a list of VisemeEntry structs, each containing a viseme type and a duration.

  2. CrystalTextLipSync plays the sequence over time, smoothly blending between mouth shapes using configurable attack/release smoothing.

  3. The text lip sync writes into the CrystalLipSyncController.VisemeWeights array ... the same array used by audio analysis ... so all existing CrystalLipSyncBlendshapeTarget components pick up the weights automatically.

Priority rule: Audio lip sync always takes priority. When the controller's AudioSource is actively playing, text-driven weights are not applied. This allows you to have both systems on the same character ... text lip sync for silent dialogue, audio lip sync when voice-over is available.

Standalone Usage
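A minimal sketch of driving CrystalTextLipSync from your own code, assuming only the PlayText(string) entry point documented above; the component and field wiring is ordinary Unity glue.

```csharp
using UnityEngine;

// Sketch: trigger text-driven lip sync from your own dialogue code.
// PlayText is the documented entry point; everything else is glue.
public class DialogueLineSpeaker : MonoBehaviour
{
    [SerializeField] private CrystalTextLipSync textLipSync;

    private void Awake()
    {
        // Fall back to the component on the same character if unassigned.
        if (textLipSync == null)
            textLipSync = GetComponent<CrystalTextLipSync>();
    }

    public void Speak(string line)
    {
        // Converts the string to a timed viseme sequence and plays it back.
        textLipSync.PlayText(line);
    }
}
```

Call Speak("Hello world") when your dialogue UI shows a line; if the controller's AudioSource starts playing voice-over, audio analysis takes priority automatically.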


Limitations & Gotchas

Audio must play on the controller's AudioSource

CrystalLipSync uses AudioSource.GetSpectrumData() to read FFT data. Unity only returns spectrum data from the exact AudioSource instance that is playing. If audio plays on a different source (e.g. GC2's pooled AudioManager sources, or a third-party dialogue system's own source), the lip sync analyzer will see silence.

Solution: Always play speech audio on the AudioSource referenced by the CrystalLipSyncController. Use the Play Lip Sync Speech GC2 instruction, or call source.Play() directly on the controller's source.
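For example, keeping a reference to the same AudioSource that is assigned to the controller (the wrapper class below is illustrative; only AudioSource.Play is Unity API):

```csharp
using UnityEngine;

// Play voice-over on the same AudioSource the controller analyzes.
// lipSyncSource must be the AudioSource assigned to CrystalLipSyncController.
public class SpeechPlayer : MonoBehaviour
{
    [SerializeField] private AudioSource lipSyncSource;

    public void PlayLine(AudioClip voiceOver)
    {
        lipSyncSource.clip = voiceOver;
        lipSyncSource.Play(); // lip sync animates automatically while playing
    }
}
```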

Spatial Blend affects audibility (not analysis)

The default setup uses spatialBlend = 1 (fully 3D). This means the audio volume depends on the distance between the AudioSource and the AudioListener. If the character is far from the camera, the speech may sound quiet. This does not affect lip sync analysis (FFT reads raw source data regardless of spatial blend), but it does affect what the player hears.

If you want speech to always be audible at full volume, set spatialBlend = 0 (2D) on the AudioSource.

Game Creator 2: "Player" target requires IsPlayer

When using GC2 instructions with the Player target property, the character must have IsPlayer = true enabled in the Character component. Without it, ShortcutPlayer.Instance returns null and the instruction silently does nothing.

Blendshape naming matters for auto-mapping

The auto-mapper relies on blendshape name patterns to detect visemes. If your model uses unconventional naming (e.g. shape_001, custom_23), auto-mapping will fail and you'll need to assign blendshapes manually in the inspector.

Common supported patterns: vrc.v_*, Fcl_MTH_*, mouth*, viseme_*, ARKit names (mouthOpen, mouthPucker, mouthSmile, etc.).

Combined blink blendshape takes priority

If a combined blink blendshape (e.g. Blink, Fcl_EYE_Close) is detected alongside separate left/right blendshapes, the combined one is used exclusively. Left and Right are only driven when the combined blendshape is unmapped. This prevents double-driving that would produce exaggerated blink weights.

One Controller per AudioSource

Each CrystalLipSyncController should reference a unique AudioSource. Sharing a single AudioSource between multiple controllers will work but is redundant ... they'll all produce identical results. Conversely, a single controller can drive multiple CrystalLipSyncBlendshapeTarget components on different meshes.

LateUpdate blendshape override

CrystalLipSyncBlendshapeTarget writes blendshape values in LateUpdate, which runs after the Animator. This means lip sync values override animation clip data for the same blendshapes. If you have animation clips that drive mouth blendshapes, the lip sync will take priority when the target component is enabled.

Text lip sync is approximate

The text-to-viseme engine maps individual letters and common digraphs to visemes. English spelling is notoriously irregular, so the mapping is approximate ... it produces convincing mouth movement for most text but is not a phonetic parser. Results may vary for non-English text or unusual spellings.

Text vs. Audio lip sync priority

When both CrystalTextLipSync and audio analysis are active on the same character, audio always wins. If the controller's AudioSource.isPlaying is true, text-driven weights are not written to the controller's VisemeWeights array. Text lip sync only runs when the AudioSource is idle.

Microphone lip sync takes over the AudioSource

While CrystalMicrophoneLipSync is capturing, it replaces the AudioSource's clip with a looping microphone buffer. Pre-recorded audio cannot play on the same source simultaneously. When capture stops, the previous AudioSource state (clip, volume, loop, mute) is restored automatically.

If you need both microphone and pre-recorded lip sync on the same character, stop the mic capture before playing speech audio, then restart it after.

Microphone requires user permission on some platforms

On iOS and Android, the OS may require explicit user permission to access the microphone. Unity's Microphone.Start() returns null if permission is denied, and the component logs a warning in this case. Microphone lip sync does not currently work on WebGL.

GC2 Dialogue text lip sync requires role assignments

The CrystalDialogueLipSync component resolves the speaking Actor to a scene GameObject via the Dialogue's Roles assignments. If an Actor is not assigned a target GameObject in the Dialogue, the component cannot find the speaker and will silently skip that line.


Troubleshooting

Common symptoms, likely causes, and fixes:

  • Mouth doesn't move ... Likely cause: audio is playing on a different AudioSource than the one assigned to the controller. Fix: assign the correct AudioSource to the controller, or use the Play Lip Sync Speech instruction.

  • Mouth moves but very subtly ... Likely cause: Sensitivity too low, or audio is very quiet. Fix: increase Sensitivity on the controller, or increase audio volume.

  • Mouth stays open ... Likely cause: Volume Threshold is too low, picking up ambient noise. Fix: increase Volume Threshold.

  • Auto-Map found 0 visemes ... Likely cause: blendshape names don't match any known pattern. Fix: map blendshapes manually in the target inspector.

  • GC2 instruction does nothing ... Likely cause: target is set to "Player" but IsPlayer isn't enabled on the Character. Fix: enable IsPlayer on the Character component.

  • Eye blink weights are doubled ... Likely cause: both a combined blink and L/R blendshapes are being driven. Fix: this should be handled automatically; if not, unmap the combined blendshape index (-1) and keep only L/R, or vice versa.

  • No audio heard, but mouth moves ... Likely cause: spatialBlend = 1 and the AudioListener is far from the character. Fix: set spatialBlend = 0 for 2D audio, or move the camera closer.

  • Text lip sync doesn't animate ... Likely cause: CrystalTextLipSync not found on the speaker, or PlayText wasn't called. Fix: ensure the component is on the character. For GC2, set the Actor's Gibberish to Crystal Text Lip Sync, or add CrystalDialogueLipSync to the scene, and check that the Actor has a role assignment in the Dialogue.

  • Text lip sync ignored during voice-over ... Likely cause: expected behavior; audio lip sync takes priority. Fix: if you want text lip sync during audio playback, disable "Skip When Audio Present" on CrystalDialogueLipSync.

  • Text lip sync ends before typewriter ... Likely cause: duration normalization is disabled, or an older version without it. Fix: ensure matchTypewriterDuration is true (default) in CrystalTextToViseme.Generate(), and update to the latest version.

  • Microphone lip sync: mouth doesn't move ... Likely cause: microphone not recording, or wrong device selected. Fix: check that Microphone.devices has entries, try leaving Microphone Device empty for the default mic, and ensure CrystalMicrophoneLipSync.IsCapturing is true.

  • Microphone lip sync: hearing own voice ... Likely cause: Mute Playback is disabled. Fix: enable Mute Playback on the CrystalMicrophoneLipSync component; the FFT analysis works on muted sources.

  • Microphone lip sync: choppy animation ... Likely cause: buffer too short, causing frequent resyncs. Fix: increase Buffer Length to 2...3 seconds.

