CrystalLipSync
Real-time, audio-driven lip sync and eye blink for Unity.
CrystalLipSync analyzes an AudioSource every frame using FFT spectral analysis and maps the result to 15 viseme blendshapes on any SkinnedMeshRenderer. It works standalone or as a fully integrated Game Creator 2 solution.
Quick Start
Open Tools > CrystalLipSync > Setup Wizard (recommended), or add components manually:
1. Add an AudioSource to your character root.
2. Add a CrystalLipSyncController to the same GameObject and assign the AudioSource.
3. Add a CrystalLipSyncBlendshapeTarget to the SkinnedMeshRenderer that has your viseme blendshapes.
4. Click Auto-Map in the BlendshapeTarget inspector to auto-assign blendshapes.
5. Play an AudioClip on the assigned AudioSource. The mouth will animate automatically.
No voice-over audio? Enable the Add Text Lip Sync option in the Setup Wizard, or add the CrystalTextLipSync component manually. Call PlayText("Hello world") from script or use the GC2 Dialogue integration for automatic text lip sync.
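The same flow can be driven from script. A minimal sketch, assuming the components above are already set up; the SpeakLine method and serialized fields are illustrative names, not part of the package API (only PlayText is documented):

```csharp
using UnityEngine;

// Minimal usage sketch: plays a voice-over clip on the controller's
// AudioSource, or falls back to text-driven lip sync when no clip exists.
// SpeakLine and the serialized fields are illustrative, not package API.
public class SpeakExample : MonoBehaviour
{
    [SerializeField] private AudioSource voiceSource;        // the source the controller analyzes
    [SerializeField] private CrystalTextLipSync textLipSync; // optional, for silent dialogue

    public void SpeakLine(AudioClip clip, string line)
    {
        if (clip != null)
        {
            voiceSource.clip = clip;
            voiceSource.Play();          // audio lip sync animates automatically
        }
        else if (textLipSync != null)
        {
            textLipSync.PlayText(line);  // text-driven fallback (documented above)
        }
    }
}
```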
Setup Wizard

The Setup Wizard provides one-click character provisioning.
Open it via: Tools > CrystalLipSync > Setup Wizard
What it does
When you drag a character GameObject into the wizard and press Setup Character, it:
- AudioSource: Adds one to the character root if none exists (3D spatial, playOnAwake = false).
- CrystalLipSyncController: Adds one and wires it to the AudioSource.
- CrystalLipSyncBlendshapeTarget: Scans every child SkinnedMeshRenderer, scores each for viseme blendshapes, picks the best one, adds the target component, and auto-maps all detected visemes.
- CrystalEyeBlink (optional, enabled by default): Adds the eye blink component and auto-detects blink blendshapes.
- CrystalTextLipSync (optional): Adds the text-driven lip sync component and wires it to the controller. Enable this when your dialogue has no voice-over audio.
- CrystalMicrophoneLipSync (optional): Adds the microphone lip sync component and wires it to the controller. Enable this for VR, voice chat, or live microphone scenarios.
The entire operation is a single Undo action; press Ctrl+Z to revert everything at once.
Result summary
After setup, the wizard displays:
How many components were added.
How many viseme blendshapes were auto-mapped.
Per-component status (added vs. already exists).
Tip: The wizard skips components that already exist, so it's safe to run again if you add a mesh later.
Core Components
CrystalLipSyncController

Add Component Menu: CrystalLipSync / Lip Sync Controller
The central analysis engine. It reads spectrum data from an AudioSource every frame and produces an array of 15 smoothed viseme weights.
| Property | Description | Default |
| --- | --- | --- |
| Audio Source | The AudioSource to analyze. Can live on any GameObject. | ... |
| FFT Size | FFT window size (256–4096). Higher = better frequency resolution, slower response. | 1024 |
| Volume Threshold | RMS below this value → mouth stays closed. | 0.005 |
| Sensitivity | Volume-to-viseme gain. Increase for quiet audio. | 5 |
| Smoothing Attack | How fast visemes open (higher = snappier). | 30 |
| Smoothing Release | How fast visemes close (higher = quicker release). | 15 |
| Profile | Optional CrystalLipSyncProfile ScriptableObject to override settings. | ... |
| Mood | Current emotional mood (Neutral, Happy, Angry, Sad). | Neutral |
| Show Debug Logs | Log dominant viseme, volume, and centroid every frame. | false |
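The controller's output can also be consumed from script. A hedged sketch that reads the documented VisemeWeights array to find the loudest viseme each frame (the array is described above as 15 smoothed weights; the logging consumer is illustrative):

```csharp
using UnityEngine;

// Reads the controller's 15-entry VisemeWeights array each frame and logs
// the dominant viseme index. Assumes VisemeWeights exposes the float
// weights described above; this consumer script is purely illustrative.
public class VisemeWeightReader : MonoBehaviour
{
    [SerializeField] private CrystalLipSyncController controller;

    void LateUpdate()
    {
        float[] weights = controller.VisemeWeights;
        int dominant = 0;
        for (int i = 1; i < weights.Length; i++)
        {
            if (weights[i] > weights[dominant])
                dominant = i;
        }
        // Index 0 is SIL (silence); indices 10-14 are the vowels AA, E, I, O, U.
        Debug.Log($"Dominant viseme index: {dominant} (weight {weights[dominant]:F2})");
    }
}
```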
CrystalLipSyncBlendshapeTarget

Add Component Menu: CrystalLipSync / Blendshape Target
Requires: SkinnedMeshRenderer on the same GameObject.
Reads VisemeWeights from a Controller and drives blendshapes in LateUpdate (runs after the Animator, so it wins over animation clips that write to the same blendshapes).
Multiple targets can reference the same controller ... useful for characters with separate face, beard, or tongue meshes.
| Property | Description | Default |
| --- | --- | --- |
| Controller | The CrystalLipSyncController providing weights. | ... |
| Use Mood Mappings | When enabled, exposes per-mood mapping sets. | false |
| Neutral / Happy / Angry / Sad Mappings | Per-viseme blendshape index + weight multiplier. | All -1 (unmapped) |
| Global Weight | Master multiplier for all blendshapes (0–200%). | 100 |
| Max Blendshape Value | Clamped ceiling per blendshape. | 100 |
When the component is disabled, all mapped blendshapes are reset to zero so the character doesn't freeze in a mouth pose.
CrystalLipSyncProfile

Create via: Assets > Create > CrystalLipSync > Lip Sync Profile
A ScriptableObject that stores analysis settings and per-viseme multipliers. Assign it to a Controller's Profile field to share settings between characters of similar voice type.
| Property | Description |
| --- | --- |
| Volume Threshold | Overrides the controller's threshold. |
| Sensitivity | Overrides the controller's sensitivity. |
| Smoothing Attack / Release | Overrides the controller's smoothing. |
| Per-Viseme Multipliers | Array of 15 floats. Values > 1 amplify, < 1 suppress individual visemes. |
When a profile is assigned, the controller uses only the profile's settings (ignoring its own local values).
CrystalEyeBlink

Add Component Menu: CrystalLipSync / Eye Blink
A standalone MonoBehaviour for natural, randomized eye blinking. Works independently of Game Creator 2.
Auto-detects eyeBlink_L, Fcl_EYE_Close, Blink, and similar blendshape naming patterns across VRM, ARKit, and custom rigs.
| Property | Description | Default |
| --- | --- | --- |
| Blink Interval | Average seconds between blinks. | 4 |
| Interval Randomness | ± seconds of variation. | 1.5 |
| Close Speed | Seconds to fully close the eyelid. | 0.08 |
| Open Speed | Seconds to fully open. | 0.12 |
| Closed Hold Time | Seconds the eye stays fully shut. | 0.05 |
| Double Blink | Chance (0–1) of a second rapid blink. | 0.15 |
| Half Blink | Chance (0–1) of a partial blink (50–80%). | 0.1 |
| Max Weight | Maximum blendshape value when closed. | 100 |
| Target Mesh | Leave empty to auto-detect, or assign manually. | Auto |
| Blink Left / Right / Both | Blendshape indices. The inspector shows dropdown selectors when a mesh is assigned. | Auto-detect |
Both vs. Left/Right priority: If a combined "both" blendshape is mapped, CrystalEyeBlink drives only that blendshape. Left and Right are used as a fallback only when "both" is unmapped (-1). This prevents double-driving when all three are detected.
CrystalTextLipSync

Add Component Menu: CrystalLipSync / Text Lip Sync
Drives lip sync blendshapes from text instead of audio. Converts a string into a timed viseme sequence and plays it back with smooth blending between mouth shapes.
Designed for dialogue systems where no voice-over audio is available.
| Property | Description | Default |
| --- | --- | --- |
| Blendshape Target | The CrystalLipSyncBlendshapeTarget to drive. Auto-detected if empty. | Auto |
| Controller | Optional. If assigned, writes to the controller's VisemeWeights array so all connected targets pick them up. | Auto |
| Smoothing Attack | How quickly visemes blend in (higher = snappier). | 25 |
| Smoothing Release | How quickly visemes blend out (higher = faster release). | 18 |
| Intensity | Global weight multiplier for text-driven visemes (0–1). | 0.85 |
Audio takes priority: When the controller's AudioSource is actively playing, text-driven weights are ignored. This means you can have both audio and text lip sync on the same character; audio lip sync automatically wins when voice-over is present.
CrystalMicrophoneLipSync
Add Component Menu: CrystalLipSync / Microphone Lip Sync
Captures real-time microphone audio and feeds it into a CrystalLipSyncController for live lip sync. The component opens the microphone, creates a looping AudioClip, assigns it to the controller's AudioSource, and keeps the playback position synced with the microphone write head.
Designed for VR avatars, social VR, voice chat, live presentations, and any scenario where the player speaks into a microphone.
| Property | Description | Default |
| --- | --- | --- |
| Controller | The CrystalLipSyncController to feed mic audio into. Auto-detected if empty. | Auto |
| Microphone Device | Name of the microphone to use. Leave empty for the system default. | (default) |
| Sample Rate | Recording sample rate in Hz. 44100 is standard; 22050 saves memory. | 44100 |
| Buffer Length (sec) | Duration of the internal looping audio buffer. 1–2 seconds is sufficient. | 1 |
| Mute Playback | Mutes the AudioSource output so the user doesn't hear their own voice through speakers. The FFT analysis still works because GetSpectrumData reads raw source data. | true |
| Auto Start | Start capturing automatically when the component is enabled. | true |
How it works: The microphone audio is piped into the same AudioSource that the CrystalLipSyncController analyzes. The existing FFT spectral analysis handles everything; no additional analysis engine or ML model is needed. The same 6-band frequency analysis, spectral centroid, and high-frequency ratio computation that drives audio lip sync also drives microphone lip sync.
AudioSource state preservation: When capture starts, the component saves the AudioSource's current clip, volume, loop, and mute settings. When capture stops, these are restored so pre-recorded audio playback continues to work normally.
Latency management: The component monitors the drift between the microphone write position and AudioSource playback position each frame. If drift exceeds half the buffer, it resyncs to 50ms behind the mic head to prevent audible pops while keeping latency minimal.
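The drift check described above can be sketched with Unity's standard Microphone and AudioSource APIs. This is a simplified illustration of the idea, not the shipped implementation; the exact thresholds in the component may differ:

```csharp
using UnityEngine;

// Sketch of the latency-management idea described above: measure drift
// between the microphone write head and the AudioSource playback head on a
// looping clip, and resync to ~50 ms behind the mic when drift exceeds
// half the buffer. Thresholds mirror the prose; the real component may differ.
public class MicDriftSketch : MonoBehaviour
{
    [SerializeField] private AudioSource source;   // plays the looping mic clip
    [SerializeField] private string device = null; // null = system default mic
    [SerializeField] private int sampleRate = 44100;

    void Update()
    {
        if (source.clip == null || !Microphone.IsRecording(device))
            return;

        int clipSamples = source.clip.samples;
        int micPos = Microphone.GetPosition(device); // write head
        int playPos = source.timeSamples;            // read head
        int drift = (micPos - playPos + clipSamples) % clipSamples;

        if (drift > clipSamples / 2)
        {
            // Jump to ~50 ms behind the mic head to avoid audible pops.
            int target = (micPos - sampleRate / 20 + clipSamples) % clipSamples;
            source.timeSamples = target;
        }
    }
}
```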
The 15-Viseme System
CrystalLipSync uses the standard 15-viseme set common in speech animation:
| # | Viseme | Sounds | Description |
| --- | --- | --- | --- |
| 0 | SIL | ... | Silence / rest position |
| 1 | PP | p, b, m | Bilabial plosive |
| 2 | FF | f, v | Labiodental fricative |
| 3 | TH | th | Dental fricative |
| 4 | DD | d, t, n | Alveolar |
| 5 | KK | k, g | Velar |
| 6 | CH | ch, j, sh | Postalveolar |
| 7 | SS | s, z | Alveolar fricative |
| 8 | NN | n, ng | Nasal |
| 9 | RR | r | Alveolar approximant |
| 10 | AA | a, ah | Open vowel |
| 11 | E | e, eh | Mid front vowel |
| 12 | I | i, ee | Close front vowel |
| 13 | O | o, oh | Mid back vowel |
| 14 | U | u, oo | Close back vowel |
The analyzer computes 6 frequency band energies (80 Hz ... 12 kHz), spectral centroid, and an RMS volume level each frame, then uses these features to estimate the weight for every viseme simultaneously.
Auto-Mapping
The auto-mapper (CrystalLipSyncAutoMapper) uses a multi-tier scoring system to match blendshape names to visemes:
1. Tokenization: Splits names by `_`, `.`, `-`, spaces, and camelCase boundaries. Expands abbreviations (mth → mouth, fcl → facial, etc.).
2. Scoring: Each viseme has a set of match rules with canonical codes, aliases, and keywords. Tokens are scored against these rules.
3. Negative patterns: Tokens like "smile" for SIL, or "eye" for viseme E, are penalized to avoid false matches.
4. Greedy bipartite assignment: Candidates are sorted by score (highest first). Each blendshape and viseme is assigned at most once, preventing conflicts.
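The greedy assignment step can be illustrated in isolation. This is a simplified sketch of the sort-and-assign-once logic only; the scoring that produces the candidates, and the names used here, are stand-ins rather than the package's internals:

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified illustration of greedy bipartite assignment: candidate
// (blendshape, viseme, score) triples are sorted by score descending, then
// each blendshape index and each viseme is assigned at most once. Scores
// would come from the tokenization/scoring steps; they are inputs here.
public static class GreedyAssignSketch
{
    public readonly struct Candidate
    {
        public readonly int BlendshapeIndex;
        public readonly int Viseme;
        public readonly float Score;
        public Candidate(int b, int v, float s) { BlendshapeIndex = b; Viseme = v; Score = s; }
    }

    // Returns a viseme -> blendshape index mapping.
    public static Dictionary<int, int> Assign(IEnumerable<Candidate> candidates)
    {
        var visemeToBlendshape = new Dictionary<int, int>();
        var usedBlendshapes = new HashSet<int>();

        foreach (var c in candidates.OrderByDescending(c => c.Score))
        {
            if (visemeToBlendshape.ContainsKey(c.Viseme)) continue;    // viseme already assigned
            if (usedBlendshapes.Contains(c.BlendshapeIndex)) continue; // blendshape already used
            visemeToBlendshape[c.Viseme] = c.BlendshapeIndex;
            usedBlendshapes.Add(c.BlendshapeIndex);
        }
        return visemeToBlendshape;
    }
}
```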
Supported naming conventions include:
- VRChat: vrc.v_sil, vrc.v_aa, vrc.v_oh, etc.
- ARKit: mouthOpen, mouthPucker, mouthSmile, etc.
- VRM / UniVRM: Fcl_MTH_A, Fcl_MTH_O, Fcl_MTH_Close, etc.
- Custom: mouth_AA, viseme_PP, lipsync_e, etc.
The auto-mapper also provides FindBestVisemeMesh(), which scores every SkinnedMeshRenderer in a hierarchy and returns the one most likely to contain viseme blendshapes.
Mood System
CrystalLipSync supports 4 emotional moods:
| Mood | Description |
| --- | --- |
| Neutral | Default mapping. Always used when mood mappings are disabled. |
| Happy | Wider mouth shapes, smile-inflected visemes. |
| Angry | Tighter, more compressed mouth shapes. |
| Sad | Droopier, more subtle mouth movements. |
How it works
1. Enable Use Mood Mappings on the CrystalLipSyncBlendshapeTarget.
2. Configure separate blendshape mappings for each mood tab.
3. Change the mood on the Controller via controller.SetMood(LipSyncMood.Happy) or through the GC2 Set Lip Sync Mood instruction.
The target automatically reads the controller's current mood each frame and uses the corresponding mapping set.
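From script, switching moods is a single call on the controller. SetMood and the LipSyncMood enum are documented above; the keyboard trigger in this sketch is purely illustrative:

```csharp
using UnityEngine;

// Illustrative mood switch. SetMood(LipSyncMood.Happy) is the documented
// call; the key bindings here are just for demonstration.
public class MoodSwitcher : MonoBehaviour
{
    [SerializeField] private CrystalLipSyncController controller;

    void Update()
    {
        if (Input.GetKeyDown(KeyCode.H)) controller.SetMood(LipSyncMood.Happy);
        if (Input.GetKeyDown(KeyCode.N)) controller.SetMood(LipSyncMood.Neutral);
    }
}
```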
Text-Driven Lip Sync
For characters without voice-over audio, CrystalLipSync can animate the mouth from dialogue text alone. The system converts each character and common digraphs (th, sh, ch, ee, oo, etc.) into the corresponding viseme and plays the sequence in sync with your typewriter speed.
How It Works
1. CrystalTextToViseme converts input text into a list of VisemeEntry structs, each containing a viseme type and a duration.
2. CrystalTextLipSync plays the sequence over time, smoothly blending between mouth shapes using configurable attack/release smoothing.
3. The text lip sync writes into the CrystalLipSyncController.VisemeWeights array (the same array used by audio analysis), so all existing CrystalLipSyncBlendshapeTarget components pick up the weights automatically.
Priority rule: Audio lip sync always takes priority. When the controller's AudioSource is actively playing, text-driven weights are not applied. This allows you to have both systems on the same character: text lip sync for silent dialogue, audio lip sync when voice-over is available.
Standalone Usage
Limitations & Gotchas
Audio must play on the controller's AudioSource
CrystalLipSync uses AudioSource.GetSpectrumData() to read FFT data. Unity only returns spectrum data from the exact AudioSource instance that is playing. If audio plays on a different source (e.g. GC2's pooled AudioManager sources, or a third-party dialogue system's own source), the lip sync analyzer will see silence.
Solution: Always play speech audio on the AudioSource referenced by the CrystalLipSyncController. Use the Play Lip Sync Speech GC2 instruction, or call source.Play() directly on the controller's source.
Spatial Blend affects audibility (not analysis)
The default setup uses spatialBlend = 1 (fully 3D). This means the audio volume depends on the distance between the AudioSource and the AudioListener. If the character is far from the camera, the speech may sound quiet. This does not affect lip sync analysis (FFT reads raw source data regardless of spatial blend), but it does affect what the player hears.
If you want speech to always be audible at full volume, set spatialBlend = 0 (2D) on the AudioSource.
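If you prefer to set this from script rather than the inspector, a one-line sketch:

```csharp
using UnityEngine;

// Makes the character's speech non-spatial so it is always heard at full
// volume, regardless of distance to the AudioListener. Lip sync analysis
// is unaffected either way, as noted above.
public class MakeSpeech2D : MonoBehaviour
{
    void Awake()
    {
        GetComponent<AudioSource>().spatialBlend = 0f; // 0 = 2D, 1 = fully 3D
    }
}
```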
Game Creator 2: "Player" target requires IsPlayer
When using GC2 instructions with the Player target property, the character must have IsPlayer = true enabled in the Character component. Without it, ShortcutPlayer.Instance returns null and the instruction silently does nothing.
Blendshape naming matters for auto-mapping
The auto-mapper relies on blendshape name patterns to detect visemes. If your model uses unconventional naming (e.g. shape_001, custom_23), auto-mapping will fail and you'll need to assign blendshapes manually in the inspector.
Common supported patterns: vrc.v_*, Fcl_MTH_*, mouth*, viseme_*, ARKit names (mouthOpen, mouthPucker, mouthSmile, etc.).
Eye blink: "Both" wins over Left + Right
If a combined blink blendshape (e.g. Blink, Fcl_EYE_Close) is detected alongside separate left/right blendshapes, the combined one is used exclusively. Left and Right are only driven when the combined blendshape is unmapped. This prevents double-driving that would produce exaggerated blink weights.
One Controller per AudioSource
Each CrystalLipSyncController should reference a unique AudioSource. Sharing a single AudioSource between multiple controllers will work but is redundant; they'll all produce identical results. Conversely, a single controller can drive multiple CrystalLipSyncBlendshapeTarget components on different meshes.
LateUpdate blendshape override
CrystalLipSyncBlendshapeTarget writes blendshape values in LateUpdate, which runs after the Animator. This means lip sync values override animation clip data for the same blendshapes. If you have animation clips that drive mouth blendshapes, the lip sync will take priority when the target component is enabled.
Text lip sync is approximate
The text-to-viseme engine maps individual letters and common digraphs to visemes. English spelling is notoriously irregular, so the mapping is approximate: it produces convincing mouth movement for most text but is not a phonetic parser. Results may vary for non-English text or unusual spellings.
Text vs. Audio lip sync priority
When both CrystalTextLipSync and audio analysis are active on the same character, audio always wins. If the controller's AudioSource.isPlaying is true, text-driven weights are not written to the controller's VisemeWeights array. Text lip sync only runs when the AudioSource is idle.
Microphone lip sync takes over the AudioSource
While CrystalMicrophoneLipSync is capturing, it replaces the AudioSource's clip with a looping microphone buffer. Pre-recorded audio cannot play on the same source simultaneously. When capture stops, the previous AudioSource state (clip, volume, loop, mute) is restored automatically.
If you need both microphone and pre-recorded lip sync on the same character, stop the mic capture before playing speech audio, then restart it after.
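One way to coordinate the two is to toggle the component, under the assumption that disabling CrystalMicrophoneLipSync stops capture and restores the AudioSource state (as described above). Verify this against the component's actual API in your version:

```csharp
using System.Collections;
using UnityEngine;

// Sketch of alternating mic capture and pre-recorded speech on one
// character. Assumes disabling CrystalMicrophoneLipSync stops capture and
// that Auto Start resumes it on re-enable; check your version's API.
public class MicSpeechCoordinator : MonoBehaviour
{
    [SerializeField] private CrystalMicrophoneLipSync micLipSync;
    [SerializeField] private AudioSource voiceSource; // the controller's source

    public IEnumerator PlayVoiceOver(AudioClip clip)
    {
        micLipSync.enabled = false;  // stop capture; source state is restored
        voiceSource.clip = clip;
        voiceSource.Play();
        yield return new WaitWhile(() => voiceSource.isPlaying);
        micLipSync.enabled = true;   // resume capture (Auto Start)
    }
}
```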
Microphone requires user permission on some platforms
On iOS and Android, the OS may require explicit user permission to access the microphone. Unity's Microphone.Start() will return null if permission is denied; the component logs a warning in this case. On WebGL, microphone lip sync is currently not supported.
GC2 Dialogue text lip sync requires role assignments
The CrystalDialogueLipSync component resolves the speaking Actor to a scene GameObject via the Dialogue's Roles assignments. If an Actor is not assigned a target GameObject in the Dialogue, the component cannot find the speaker and will silently skip that line.
Troubleshooting
| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Mouth doesn't move | Audio is playing on a different AudioSource than the one assigned to the controller. | Assign the correct AudioSource to the controller, or use the Play Lip Sync Speech instruction. |
| Mouth moves but very subtly | Sensitivity too low, or audio is very quiet. | Increase Sensitivity on the controller, or increase audio volume. |
| Mouth stays open | Volume Threshold is too low, picking up ambient noise. | Increase Volume Threshold. |
| Auto-Map found 0 visemes | Blendshape names don't match any known pattern. | Map blendshapes manually in the target inspector. |
| GC2 instruction does nothing | Target is set to "Player" but IsPlayer isn't enabled on the Character. | Enable IsPlayer on the Character component. |
| Eye blink weights are doubled | Both a combined blink and L/R blendshapes are being driven. | This should be handled automatically. If not, unmap the combined blendshape index (-1) and keep only L/R, or vice versa. |
| No audio heard, but mouth moves | spatialBlend = 1 and the AudioListener is far from the character. | Set spatialBlend = 0 for 2D audio, or move the camera closer. |
| Text lip sync doesn't animate | CrystalTextLipSync not found on the speaker, or PlayText wasn't called. | Ensure the component is on the character. For GC2, set the Actor's Gibberish to Crystal Text Lip Sync, or add CrystalDialogueLipSync to the scene. Check that the Actor has a role assignment in the Dialogue. |
| Text lip sync ignored during voice-over | Expected behavior: audio lip sync takes priority. | To run text lip sync during audio playback, disable "Skip When Audio Present" on CrystalDialogueLipSync. |
| Text lip sync ends before typewriter | Duration normalization is disabled, or an older version without it. | Ensure matchTypewriterDuration is true (default) in CrystalTextToViseme.Generate(). Update to the latest version. |
| Microphone lip sync: mouth doesn't move | Microphone not recording, or wrong device selected. | Check that Microphone.devices has entries. Try leaving Device Name empty for the default mic. Ensure CrystalMicrophoneLipSync.IsCapturing is true. |
| Microphone lip sync: hearing own voice | Mute Playback is disabled. | Enable Mute Playback on the CrystalMicrophoneLipSync component. The FFT analysis works on muted sources. |
| Microphone lip sync: choppy animation | Buffer too short, causing frequent resyncs. | Increase Buffer Length to 2–3 seconds. |