This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The WanAnimateToVideo node generates video content by combining multiple conditioning inputs, including pose references, facial expressions, and background elements. It processes several video inputs to create coherent animated sequences while maintaining temporal consistency across frames. The node handles latent space operations and can extend existing videos by continuing motion patterns.
Inputs
| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| `positive` | CONDITIONING | Yes | - | Positive conditioning for guiding the generation towards desired content |
| `negative` | CONDITIONING | Yes | - | Negative conditioning for steering the generation away from unwanted content |
| `vae` | VAE | Yes | - | VAE model used for encoding and decoding image data |
| `width` | INT | Yes | 16 to MAX_RESOLUTION | Output video width in pixels (default: 832, step: 16) |
| `height` | INT | Yes | 16 to MAX_RESOLUTION | Output video height in pixels (default: 480, step: 16) |
| `length` | INT | Yes | 1 to MAX_RESOLUTION | Number of frames to generate (default: 77, step: 4) |
| `batch_size` | INT | Yes | 1 to 4096 | Number of videos to generate simultaneously (default: 1) |
| `clip_vision_output` | CLIP_VISION_OUTPUT | No | - | Optional CLIP vision model output for additional conditioning |
| `reference_image` | IMAGE | No | - | Reference image used as the starting point for generation |
| `face_video` | IMAGE | No | - | Video input providing facial expression guidance |
| `pose_video` | IMAGE | No | - | Video input providing pose and motion guidance |
| `continue_motion_max_frames` | INT | Yes | 1 to MAX_RESOLUTION | Maximum number of frames to continue from previous motion (default: 5, step: 4) |
| `background_video` | IMAGE | No | - | Background video to composite with generated content |
| `character_mask` | MASK | No | - | Mask defining character regions for selective processing |
| `continue_motion` | IMAGE | No | - | Previous motion sequence to continue from for temporal consistency |
| `video_frame_offset` | INT | Yes | 0 to MAX_RESOLUTION | Number of frames to skip in all input videos. Used for generating longer videos in chunks. To extend a video, connect this to the `video_frame_offset` output of the previous node (default: 0, step: 1) |
- When `pose_video` is provided, the output length will be adjusted to match the pose video duration if the `trim_to_pose_video` logic is active (currently set to `False` in the source code)
- `face_video` is automatically resized to 512x512 resolution and normalized to a range of -1.0 to 1.0 when processed
- `continue_motion` frames are limited by the `continue_motion_max_frames` parameter; only the last `continue_motion_max_frames` frames from the input are used
- Input videos (`face_video`, `pose_video`, `background_video`, `character_mask`) are offset by `video_frame_offset` before processing; if the offset exceeds the video length, the input is ignored
- If `character_mask` contains only one frame, it will be repeated across all frames
- When `clip_vision_output` is provided, it is applied to both positive and negative conditioning
- If `reference_image` is not provided, a black image (all zeros) is used as the default reference
- If `continue_motion` is not provided, the initial frames are filled with gray (0.5 intensity) noise
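
Several of the rules above (frame offsetting, keeping only the last motion frames, repeating a single-frame mask, and rescaling pixel values) can be illustrated in plain Python. This is a minimal sketch, not the node's implementation: the helper names are hypothetical, frames are modeled as list items rather than tensors, pixel values are assumed to arrive in the 0.0 to 1.0 range, and an offset equal to the video length is assumed to count as "ignored".

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def apply_frame_offset(frames: Optional[Sequence[T]], offset: int) -> Optional[List[T]]:
    """Skip `offset` frames; return None (input ignored) when nothing is left.

    Assumption: an offset equal to the clip length also leaves no frames,
    so it is treated the same as an offset that exceeds the length.
    """
    if frames is None:
        return None
    remaining = list(frames[offset:])
    return remaining if remaining else None


def last_motion_frames(frames: Sequence[T], max_frames: int) -> List[T]:
    """Keep only the last `max_frames` frames of a continue_motion clip.

    `max_frames` is expected to be >= 1, matching the documented range.
    """
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    return list(frames[-max_frames:])


def expand_single_frame_mask(mask: Sequence[T], length: int) -> List[T]:
    """Repeat a one-frame mask across `length` frames; leave longer masks as-is."""
    if len(mask) == 1:
        return [mask[0]] * length
    return list(mask)


def normalize_pixels(values: Sequence[float]) -> List[float]:
    """Map pixel values from [0.0, 1.0] to [-1.0, 1.0] (assumed input range)."""
    return [v * 2.0 - 1.0 for v in values]
```

For example, `last_motion_frames(list(range(7)), 5)` keeps frames 2 through 6, and `apply_frame_offset([0, 1], 5)` returns `None` because the offset is past the end of the clip.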
Outputs
| Output Name | Data Type | Description |
|---|---|---|
| `positive` | CONDITIONING | Modified positive conditioning with additional video context including CLIP vision output, pose video latent, face video pixels, concatenated latent image, and concatenated mask |
| `negative` | CONDITIONING | Modified negative conditioning with additional video context including CLIP vision output, pose video latent, face video pixels (inverted), concatenated latent image, and concatenated mask |
| `latent` | LATENT | Generated video content in latent space format with shape `[batch_size, 16, latent_length + trim_latent, latent_height, latent_width]` |
| `trim_latent` | INT | Latent space trimming information indicating the number of latent frames to trim from the beginning (corresponds to reference image latent frames) |
| `trim_image` | INT | Image space trimming information for reference motion frames, indicating the number of image frames to trim from the beginning |
| `video_frame_offset` | INT | Updated frame offset for continuing video generation in chunks, calculated as the previous offset plus the generated length |
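
The chunked workflow implied by the `video_frame_offset` output can be sketched as a simple offset plan. This is an illustrative example only: `plan_chunk_offsets` is a hypothetical helper, not part of ComfyUI, and it assumes each pass advances the offset by exactly `length` frames, as the table describes.

```python
from typing import List


def plan_chunk_offsets(total_frames: int, length: int, start_offset: int = 0) -> List[int]:
    """Return the video_frame_offset value to feed into each successive pass.

    Assumption: every pass generates `length` frames, so the next offset is
    the previous offset plus `length`.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    offsets = []
    offset = start_offset
    while offset < total_frames:
        offsets.append(offset)
        offset += length  # previous offset plus the generated length
    return offsets


# A 200-frame driving video processed with the default length of 77
# would need three passes under this assumption.
print(plan_chunk_offsets(200, 77))
```

In a node graph, the same chaining is done by wiring the `video_frame_offset` output of one WanAnimateToVideo node into the `video_frame_offset` input of the next.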