WanAnimateToVideo - ComfyUI Built-in Node Documentation

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The WanAnimateToVideo node generates video content by combining multiple conditioning inputs including pose references, facial expressions, and background elements. It processes various video inputs to create coherent animated sequences while maintaining temporal consistency across frames. The node handles latent space operations and can extend existing videos by continuing motion patterns.

Inputs

Parameter	Data Type	Required	Range	Description
`positive`	CONDITIONING	Yes	-	Positive conditioning for guiding the generation towards desired content
`negative`	CONDITIONING	Yes	-	Negative conditioning for steering the generation away from unwanted content
`vae`	VAE	Yes	-	VAE model used for encoding and decoding image data
`width`	INT	Yes	16 to MAX_RESOLUTION	Output video width in pixels (default: 832, step: 16)
`height`	INT	Yes	16 to MAX_RESOLUTION	Output video height in pixels (default: 480, step: 16)
`length`	INT	Yes	1 to MAX_RESOLUTION	Number of frames to generate (default: 77, step: 4)
`batch_size`	INT	Yes	1 to 4096	Number of videos to generate simultaneously (default: 1)
`clip_vision_output`	CLIP_VISION_OUTPUT	No	-	Optional CLIP vision model output for additional conditioning
`reference_image`	IMAGE	No	-	Reference image used as starting point for generation
`face_video`	IMAGE	No	-	Video input providing facial expression guidance
`pose_video`	IMAGE	No	-	Video input providing pose and motion guidance
`continue_motion_max_frames`	INT	Yes	1 to MAX_RESOLUTION	Maximum number of frames to continue from previous motion (default: 5, step: 4)
`background_video`	IMAGE	No	-	Background video to composite with generated content
`character_mask`	MASK	No	-	Mask defining character regions for selective processing
`continue_motion`	IMAGE	No	-	Previous motion sequence to continue from for temporal consistency
`video_frame_offset`	INT	Yes	0 to MAX_RESOLUTION	Number of frames to skip in all input videos. Used to generate longer videos in chunks: to extend a video, connect this input to the video_frame_offset output of the previous node. (default: 0, step: 1)
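
The range and step columns above can be checked before values are wired into the node. The sketch below is only an illustration, not ComfyUI's own validation: `fits_range_and_step` is a hypothetical helper, it assumes steps are counted from the range minimum, and the value used for `MAX_RESOLUTION` is an assumed placeholder.

```python
MAX_RESOLUTION = 16384  # assumed placeholder; the real constant is defined by ComfyUI


def fits_range_and_step(value, minimum, maximum, step):
    """Return True if value lies in [minimum, maximum] on a step grid anchored at minimum.

    Hypothetical helper: anchoring the grid at the minimum is an assumption.
    """
    if not minimum <= value <= maximum:
        return False
    return (value - minimum) % step == 0


# Defaults taken from the table above: name -> (default, minimum, step)
defaults = {
    "width": (832, 16, 16),
    "height": (480, 16, 16),
    "length": (77, 1, 4),
    "continue_motion_max_frames": (5, 1, 4),
}

for name, (value, minimum, step) in defaults.items():
    print(name, fits_range_and_step(value, minimum, MAX_RESOLUTION, step))
```

Under that assumption, all four documented defaults sit on their step grid, while a `length` of 78 would not.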

Parameter Constraints:

When pose_video is provided, the output length is adjusted to match the pose video duration only if the trim_to_pose_video logic is enabled; this flag is currently hard-coded to False in the source code, so no adjustment takes place
face_video is automatically resized to 512x512 resolution and normalized to a range of -1.0 to 1.0 when processed
continue_motion frames are limited by the continue_motion_max_frames parameter; only the last continue_motion_max_frames frames from the input are used
Input videos (face_video, pose_video, background_video, character_mask) are offset by video_frame_offset before processing; if the offset exceeds the video length, the input is ignored
If character_mask contains only one frame, it will be repeated across all frames
When clip_vision_output is provided, it’s applied to both positive and negative conditioning
If reference_image is not provided, a black image (all zeros) is used as the default reference
If continue_motion is not provided, the initial frames are filled with uniform gray (0.5 intensity)
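
Several of the constraints above come down to simple list operations on frames. The following is a minimal sketch in plain Python, not the node's implementation: every function name is hypothetical, frames are modeled as list items rather than tensors, an offset equal to the video length is treated as leaving nothing to use, and face pixel values are assumed to start in the 0.0 to 1.0 range.

```python
def apply_frame_offset(frames, offset):
    """Skip the first `offset` frames; return None when nothing is left (input ignored)."""
    if frames is None or offset >= len(frames):
        return None
    return frames[offset:]


def tail_for_continue_motion(frames, max_frames):
    """Keep only the last `max_frames` frames of the previous motion sequence."""
    return frames[-max_frames:]


def expand_single_frame_mask(mask_frames, length):
    """Repeat a one-frame mask across `length` frames; leave longer masks untouched."""
    if len(mask_frames) == 1:
        return mask_frames * length
    return mask_frames


def normalize_face_value(value):
    """Map a pixel value from the assumed [0.0, 1.0] range to [-1.0, 1.0]."""
    return value * 2.0 - 1.0


pose = ["p0", "p1", "p2", "p3"]
print(apply_frame_offset(pose, 2))                         # frames from index 2 onward
print(apply_frame_offset(pose, 10))                        # offset past the end: ignored
print(tail_for_continue_motion(list(range(12)), 5))        # last five frames only
print(expand_single_frame_mask(["m"], 3))                  # single mask repeated
print(normalize_face_value(0.0), normalize_face_value(1.0))
```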

Outputs

Output Name	Data Type	Description
`positive`	CONDITIONING	Modified positive conditioning with additional video context including CLIP vision output, pose video latent, face video pixels, concatenated latent image, and concatenated mask
`negative`	CONDITIONING	Modified negative conditioning with additional video context including CLIP vision output, pose video latent, face video pixels (inverted), concatenated latent image, and concatenated mask
`latent`	LATENT	Generated video content in latent space format with shape [batch_size, 16, latent_length + trim_latent, latent_height, latent_width]
`trim_latent`	INT	Latent space trimming information indicating the number of latent frames to trim from the beginning (corresponds to reference image latent frames)
`trim_image`	INT	Image space trimming information for reference motion frames, indicating the number of image frames to trim from the beginning
`video_frame_offset`	INT	Updated frame offset for continuing video generation in chunks, calculated as the previous offset plus the generated length
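
The latent shape and the chunked offset described above can be worked through with a little arithmetic. This sketch rests on assumptions that the table does not state: that the VAE compresses time by a factor of 4 (keeping the first frame) and space by a factor of 8, and that a single reference image contributes one latent frame to `trim_latent`. Both helper functions are hypothetical.

```python
def latent_shape(batch_size, width, height, length, trim_latent):
    """Return [batch, channels, frames, height, width] for the latent output.

    Assumes 4x temporal and 8x spatial compression (not stated in this page).
    """
    latent_length = (length - 1) // 4 + 1
    latent_height = height // 8
    latent_width = width // 8
    return [batch_size, 16, latent_length + trim_latent, latent_height, latent_width]


def chunk_offsets(first_offset, length, num_chunks):
    """Return the video_frame_offset fed to each chunk when nodes are chained.

    Each chunk's output offset is its input offset plus the generated length.
    """
    offsets = []
    offset = first_offset
    for _ in range(num_chunks):
        offsets.append(offset)
        offset += length
    return offsets


# Default settings from the inputs table, with one assumed reference latent frame.
print(latent_shape(batch_size=1, width=832, height=480, length=77, trim_latent=1))
# Three chained chunks of 77 frames each, starting at offset 0.
print(chunk_offsets(first_offset=0, length=77, num_chunks=3))
```

Under these assumptions the defaults give a latent of shape [1, 16, 21, 60, 104], and three chained chunks read their inputs starting at frames 0, 77, and 154.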

Source fingerprint (SHA-256): 2ec2afbc57f58a5b7ce0ecc3730618633d435439ce2d650b18be531c1edddff0
