The WanAnimateToVideo node generates video content by combining multiple conditioning inputs including pose references, facial expressions, and background elements. It processes various video inputs to create coherent animated sequences while maintaining temporal consistency across frames. The node handles latent space operations and can extend existing videos by continuing motion patterns.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| positive | CONDITIONING | Yes | - | Positive conditioning for guiding the generation towards desired content |
| negative | CONDITIONING | Yes | - | Negative conditioning for steering the generation away from unwanted content |
| vae | VAE | Yes | - | VAE model used for encoding and decoding image data |
| width | INT | Yes | 16 to MAX_RESOLUTION | Output video width in pixels (default: 832, step: 16) |
| height | INT | Yes | 16 to MAX_RESOLUTION | Output video height in pixels (default: 480, step: 16) |
| length | INT | Yes | 1 to MAX_RESOLUTION | Number of frames to generate (default: 77, step: 4) |
| batch_size | INT | Yes | 1 to 4096 | Number of videos to generate simultaneously (default: 1) |
| clip_vision_output | CLIP_VISION_OUTPUT | No | - | Optional CLIP vision model output for additional conditioning |
| reference_image | IMAGE | No | - | Reference image used as starting point for generation |
| face_video | IMAGE | No | - | Video input providing facial expression guidance |
| pose_video | IMAGE | No | - | Video input providing pose and motion guidance |
| continue_motion_max_frames | INT | Yes | 1 to MAX_RESOLUTION | Maximum number of frames to continue from previous motion (default: 5, step: 4) |
| background_video | IMAGE | No | - | Background video to composite with generated content |
| character_mask | MASK | No | - | Mask defining character regions for selective processing |
| continue_motion | IMAGE | No | - | Previous motion sequence to continue from for temporal consistency |
| video_frame_offset | INT | Yes | 0 to MAX_RESOLUTION | Number of frames to skip in all input videos. Used for generating longer videos in chunks. To extend a video, connect this to the video_frame_offset output of the previous node. (default: 0, step: 1) |

Parameter Constraints:
  • When pose_video is provided, the output length is adjusted to match the pose video duration only if the internal trim_to_pose_video flag is active; that flag is currently set to False in the source code, so length is left unchanged
  • face_video is automatically resized to 512x512 resolution and normalized to a range of -1.0 to 1.0 when processed
  • continue_motion frames are limited by the continue_motion_max_frames parameter; only the last continue_motion_max_frames frames from the input are used
  • Input videos (face_video, pose_video, background_video, character_mask) are offset by video_frame_offset before processing; if the offset exceeds the video length, the input is ignored
  • If character_mask contains only one frame, it will be repeated across all frames
  • When clip_vision_output is provided, it’s applied to both positive and negative conditioning
  • If reference_image is not provided, a black image (all zeros) is used as the default reference
  • If continue_motion is not provided, the initial frames are filled with uniform gray (0.5 intensity)
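
The frame-handling rules above can be sketched in plain Python. This is a minimal illustration, not the node's actual implementation: plain lists stand in for frame tensors, all four function names are hypothetical, and treating an offset equal to the video length as "ignored" (since no frames would remain) is an assumption.

```python
from typing import List, Optional, Sequence


def apply_frame_offset(frames: Optional[Sequence], offset: int) -> Optional[List]:
    """Skip `offset` frames; return None (input ignored) when no frames remain.

    Assumption: an offset equal to the length is also treated as ignored.
    """
    if frames is None or len(frames) <= offset:
        return None
    return list(frames[offset:])


def trim_continue_motion(frames: Sequence, max_frames: int) -> List:
    """Keep only the last `max_frames` frames of the previous motion clip."""
    return list(frames[-max_frames:])


def expand_single_frame_mask(mask_frames: Sequence, length: int) -> List:
    """Repeat a one-frame mask across `length` frames; leave longer masks as-is."""
    if len(mask_frames) == 1:
        return list(mask_frames) * length
    return list(mask_frames)


def normalize_face_pixel(value: float) -> float:
    """Map a pixel value from the range 0.0..1.0 to the range -1.0..1.0."""
    return value * 2.0 - 1.0
```

For example, `trim_continue_motion(list(range(10)), 5)` keeps frames 5 through 9, and `apply_frame_offset([0, 1], 5)` returns `None` because the offset is past the end of the input.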

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | Modified positive conditioning with additional video context including CLIP vision output, pose video latent, face video pixels, concatenated latent image, and concatenated mask |
| negative | CONDITIONING | Modified negative conditioning with additional video context including CLIP vision output, pose video latent, face video pixels (inverted), concatenated latent image, and concatenated mask |
| latent | LATENT | Generated video content in latent space format with shape [batch_size, 16, latent_length + trim_latent, latent_height, latent_width] |
| trim_latent | INT | Latent space trimming information indicating the number of latent frames to trim from the beginning (corresponds to reference image latent frames) |
| trim_image | INT | Image space trimming information for reference motion frames, indicating the number of image frames to trim from the beginning |
| video_frame_offset | INT | Updated frame offset for continuing video generation in chunks, calculated as the previous offset plus the generated length |
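
To make the latent shape and the chunked offset concrete, here is a rough sketch. The compression factors are assumptions not stated on this page: 4x temporal compression with the first frame kept (`(length - 1) // 4 + 1`) and 8x spatial compression, plus a single reference-image latent frame (`trim_latent = 1`). Both function names are hypothetical helpers, not part of ComfyUI.

```python
from typing import List, Tuple


def latent_shape(width: int, height: int, length: int,
                 batch_size: int = 1, trim_latent: int = 1) -> Tuple[int, ...]:
    """Estimate the latent output shape under the assumed compression factors."""
    latent_length = (length - 1) // 4 + 1   # assumed 4x temporal compression
    latent_height = height // 8             # assumed 8x spatial compression
    latent_width = width // 8
    return (batch_size, 16, latent_length + trim_latent, latent_height, latent_width)


def chunk_offsets(total_frames: int, length: int, start: int = 0) -> List[int]:
    """List the video_frame_offset value fed to each chunk.

    Each chunk's output offset is the previous offset plus the generated length.
    """
    offsets = []
    offset = start
    while offset < total_frames:
        offsets.append(offset)
        offset += length
    return offsets


print(latent_shape(832, 480, 77))   # (1, 16, 21, 60, 104)
print(chunk_offsets(200, 77))       # [0, 77, 154]
```

With the default settings (832x480, 77 frames), covering a 200-frame driving video would take three chained nodes under these assumptions, each receiving the video_frame_offset output of the one before it.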

Source fingerprint (SHA-256): 2ec2afbc57f58a5b7ce0ecc3730618633d435439ce2d650b18be531c1edddff0