What is DreamActor-M1?

DreamActor-M1 is a framework for generating realistic human animations. It combines several motion guidance signals to control the animation smoothly, and it adapts to different scales, from close-up portraits to full-body movements. Given a reference image, it imitates the actions in a driving video while keeping the result consistent with the appearance of the person in the reference.

Overview of DreamActor-M1

Feature             Description
AI Tool             DreamActor-M1 AI
Category            Human Animation Framework
Function            Realistic Human Animation
Generation Speed    Real-time Processing
Research Paper      arxiv.org/pdf/2504.01724
Official Website    grisoon.github.io/DreamActor-M1/

Method Overview

During the training stage, we first extract body skeletons and head spheres from the driving frames and encode them into a pose latent using the pose encoder. The resulting pose latent is concatenated with the noised video latent along the channel dimension. The video latent is obtained by encoding a clip from the full input video with a 3D VAE. Facial expressions are additionally encoded by the face motion encoder to produce implicit facial representations.

Note that the reference image can be one or multiple frames sampled from the input video, which provides additional appearance details during training. The reference token branch shares the weights of our DiT model with the noise token branch. Finally, the denoised video latent is supervised by the encoded video latent.
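
The channel-wise combination of the pose latent and the noised video latent described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `[C][T][H][W]` layout, the tiny tensor sizes, the helper names (`make_latent`, `add_noise`, `concat_channels`), and the simple additive noising step are all hypothetical, and random values stand in for the outputs of the 3D VAE and the pose encoder.

```python
import random

def make_latent(channels, frames, height, width, rng):
    """Build a random latent as nested lists with layout [C][T][H][W]."""
    return [[[[rng.gauss(0.0, 1.0) for _ in range(width)]
              for _ in range(height)]
             for _ in range(frames)]
            for _ in range(channels)]

def latent_shape(latent):
    """Return (C, T, H, W) for a nested-list latent."""
    return (len(latent), len(latent[0]),
            len(latent[0][0]), len(latent[0][0][0]))

def add_noise(latent, sigma, rng):
    """Placeholder noising step, x + sigma * eps.

    The actual noise schedule is not given in the text; this is an assumption.
    """
    return [[[[x + sigma * rng.gauss(0.0, 1.0) for x in row]
              for row in frame]
             for frame in channel]
            for channel in latent]

def concat_channels(video_latent, pose_latent):
    """Concatenate two latents along the channel axis (axis 0 in this layout)."""
    if latent_shape(video_latent)[1:] != latent_shape(pose_latent)[1:]:
        raise ValueError("frame count and spatial size must match")
    return video_latent + pose_latent

rng = random.Random(0)
video_latent = make_latent(4, 3, 2, 2, rng)  # stands in for the 3D VAE output
pose_latent = make_latent(4, 3, 2, 2, rng)   # stands in for the pose encoder output
noised = add_noise(video_latent, sigma=0.5, rng=rng)
dit_input = concat_channels(noised, pose_latent)
print(latent_shape(dit_input))  # (8, 3, 2, 2)
```

Only the channel count grows (4 + 4 = 8 here); the frame count and spatial size of the two latents must already agree, which is why the sketch checks them before concatenating.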

Within each DiT block, the face motion token is integrated into the noise token branch via cross-attention (Face Attn), while the appearance information of the reference token is injected into the noise token through concatenated self-attention (Self Attn) and a subsequent cross-attention (Ref Attn).
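
The three attention paths named above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: learned query/key/value projections, normalization, and feed-forward layers are omitted, the function names are hypothetical, and the ordering of Face Attn relative to the other two steps is an assumption, since the text only states that Ref Attn follows Self Attn.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

def residual(tokens, update):
    """Add an attention update back onto the tokens it was computed for."""
    return [[t + u for t, u in zip(tok, upd)]
            for tok, upd in zip(tokens, update)]

def dit_block_sketch(noise_tokens, ref_tokens, face_tokens):
    n = len(noise_tokens)
    # Self Attn: noise and reference tokens attend over their concatenation.
    joint = noise_tokens + ref_tokens
    joint = residual(joint, attention(joint, joint, joint))
    noise, ref = joint[:n], joint[n:]
    # Face Attn: noise tokens query the face motion tokens (order assumed).
    noise = residual(noise, attention(noise, face_tokens, face_tokens))
    # Ref Attn: noise tokens query the reference tokens once more.
    noise = residual(noise, attention(noise, ref, ref))
    return noise, ref

noise_tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
ref_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
face_tokens = [[0.0, 0.0, 1.0, 0.0]]
noise_out, ref_out = dit_block_sketch(noise_tokens, ref_tokens, face_tokens)
print(len(noise_out), len(noise_out[0]), len(ref_out))  # 3 4 2
```

The sequence lengths are unchanged by the block: the reference tokens take part in the joint self-attention, but only the noise tokens receive the face and reference cross-attention updates.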

Key Features of DreamActor-M1

  • Diversity

    Our method is robust to various character and motion styles.

  • Controllability and Robustness

    Our method can extend to audio-driven facial animation, delivering lip-sync results in multiple languages. Our complementary visual guidance ensures better temporal consistency, particularly for human poses not observed in the reference. Our method supports transferring only a part of the motion, such as facial expressions and head movements.

  • Motion Guidance

    Uses a hybrid control system that integrates implicit facial representations, 3D head spheres, and 3D body skeletons to deliver dynamic and expressive facial and body animations.

  • Scale Adaptability

    Employs a progressive training approach that uses multi-resolution datasets to handle a range of body poses and image scales, from close-up portraits to full-body shots.

  • Appearance Guidance

    Leverages motion sequences and complementary visual references to keep the output temporally consistent, especially in regions that are not visible in the reference during complex movements.

Examples of DreamActor-M1 in Action

Various character and motion styles.

How does DreamActor-M1 compare to state-of-the-art (SOTA) methods?

The comparison with prior methods focuses on four aspects:

  • Fine-grained motion control (expressive facial and body movements).
  • Identity preservation (maintaining the subject's appearance).
  • Temporal consistency (smooth transitions over long sequences).
  • Robustness (handling unseen poses and complex motions).

Pros and Cons

Pros

  • Fine-grained controllability for expressive animations
  • Multi-scale adaptability from portrait to full-body
  • Long-term temporal coherence in animations
  • Supports selective motion transfer
  • Audio-driven lip-sync animations in multiple languages

Cons

  • Requires high-quality reference images for best results
  • Computationally intensive for high-resolution outputs

DreamActor-M1 FAQs