Date | Title | Venue | Methods | Keywords |
--- | --- | --- | --- | --- |
2025-08 | X-MoGen: Unified Motion Generation across Humans and Animals | - | Transformer, Diffusion | Graph |
2025-08 | ReMoMask: Retrieval-Augmented Masked Motion Generation | - | VQ-VAE, Transformer | Retrieval |
2025-08 | Semantically Consistent Text-to-Motion with Unsupervised Styles | SIGGRAPH 2025 | UNet, CNN, Transformer, Diffusion | - |
2025-07 | SnapMoGen: Human Motion Generation from Expressive Texts | - | VQ-VAE, Transformer | - |
2025-07 | Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data | ICCV 2025 | T5, VQ-VAE, Transformer | - |
2025-07 | MOST: Motion Diffusion Model for Rare Text via Temporal Clip Banzhaf Interaction | - | Diffusion | - |
2025-06 | MotionGPT3: Human Motion as a Second Modality | - | LLM | - |
2025-06 | Generating Attribute-Aware Human Motions from Textual Prompt | - | VQ-VAE, CLIP, Transformer | Attribute |
2025-06 | Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation | - | LLM | GRPO, RL |
2025-06 | MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation | - | VQ-VAE, Transformer | - |
2025-06 | ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model | - | UNet, Transformer | - |
2025-05 | From Motion to Behavior: Hierarchical Modeling of Humanoid Generative Behavior Control | - | LLM | - |
2025-05 | Absolute Coordinates Make Motion Generation Easy | - | CLIP | RoPE |
2025-05 | Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion | MM 2025 | VAE, Diffusion | Multi-Task |
2025-05 | ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment | - | Diffusion | - |
2025-05 | GENMO: A GENeralist Model for Human MOtion | - | Transformer | Multi-Task, RoPE |
2025-05 | Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis | CVPR 2025 | Transformer, GRU | Score-based |
2025-04 | UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control | ICCV 2025 | Diffusion, Transformer | Physical |
2025-04 | Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions | CVPR 2025 | VQ-VAE, LLM | Attribute |
2025-04 | MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities | CVPR 2025 | - | Fine-grained |
2025-04 | FlowMotion: Target-Predictive Conditional Flow Matching for Jitter-Reduced Text-Driven Human Motion Generation | - | CLIP, Transformer, Flow Matching | Jerk |
2025-04 | MixerMDM: Learnable Composition of Human Motion Diffusion Models | CVPR 2025 | Transformer, Diffusion | Diversity |
2025-04 | ReMoGPT: Part-Level Retrieval-Augmented Motion-Language Models | AAAI 2025 | VQ-VAE | Multi-Task, Retrieval, Fine-grained |
2025-04 | UniTMGE: Uniform Text-Motion Generation and Editing Model via Diffusion | WACV 2025 | CLIP, Transformer, Diffusion | Multi-Task, Editing |
2025-03 | Dynamic Motion Blending for Versatile Motion Editing | CVPR 2025 | Diffusion, Transformer, CLIP | Editing |
2025-03 | Dance Like a Chicken: Low-Rank Stylization for Human Motion Diffusion | - | Diffusion | Trajectory, Diversity |
2025-03 | Human Motion Unlearning | - | VQ-VAE | - |
2025-03 | SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction | CVPR 2025 | CLIP, Diffusion, Transformer | Multi-Task, Editing |
2025-03 | MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space | ICCV 2025 | Diffusion, Transformer | - |
2025-03 | GenM$^3$: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation | - | VQ-VAE, CLIP, Transformer | Pre-Train |
2025-03 | Reinforcement learning-based motion imitation for physiologically plausible musculoskeletal motor control | - | - | Imitation Learning, RL |
2025-03 | Less is More: Improving Motion Diffusion Models with Sparse Keyframes | - | Diffusion | - |
2025-03 | SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing | CVPR 2025 | CLIP, VAE | Fine-grained, Editing |
2025-03 | Progressive Human Motion Generation Based on Text and Few Motion Frames | TCSVT 2025 | Transformer | - |
2025-03 | PersonaBooth: Personalized Text-to-Motion Generation | CVPR 2025 | CLIP, Diffusion, Transformer | - |
2025-03 | Motion Anything: Any to Motion Generation | - | VQ-VAE, CLIP, Transformer | - |
2025-03 | Biomechanics-Guided Residual Approach to Generalizable Human Motion Generation and Estimation | - | UNet | Physical |
2025-03 | Unlocking Pretrained LLMs for Motion-Related Multimodal Generation: A Fine-Tuning Approach to Unify Diffusion and Next-Token Prediction | - | Diffusion, Transformer, LLM | LoRA |
2025-02 | Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation | IJCV 2025 | LLM, Diffusion | Graph |
2025-02 | MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm | ICCV 2025 | Transformer | Editing, Multi-Task |
2025-02 | CASIM: Composite Aware Semantic Injection for Text to Motion Generation | - | Transformer, Diffusion | Component |
2025-02 | SPORT: From Zero-shot Prompts to Real-time Motion Generation | TVCG 2025 | CLIP, Transformer, Diffusion | MoE, Phase |
2025-01 | MotionPCM: Real-Time Motion Synthesis with Phased Consistency Model | - | Transformer, VAE, Diffusion | Efficient |
2025-01 | Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss | - | - | Component, Frequency |
2025-01 | FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation | - | Diffusion | Physical, Efficient, Fine-grained |
2025-01 | PackDiT: Joint Human Motion and Text Generation via Mutual Prompting | - | Diffusion, Transformer | Multi-Task |
2024-12 | LS-GAN: Human Motion Synthesis with Latent-space GANs | WACV 2025 | GAN | Efficient |
2024-12 | EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space | CVPR 2025 | VAE, Diffusion | Energy-based, Diversity |
2024-12 | ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model | CVPR 2025 | VQ-VAE, Transformer | - |
2024-12 | Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation | AAAI 2025 | Mamba | Efficient |
2024-12 | The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion | - | VQ-VAE | Multi-Task |
2024-12 | CoMA: Compositional Human Motion Generation with Multi-modal Agents | - | LLM, VQ-VAE, CLIP, Transformer | Editing |
2024-12 | SoPo: Text-to-Motion Generation Using Semi-Online Preference Optimization | - | - | RL, DPO |
2024-12 | RMD: A Simple Baseline for More General Human Motion Generation via Training-free Retrieval-Augmented Motion Diffuse | - | Diffusion, LLM | Retrieval |
2024-11 | BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis | - | CLIP, Transformer | Fine-grained, Editing, Multi-Task |
2024-11 | MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks | - | Diffusion, CLIP, Transformer | - |
2024-11 | FTMoMamba: Motion Generation with Frequency and Text State Space Models | - | Mamba, Diffusion | Frequency, Fine-grained |
2024-11 | VersatileMotion: A Unified Framework for Motion Synthesis and Comprehension | - | LLM, VQ-VAE, Flow Matching, Transformer | - |
2024-11 | Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression | CVPR 2025 | CLIP, Diffusion, Transformer | - |
2024-11 | Morph: A Motion-free Physics Optimization Framework for Human Motion Generation | ICCV 2025 | - | Physical, RL, Component |
2024-11 | KMM: Key Frame Mask Mamba for Extended Motion Generation | - | VQ-VAE, Mamba, CLIP | - |
2024-11 | Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions | ECCV 2024 | LLM, VQ-VAE | Multi-Task |
2024-11 | M-Adaptor: Text-driven Whole-body Human Motion Generation | CVPR Workshop 2025 | Transformer | LoRA |
2024-10 | MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding | - | LLM, VQ-VAE | Multi-Task, LoRA |
2024-10 | Pay Attention and Move Better: Harnessing Attention for Interactive Motion Generation and Training-free Editing | - | Transformer | Fine-grained, Editing |
2024-10 | LEAD: Latent Realignment for Human Motion Diffusion | - | VQ-VAE | Component |
2024-10 | MaskControl: Spatio-Temporal Control for Masked Motion Synthesis | ICCV 2025 | Transformer | Trajectory |
2024-10 | ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model | WACV 2025 | Diffusion | RL, Physical |
2024-10 | LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning | ICLR 2025 | Transformer | Pre-Train |
2024-10 | MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning | - | Transformer | PPO, RL |
2024-10 | A Unified Framework for Motion Reasoning and Generation in Human Interaction | ICCV 2025 | LLM | Multi-Task |
2024-10 | DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control | ICLR 2025 | CLIP, Transformer, VAE, Diffusion | Multi-Task, Efficient |
2024-10 | UniMuMo: Unified Text, Music and Motion Generation | - | Transformer | Multi-Task |
2024-10 | Scaling Large Motion Models with Million-Level Human Motions | ICML 2025 | VQ-VAE, LLM | - |
2024-10 | CLaM: An Open-Source Library for Performance Evaluation of Text-driven Human Motion Generation | MM 2024 | Transformer | - |
2024-10 | Towards Emotion-enriched Text-to-Motion Generation via LLM-guided Limb-level Emotion Manipulating | MM 2024 | CLIP, LLM, Diffusion | Graph |
2024-09 | Text-driven Human Motion Generation with Motion Masked Diffusion Model | - | Diffusion | - |
2024-09 | EgoLM: Multi-Modal Language Model of Egocentric Motions | CVPR 2025 | LLM, VAE | Multi-Task |
2024-09 | MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling | NeurIPS 2024 | VQ-VAE | Editing |
2024-09 | Unimotion: Unifying 3D Human Motion Synthesis and Understanding | 3DV 2025 | CLIP, Transformer | Editing, Fine-grained, Multi-Task |
2024-09 | MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting | SIGGRAPH Asia 2024 | - | Multi-Task, Physical, RL |
2024-09 | T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data | - | VQ-VAE, Transformer | - |
2024-09 | MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion | WACV 2025 | LLM, Diffusion | - |
2024-09 | BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation | - | CLIP, Transformer | - |
2024-09 | Lagrangian Motion Fields for Long-term Motion Generation | - | - | Component, Multi-Task |
2024-08 | TextIM: Part-aware Interactive Motion Synthesis from Text | Eurographics 2025 | Transformer | Fine-grained |
2024-08 | MotionFix: Text-Driven 3D Human Motion Editing | SIGGRAPH Asia 2024 | CLIP, Diffusion, Transformer | Editing |
2024-08 | Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion | AAAI 2025 | - | Attack |
2024-07 | MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls | AAAI 2025 | Diffusion, Transformer | Multi-Task |
2024-07 | M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models | ECCV 2024 | VQ-VAE, Transformer | Diversity |
2024-07 | SMooDi: Stylized Motion Diffusion Model | ECCV 2024 | Diffusion | Style |
2024-07 | Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation | - | Diffusion, Transformer | Fine-grained, Graph |
2024-07 | SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation | - | - | RL, Physical |
2024-07 | InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation | - | VQ-VAE, Transformer, Mamba | - |
2024-07 | Infinite Motion: Extended Motion Generation via Long Text Instructions | - | - | Editing |
2024-07 | MotionGPT: Human Motion Synthesis with Improved Diversity and Realism via GPT-3 Prompting | WACV 2024 | LLM, CLIP, Transformer, Diffusion | - |
2024-06 | MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training | CVPR Workshop 2025 | VAE, GAN | Trajectory, Multi-Task |
2024-06 | T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences | CVPR Workshop 2024 | VQ-VAE, Transformer | - |
2024-05 | Programmable Motion Generation for Open-Set Motion Control Tasks | CVPR 2024 | - | Fine-grained, Trajectory, Multi-Task |
2024-05 | Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs | ICLR 2025 | LLM | LoRA, Editing |
2024-05 | A Cross-Dataset Study for Text-based 3D Human Motion Retrieval | CVPR Workshop 2024 | - | Retrieval |
2024-05 | M$^3$GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation | NeurIPS 2024 | LLM | Multi-Task |
2024-05 | Learning Generalizable Human Motion Generator with Reinforcement Learning | - | CLIP, Transformer, VQ-VAE | RL |
2024-05 | Shape Conditioned Human Motion Generation with Diffusion Model | - | Diffusion | Frequency, Attribute |
2024-05 | StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework | MM 2024 | UNet, Diffusion, Transformer | - |
2024-05 | Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches | CVPR 2024 | Transformer | Retrieval |
2024-05 | MoDiPO: text-to-motion alignment via AI-feedback-driven Direct Preference Optimization | - | Diffusion | Preference |
2024-05 | LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model | SIGGRAPH 2024 | Transformer | Fine-grained |
2024-05 | Efficient Text-driven Motion Generation via Latent Consistency Training | - | Diffusion | Score-based |
2024-05 | SATO: Stable Text-to-Motion Framework | MM 2024 | CLIP, Transformer | - |
2024-04 | MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model | ECCV 2024 | - | Trajectory, Efficient |
2024-04 | You Think, You ACT: The New Task of Arbitrary Text to Motion Generation | ICCV 2025 | LLM, Transformer, VQ-VAE | LoRA |
2024-04 | MCM: Multi-condition Motion Synthesis Framework | - | Transformer | Multi-Task |
2024-04 | Exploring Text-to-Motion Generation with Human Preference | CVPR Workshop 2024 | - | Preference, RL |
2024-04 | MotionChain: Conversational Motion Controllers via Multimodal Prompts | - | LLM | - |
2024-04 | Large Motion Model for Unified Multi-Modal Motion Generation | - | Diffusion, Transformer | Multi-Task |
2024-03 | BAMM: Bidirectional Autoregressive Motion Model | ECCV 2024 | VQ-VAE, CLIP, Transformer | - |
2024-03 | ParCo: Part-Coordinating Text-to-Motion Synthesis | ECCV 2024 | VQ-VAE, Transformer | Fine-grained |
2024-03 | Contact-aware Human Motion Generation from Textual Descriptions | - | VQ-VAE, Transformer | - |
2024-03 | CoMo: Controllable Motion Generation through Language Guided Pose Code Editing | ECCV 2024 | LLM, CLIP, Transformer | Fine-grained, Editing |
2024-03 | AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents | CVPR 2024 | CLIP | Physical, Open-Vocabulary, Imitation Learning, Reward |
2024-03 | Motion Mamba: Efficient and Long Sequence Motion Generation | ECCV 2024 | Mamba | - |
2024-03 | MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model | - | Transformer | Multi-Task |
2024-02 | Seamless Human Motion Composition with Blended Positional Encodings | CVPR 2024 | Transformer | Jerk |
2024-01 | MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation | AAAI 2024 | Diffusion | - |
2024-01 | Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation | CVPR Workshop 2024 | Diffusion | Fine-grained |
2024-01 | GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation | TVCG 2024 | CLIP, Transformer, Diffusion | Diversity |
2023-12 | InsActor: Instruction-driven Physics-based Characters | NeurIPS 2023 | Diffusion | Trajectory, Physical |
2023-12 | FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing | NeurIPS 2023 | LLM, Diffusion, Transformer | Fine-grained |
2023-12 | Plan, Posture and Go: Towards Open-World Text-to-Motion Generation | ECCV 2024 | CLIP, LLM, Diffusion | - |
2023-12 | Iterative Motion Editing with Natural Language | SIGGRAPH 2024 | Transformer, Diffusion, LLM | Editing |
2023-12 | Realistic Human Motion Generation with Cross-Diffusion Models | ECCV 2024 | Diffusion | - |
2023-12 | Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model | AAAI 2024 | CLIP, Diffusion | - |
2023-12 | OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers | CVPR 2024 | Diffusion | Pre-Train, Open-Vocabulary |
2023-12 | MMM: Generative Masked Motion Model | CVPR 2024 | Transformer, Diffusion | Editing |
2023-12 | EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation | ECCV 2024 | Diffusion, GAN | Efficient |
2023-11 | MoMask: Generative Masked Modeling of 3D Human Motions | CVPR 2024 | VQ-VAE, Transformer | - |
2023-11 | TLControl: Trajectory and Language Control for Human Motion Synthesis | - | VQ-VAE, Transformer | Trajectory |
2023-11 | A Unified Framework for Multimodal, Multi-Part Human Motion Synthesis | - | VQ-VAE, Transformer | - |
2023-11 | Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs | NeurIPS 2023 | Transformer | Fine-grained |
2023-10 | HumanTOMATO: Text-aligned Whole-body Motion Generation | ICML 2024 | VQ-VAE, LLM | Fine-grained |
2023-10 | MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations | SIGGRAPH 2024 | LLM, Transformer | Multi-Task, Physical |
2023-10 | Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases | ECCV 2024 | - | Fine-grained |
2023-09 | Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model | ICCV 2023 | CLIP, Diffusion | Graph |
2023-09 | AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism | ICCV 2023 | VQ-VAE, Transformer | Fine-grained |
2023-08 | Priority-Centric Human Motion Generation in Discrete Latent Space | ICCV 2023 | VQ-VAE, Diffusion, Transformer | - |
2023-08 | Language-guided Human Motion Synthesis with Atomic Actions | MM 2023 | CLIP, Transformer | - |
2023-06 | MotionGPT: Human Motion as a Foreign Language | NeurIPS 2023 | LLM, VQ-VAE | - |
2023-06 | MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators | AAAI 2024 | LLM, VQ-VAE | LoRA |
2023-05 | Enhanced Fine-grained Motion Diffusion for Text-driven Human Motion Synthesis | AAAI 2024 | CLIP, Transformer | - |
2023-05 | Guided Motion Diffusion for Controllable Human Motion Synthesis | ICCV 2023 | Diffusion | Trajectory |
2023-05 | Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation | ICCV 2023 | Transformer | - |
2023-05 | AMD: Autoregressive Motion Diffusion | AAAI 2024 | Transformer, Diffusion | - |
2023-05 | TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis | ICCV 2023 | - | Retrieval |
2023-04 | TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration | ICCV 2023 | VQ-VAE | Multi-Task |
2023-04 | ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model | ICCV 2023 | Transformer | Retrieval |
2023-03 | Human Motion Diffusion as a Generative Prior | ICLR 2024 | Diffusion | Fine-grained, Trajectory |
2023-01 | T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations | CVPR 2023 | VQ-VAE, Transformer | - |
2023-01 | Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models | - | Diffusion | - |
2022-12 | MultiAct: Long-Term 3D Human Motion Generation from Multiple Action Labels | AAAI 2023 | VAE | - |
2022-12 | MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis | CVPR 2023 | UNet, Transformer, Diffusion | - |
2022-12 | Executing your Commands via Motion Diffusion in Latent Space | CVPR 2023 | VAE, Diffusion | - |
2022-12 | PhysDiff: Physics-Guided Human Motion Diffusion Model | ICCV 2023 | Diffusion | Physical, Component |
2022-11 | UDE: A Unified Driving Engine for Human Motion Generation | CVPR 2023 | VQ-VAE, Transformer | - |
2022-11 | Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation | ICME 2023 | LLM, VAE | - |
2022-10 | Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training | - | CLIP, Transformer | Open-Vocabulary |
2022-09 | Human Motion Diffusion Model | ICLR 2023 | CLIP, Transformer, Diffusion | - |
2022-09 | TEACH: Temporal Action Composition for 3D Humans | 3DV 2022 | Transformer | - |
2022-09 | FLAME: Free-form Language-based Motion Synthesis & Editing | AAAI 2023 | Transformer | Editing |
2022-08 | MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model | TPAMI 2024 | Diffusion, Transformer | - |
2022-07 | TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts | ECCV 2022 | GRU | Multi-Task |
2022-06 | Generating Diverse and Natural 3D Human Motions from Text | CVPR 2022 | GRU | - |
2022-05 | AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars | SIGGRAPH 2022 | CLIP, VAE | - |
2022-04 | TEMOS: Generating diverse human motions from textual descriptions | ECCV 2022 | VAE, Transformer | - |
2022-03 | Implicit Neural Representations for Variable Length Human Motion Generation | ECCV 2022 | VAE | - |
2022-03 | MotionCLIP: Exposing Human Motion Generation to CLIP Space | ECCV 2022 | CLIP, Transformer | - |
2021-04 | Action-Conditioned 3D Human Motion Synthesis with Transformer VAE | ICCV 2021 | Transformer | - |
2020-07 | Action2Motion: Conditioned Generation of 3D Human Motions | MM 2020 | GRU | - |