MMLab HKU Shines at CVPR 2025! Discussing the AI Frontier with Top Scholars Worldwide

2025-06-11 19:17


CVPR 2025 will be held June 11–15 in Nashville, USA. MMLab@HKU presents 24 high-quality papers.

01 CVPR 2025 Opens Soon in Nashville: MMLab Participates in Depth with Frontier Results

As one of the most influential international conferences in computer vision, CVPR (the IEEE Conference on Computer Vision and Pattern Recognition) brings together the latest breakthroughs from top universities, research institutes, and industry every year. CVPR 2025 will be held June 11–15 in Nashville, USA. MMLab@HKU presents 24 high-quality papers spanning research hotspots including image generation, video understanding, embodied intelligence, 3D reconstruction, and multimodal fusion. Come meet the authors face to face!

Website: https://mmlab.hk/event/cvpr2025

02 Three International Challenges: Advancing Research through Competition, Shaping an Intelligent Future

At CVPR 2025, MMLab is the initiator and host of three international challenges, covering hot directions such as open-world autonomous driving and interactive robot intelligence. Through carefully designed tasks and evaluation protocols, the team has built a competitive stage for researchers worldwide, one focused on real-world challenges and the ability to deploy technology. We hope these challenges spark more innovation and help push the boundaries of visual intelligence together.

Autonomous Grand Challenge 2025

End-to-End Autonomous Driving through V2X Cooperation

RoboTwin Dual-Arm Collaboration Challenge

03 Six In-Depth Events: Unlocking the Technical Keys to Deploying AI

Beyond the international challenges, MMLab is also organizing six frontier workshops and tutorials at CVPR 2025, covering hot topics including autonomous driving, multimodality, world models, cooperative perception, and data-driven learning.

Embodied Intelligence for Autonomous Systems on the Horizon

Workshop on Autonomous Driving

Distillation of Foundation Models for Autonomous Driving

Multi-Agent Embodied Intelligent Systems Meet Generative-AI Era: Opportunities, Challenges and Futures

Robotics 101: An Odyssey from A Vision Perspective

The 1st Workshop on Benchmarking World Models

04 Research Highlights: A Roundup of Frontier AI Work

As generative intelligence and multimodal perception advance rapidly, this body of work demonstrates progress in cross-modal understanding, scene generation, human-machine interaction, and robot intelligence. Techniques such as text-driven video synthesis, image-safety evaluation, high-precision 3D Gaussian modeling, and robot manipulation policy learning all improve models' generality, efficiency, and adaptability in the real world. Whether you care most about safer and more trustworthy generative systems, smarter robot brains, or higher-quality visual generation models, these projects represent the frontier of innovation. We welcome your attention!

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization [Oral]

Unified synthesis of physical human-scene interactions via task tokenization

arXiv: https://arxiv.org/abs/2503.19901

Github: https://github.com/liangpan99/TokenHSI

Parallelized Autoregressive Visual Generation [Highlight]

PAR: a parallelized autoregressive generation model designed around the dependencies between visual tokens

arXiv: https://arxiv.org/abs/2412.15119

Project Page: https://yuqingwang1029.github.io/PAR-project/

RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins [Highlight]

A dual-arm robot benchmark suite and data synthesizer

arXiv: https://arxiv.org/abs/2504.13059

Github: https://github.com/TianxingChen/RoboTwin

HMAR: Efficient Hierarchical Masked AutoRegressive Image Generation

HMAR: an efficient, high-quality image generation model combining multi-scale autoregression with masked reconstruction

arXiv: https://arxiv.org/html/2506.04421v1

Project Page: https://research.nvidia.com/labs/dir/hmar/

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

MBQ: a quantization method for vision-language models that balances the sensitivity gap between the visual and language modalities

arXiv: https://arxiv.org/abs/2412.19509

Github: https://github.com/thu-nics/MBQ

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

MIDI-3D: extending 3D object generation models to compositional 3D scene generation

arXiv: https://arxiv.org/abs/2412.03558

Github: https://github.com/VAST-AI-Research/MIDI-3D

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

T2ISafety: a benchmark for evaluating the safety of text-to-image models

arXiv: https://arxiv.org/abs/2501.12612

Github: https://github.com/adwardlee/t2i_safety

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

T2V-CompBench: evaluating the compositional generation capabilities of text-to-video models

arXiv: https://arxiv.org/abs/2407.14505

Project Page: https://t2v-compbench-2025.github.io/

CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians

Efficient compositional text-to-3D content generation via 3D Gaussian Splatting

arXiv: https://arxiv.org/abs/2410.20723

Project Page: https://chongjiange.github.io/compgs.html

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

Diffusion-driven generation of versatile 3D characters with automatic rigging

arXiv: https://arxiv.org/abs/2411.17423

Github: https://github.com/yisuanwang/DRiVE

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Interaction-aware diffusion policies for adaptive dexterous-hand manipulation

arXiv: https://arxiv.org/abs/2411.18562

Project Page: https://dexdiffuser.github.io/

Distilling Monocular Foundation Model for Fine-grained Depth Completion

Knowledge distillation from a monocular foundation model for densifying sparse depth

arXiv: https://arxiv.org/abs/2503.16970

Github: https://github.com/Sharpiless/DMD3C

FlashGS: Efficient 3D Gaussian Splatting for Large-Scale and High-Resolution Rendering

Efficient 3D Gaussian Splatting for large-scale, high-resolution rendering

arXiv: https://arxiv.org/abs/2408.07967

Github: https://github.com/InternLandMark/FlashGS

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

A new comprehensive benchmark for the forgery-detection capabilities of large vision-language models

arXiv: https://arxiv.org/pdf/2503.15024

Github: https://github.com/Forensics-Bench/Forensics-Bench

Project Page: https://forensics-bench.github.io/

Dataset: https://huggingface.co/datasets/Forensics-bench/Forensics-bench

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

Generative 3D semantic flow as an enhanced representation for robot manipulation

arXiv: https://arxiv.org/abs/2411.18369

Github: https://github.com/TianxingChen/G3Flow

GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning

Pretrained on videos to predict and generate future graph structures from the current graph structure

Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

Unified aerial-to-ground 3D Gaussian Splatting for reconstructing and rendering large scenes

arXiv: https://arxiv.org/abs/2412.01745

Github: https://github.com/OpenRobotLab/HorizonGS

Janus: Decoupling visual encoding for unified multimodal understanding and generation

Decoupling visual encoding for unified multimodal understanding and generation

arXiv: https://arxiv.org/abs/2410.13848

Github: https://github.com/deepseek-ai/Janus

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

Using CARLA simulation data to reduce the need for real-world annotation in autonomous-driving perception

arXiv: https://arxiv.org/abs/2503.08422

Github: https://github.com/Runjian-Chen/JiSAM

MangaNinja: Line Art Colorization with Precise Reference Following

Precise, controllable line-art colorization

arXiv: https://arxiv.org/abs/2501.08332

Github: https://github.com/ali-vilab/MangaNinjia

NADER: Neural Architecture Design via Multi-Agent Collaboration

Neural architecture design via multi-agent collaboration

arXiv: https://arxiv.org/abs/2412.19206

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

The first comprehensive benchmark for open-ended interleaved image-text generation

arXiv: https://arxiv.org/abs/2411.18499

Project Page: https://opening-benchmark.github.io/

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

A multimodal large language model (MLLM) for robotic manipulation

arXiv: https://arxiv.org/abs/2502.21257

Github: https://github.com/FlagOpen/RoboBrain

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

Exploring scaling laws in autoregressive motion generation models

arXiv: https://arxiv.org/abs/2412.14559

Github: https://github.com/shunlinlu/ScaMo_code

EdgeTAM: On-Device Track Anything Model

Compressing SAM 2, the video segmentation foundation model, for on-device deployment while preserving accuracy


Source: AI科技评论 (AI Technology Review), via its Yidianhao account
