AffordanceVLA: Affordance as an Intermediate Representation to Fix the VLA "Sees It but Can't Act Accurately" Problem

Type: paper
Author: Peking University / HKUST-GZ / CUHK / Knowin AI
Primary Topic: AI Agent
Ingested: 2026-06-10

Summary

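The AffordanceVLA paper proposes using affordance as an intermediate representation layer in VLA (vision-language-action) models, to bridge the alignment gap between the semantic space of vision-language models and the action-control space. The architecture is a three-stage Mixture-of-Transformer pipeline with three expert modules, Which2Act, Where2Act, and How2Act, responsible for target localization, 2D affordance map generation, and 3D geometric reasoning, respectively. The approach improves the interpretability of robot actions and is friendly to deployment on edge hardware.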
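
The data flow through the three stages can be illustrated with a toy sketch. This is not the paper's implementation: the real experts are transformer modules, whereas the stubs below use hand-written rules purely to show what each stage consumes and produces. The `Target` and `Grasp` types, the function signatures, the substring matching, the uniform affordance map, and the pinhole intrinsics `(fx, fy, cx, cy)` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[float]]


@dataclass
class Target:  # hypothetical type, not from the paper
    label: str
    box: Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


@dataclass
class Grasp:  # hypothetical type, not from the paper
    position: Tuple[float, float, float]  # (x, y, z) in the camera frame


def which2act(instruction: str, detections: List[Target]) -> Target:
    """Stage 1 stub: pick the first detection whose label appears in the instruction."""
    for det in detections:
        if det.label in instruction:
            return det
    raise ValueError("no detection matches the instruction")


def where2act(target: Target, height: int, width: int) -> Grid:
    """Stage 2 stub: 2D affordance map, uniform inside the target box, zero elsewhere."""
    r0, c0, r1, c1 = target.box
    weight = 1.0 / ((r1 - r0) * (c1 - c0))
    return [[weight if r0 <= r < r1 and c0 <= c < c1 else 0.0
             for c in range(width)] for r in range(height)]


def how2act(affordance: Grid, depth: Grid,
            fx: float, fy: float, cx: float, cy: float) -> Grasp:
    """Stage 3 stub: back-project the affordance-weighted centroid with a pinhole model."""
    total = sum(sum(line) for line in affordance)
    v = sum(r * w for r, line in enumerate(affordance) for w in line) / total
    u = sum(c * w for line in affordance for c, w in enumerate(line)) / total
    z = depth[round(v)][round(u)]  # depth sampled at the pixel nearest the centroid
    return Grasp(((u - cx) * z / fx, (v - cy) * z / fy, z))


def run_pipeline(instruction: str, detections: List[Target], depth: Grid,
                 intrinsics: Tuple[float, float, float, float] = (1.0, 1.0, 0.0, 0.0)) -> Grasp:
    """Chain the stages: which object -> where on it (2D) -> how to reach it (3D)."""
    target = which2act(instruction, detections)
    affordance = where2act(target, len(depth), len(depth[0]))
    return how2act(affordance, depth, *intrinsics)
```

Calling `run_pipeline("pick up the cup", detections, depth)` with a list of `Target` detections and a depth grid returns a `Grasp`. Because each stage hands off an explicit artifact (a selected target, a 2D map, a 3D point), every intermediate result can be inspected, which is the interpretability benefit the summary describes.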

Key Concepts

VLA model
Affordance intermediate representation
Semantic-action alignment
Mixture-of-Transformer
Which2Act
Where2Act
How2Act

Entities

Peking University
Hong Kong University of Science and Technology
The Chinese University of Hong Kong
Knowin AI
AffordanceVLA

Source

Raw: affordancevla-affordance-intermediate-representation.md

Relations

(none)

Auto-generated on 2026-06-10