Tactile feedback is generally recognized to be crucial for effective interaction with the physical world.
However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks.
Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets.
We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing without fine-tuning the base VLA.
Our method introduces two key innovations:
(1) a pipeline that leverages a pretrained tactile-language model to provide semantic tactile feedback for high-level task planning,
and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation; a schematic sketch of both levels is given below.
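To make the two levels concrete, the following Python sketch illustrates one way such a dual-level integration could be wired together. All class and function names (TactileLanguageModel, VLAPolicy, TactileDiffusionRefiner, control_step) are hypothetical placeholders with stub bodies, not the released VLA-Touch implementation.

# Minimal sketch of the dual-level tactile integration described above.
# Every name here is a hypothetical placeholder, not the VLA-Touch API.
import numpy as np


class TactileLanguageModel:
    """Pretrained tactile-language model (placeholder): maps raw tactile
    readings to a short natural-language summary for the planner."""

    def describe(self, tactile: np.ndarray) -> str:
        # A real model would run inference on the tactile input here.
        return "the contacted surface feels rigid with low friction"


class VLAPolicy:
    """Frozen base VLA model (placeholder); its weights are never fine-tuned."""

    def plan(self, instruction: str, tactile_summary: str) -> str:
        # High-level planning conditioned on the instruction plus the
        # semantic tactile feedback string (level 1).
        return f"{instruction}, given that {tactile_summary}"

    def act(self, subtask: str, image: np.ndarray) -> np.ndarray:
        # Coarse action chunk, e.g. 8 steps of a 7-DoF end-effector command.
        return np.zeros((8, 7))


class TactileDiffusionRefiner:
    """Diffusion-based controller (placeholder) that refines VLA actions
    with tactile signals (level 2), leaving the base VLA untouched."""

    def refine(self, actions: np.ndarray, tactile: np.ndarray) -> np.ndarray:
        # A real refiner would run a conditional denoising process;
        # the stub returns the coarse actions unchanged.
        return actions


def control_step(tlm, vla, refiner, instruction, image, tactile):
    tactile_summary = tlm.describe(tactile)           # level 1: semantic tactile feedback
    subtask = vla.plan(instruction, tactile_summary)  # high-level task planning
    coarse_actions = vla.act(subtask, image)          # base VLA action proposal
    return refiner.refine(coarse_actions, tactile)    # level 2: tactile action refinement


if __name__ == "__main__":
    actions = control_step(TactileLanguageModel(), VLAPolicy(), TactileDiffusionRefiner(),
                           "insert the plug", np.zeros((224, 224, 3)), np.zeros((16, 16, 3)))
    print(actions.shape)  # -> (8, 7)

The property this sketch is meant to highlight is that the base VLA weights stay frozen: tactile information enters only through the planner's language input and the downstream action refiner.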
Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision.
@misc{bi2025vlatouchenhancingvisionlanguageactionmodels,
      title={VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback},
      author={Jianxin Bi and Kevin Yuchen Ma and Ce Hao and Mike Zheng Shou and Harold Soh},
      year={2025},
      eprint={2507.17294},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2507.17294},
}