Vision-language-action models are the next leap in autonomous robotics
GR00T N1 is an example of a vision-language-action model. Source: NVIDIA
Robotics has traditionally relied on modular pipelines: perception, planning, and control sit in separate systems connected through hand-tuned interfaces. This approach works for simple, well-defined tasks, but it struggles when environments change or when robots must follow flexible instructions. Vision-language-action (VLA) models offer a different path.
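To make the contrast concrete, here is a minimal sketch (not any vendor's actual API; all function and field names are illustrative placeholders). The modular pipeline passes data through a hand-defined symbolic interface between stages, so the planner only understands the vocabulary that interface was tuned for. The VLA-style policy stands in for a single learned model that maps pixels and a language instruction directly to actions, with no symbolic bottleneck in between.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observation:
    image: List[float]  # flattened camera pixels (placeholder)
    instruction: str    # natural-language command


# --- Traditional modular pipeline: separate stages, hand-tuned interfaces ---

def perceive(image: List[float]) -> Dict[str, list]:
    """Perception emits a fixed symbolic scene description."""
    return {"objects": ["cup"], "positions": [(0.4, 0.1)]}


def plan(scene: Dict[str, list], instruction: str) -> List[str]:
    """The planner consumes the symbolic scene; it only handles
    the vocabulary the interface was designed for."""
    if "cup" in scene["objects"] and "pick" in instruction:
        return ["move_to(0.4, 0.1)", "close_gripper"]
    return []  # instructions outside the fixed vocabulary fail silently


def control(step: str) -> List[float]:
    """The controller turns each symbolic step into joint commands."""
    return [0.0] * 7  # placeholder command for a 7-DoF arm


def modular_pipeline(obs: Observation) -> List[List[float]]:
    scene = perceive(obs.image)
    return [control(step) for step in plan(scene, obs.instruction)]


# --- VLA-style policy: one learned mapping from observation to action ---

def vla_policy(obs: Observation) -> List[float]:
    """Stand-in for a single network that ingests pixels and text
    and emits an action directly, with no symbolic hand-off."""
    return [0.0] * 7


obs = Observation(image=[0.0] * 16, instruction="pick up the cup")
actions = modular_pipeline(obs)   # two symbolic steps, each mapped to a command
action = vla_policy(obs)          # one direct observation-to-action mapping
```

Note how the modular pipeline's flexibility is capped by the symbolic interface: an instruction the planner's vocabulary does not cover produces no plan at all, whereas a learned VLA policy has no such hand-designed bottleneck to outgrow.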
Systems such as Figure AI’s Helix, NVIDIA’s GR00T N1, and Google DeepMind’s RT-1...