Show2Instruct
Generating machine-processable control commands from natural-language user interaction, using object references in the system's visual context
Project description:
On 13 March 2024, the robotics company Figure, together with its cooperation partners OpenAI and NVIDIA, presented the robot "Figure 01", which can interact with humans not only in natural language but also context-specifically with respect to its local environment. This is made possible by major advances in foundation models, particularly for semantic image analysis and large language models (LLMs). However, the development of such AI-based interaction mechanisms, which integrate foundation models from computer vision with LLMs, has not yet been a focus of research and development activities. During a building site inspection, for example, context-specific natural-language queries could be analysed, such as: "Do all windows and doors in this room comply with the specification in the BIM system, and have all accessibility requirements been met?" This field of research is still in its infancy, but it will be pivotal for all contextualised voice interaction systems in the future.
The project is intended to demonstrate the use of such generative AI models in a concrete application domain: the digitalisation of the construction sector. Generative AI is to be used to develop a technology basis for human-machine interfaces that not only allow natural-language operation of software and machines based on LLMs, but in particular can also incorporate visually recognised objects from the systems' local environment into prompts.
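To illustrate the general idea, the following minimal sketch shows how detections from a vision pipeline could be serialised into an LLM prompt alongside a spoken query, so that the model can return a machine-processable command referring to concrete objects. All names, schemas, and the output format here are illustrative assumptions, not the project's actual design.

```python
# Hypothetical sketch: grounding a natural-language query in visually
# recognised objects by injecting a machine-readable scene description
# into an LLM prompt. Object schema and command format are assumed.

import json
from dataclasses import dataclass, asdict

@dataclass
class DetectedObject:
    """One object recognised by the vision pipeline (assumed schema)."""
    object_id: str   # stable reference, e.g. a BIM element identifier
    label: str       # semantic class, e.g. "window" or "door"
    bbox: tuple[float, float, float, float]  # normalised image coordinates

def build_prompt(user_query: str, scene: list[DetectedObject]) -> str:
    """Combine the user's query with the current scene context and ask
    the LLM to answer with a structured, machine-processable command."""
    scene_json = json.dumps([asdict(o) for o in scene], indent=2)
    return (
        "You are a control interface for construction-site software.\n"
        "Objects currently visible to the system:\n"
        f"{scene_json}\n\n"
        f"User query: {user_query}\n\n"
        "Respond ONLY with a JSON command of the form\n"
        '{"action": ..., "target_ids": [...], "checks": [...]}'
    )

if __name__ == "__main__":
    scene = [
        DetectedObject("win-042", "window", (0.10, 0.20, 0.35, 0.60)),
        DetectedObject("door-007", "door", (0.55, 0.15, 0.80, 0.90)),
    ]
    prompt = build_prompt(
        "Do all windows and doors in this room comply with the "
        "specification in the BIM system?",
        scene,
    )
    print(prompt)  # this prompt would then be sent to an LLM of choice
```

The structured JSON reply could then be validated and executed against the BIM system, keeping the LLM output machine-processable rather than free-form text.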
Consortium:
Ramblr GmbH;
neoBIM GmbH;
Clausthal University of Technology;
University of Rostock
Duration:
February 2025 - January 2028
Budget:
Total project volume: € 3.8 million
Funding amount: € 2.6 million