Integration of multimodal input by using agents
Today, user interfaces normally consist of a screen for output and a pointing device and a keyboard for input. However, as more advanced technologies and methods appear, there are good opportunities to use them for more natural and effective human-computer interfaces. The main motivation is an interface that is more natural and easier to use, where the computer understands the user without too much effort on the user's part. Intelligent interfaces could be one way to achieve this goal. The main focus of this thesis is multimodal input, which combines different input modalities to achieve the user's goal. A framework has been designed in which the user can switch between input modalities. The system should integrate the information given in the different input modalities into one joint meaning. In this architecture, input is either location input or command input, and different modalities can be used for each input type. The example described later in this thesis combines either speech or written text as command input with either map input or physical position as location input. An agent-based blackboard architecture is used for collecting input. Agents collect information directly from the user. Each agent represents its own input modality and is responsible for analysing its input. Once this is done, the agent sends the information to a common blackboard, which holds the latest information from each agent. A separate agent, responsible for fusing this information, collects it from the blackboard and integrates it into one joint meaning. This joint interpretation decides what should be done to which object. Since the modalities are independent of each other, other modalities can easily be added with only small changes to other parts of the system, as long as they provide command or location input that conforms to the current representation structure.
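The agent and blackboard flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. All names (`Entry`, `Blackboard`, `ModalityAgent`, `FusionAgent`) are hypothetical. The placeholder analysis step and the rule that the most recently posted entry wins when two modalities supply the same input type are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    kind: str      # "command" or "location" (the two input types in the abstract)
    modality: str  # e.g. "speech", "text", "map", "position"
    value: str     # analysed input in a shared representation (assumed: plain string)


class Blackboard:
    """Keeps only the latest entry posted by each modality agent."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tuple[int, Entry]] = {}
        self._seq = 0  # posting order, used to find the most recent entry

    def post(self, entry: Entry) -> None:
        self._seq += 1
        self._latest[entry.modality] = (self._seq, entry)

    def newest(self, kind: str) -> Optional[Entry]:
        # Assumption: if several modalities offer the same kind, the newest wins.
        candidates = [p for p in self._latest.values() if p[1].kind == kind]
        if not candidates:
            return None
        return max(candidates, key=lambda p: p[0])[1]


class ModalityAgent:
    """One agent per input modality; analyses raw input and posts the result."""

    def __init__(self, modality: str, kind: str, blackboard: Blackboard) -> None:
        self.modality = modality
        self.kind = kind
        self.blackboard = blackboard

    def analyse(self, raw: str) -> str:
        # Placeholder for speech recognition, text parsing, map picking, etc.
        return raw.strip().lower()

    def receive(self, raw: str) -> None:
        self.blackboard.post(Entry(self.kind, self.modality, self.analyse(raw)))


class FusionAgent:
    """Combines the latest command and location into one joint meaning."""

    def __init__(self, blackboard: Blackboard) -> None:
        self.blackboard = blackboard

    def fuse(self) -> Optional[Dict[str, str]]:
        command = self.blackboard.newest("command")
        location = self.blackboard.newest("location")
        if command is None or location is None:
            return None  # no joint meaning until both input types are present
        return {"action": command.value, "target": location.value}


board = Blackboard()
speech = ModalityAgent("speech", "command", board)
text = ModalityAgent("text", "command", board)
map_input = ModalityAgent("map", "location", board)
fusion = FusionAgent(board)

speech.receive("  Zoom In ")
map_input.receive("Building A")
joint = fusion.fuse()  # command from speech, location from the map
```

Because the fusion agent only looks at the input type of each entry, a further modality can be added by creating another `ModalityAgent` without touching `Blackboard` or `FusionAgent`, which mirrors the extensibility claim in the abstract.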