A comparison of contextual bandit approaches to human-in-the-loop robot task completion with infrequent feedback
Robotics, Machine Learning, Reinforcement learning
Computer Engineering | Robotics
Artificially intelligent assistive agents are playing an increasing role in our work and homes. In contrast with currently predominant conversational agents, whose intelligence derives from dialogue trees and external modules, a fully autonomous domestic or workplace robot must carry out more complex reasoning. Such a robot must make good decisions as soon as possible, learn from experience, respond to feedback, and rely on feedback only as much as necessary. In this research, we narrow the focus of a hypothetical robot assistant to a room-tidying task in a simulated domestic environment. Given an item, the robot chooses where to put it among many destinations, then optionally receives feedback from a human operator. We frame the problem as a contextual bandit, a reinforcement learning approach frequently used in Web recommendation systems. We evaluate e-greedy and LinUCB action selection methods under a variety of infrequent feedback scenarios, with several methods for managing the lack of feedback. Our empirical results show that, while early-episode performance and overall accuracy of e-greedy action selection can be improved through learning from no-response feedback and careful management of remembered training episodes, a baseline LinUCB approach outperforms e-greedy action selection in early-episode performance, overall accuracy, and simplicity.
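The two action-selection methods the abstract compares can be sketched in a few lines. The following is a minimal illustrative example, not the paper's implementation: e-greedy explores uniformly with probability eps, while LinUCB (the disjoint-model variant) keeps per-arm ridge-regression statistics and adds an exploration bonus proportional to the estimated uncertainty. The toy reward rule, context dimension, and number of destinations are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_greedy_choose(q_values, eps=0.1):
    # With probability eps, explore a uniformly random arm; otherwise exploit.
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

class LinUCBArm:
    # Per-arm statistics for LinUCB with disjoint linear models:
    # A accumulates outer products of contexts, b accumulates reward-weighted contexts.
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)       # d x d regularized design matrix
        self.b = np.zeros(d)     # reward-weighted feature sum
        self.alpha = alpha       # exploration strength

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b   # ridge-regression coefficient estimate
        # Predicted reward plus an uncertainty bonus.
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x, r):
        self.A += np.outer(x, x)
        self.b += r * x

# Toy run: 3 candidate destinations, 4-dim item context; by construction
# destination 1 is rewarded only when the first context feature is positive.
d, n_arms = 4, 3
arms = [LinUCBArm(d) for _ in range(n_arms)]
counts = [0] * n_arms
for _ in range(500):
    x = rng.normal(size=d)
    a = int(np.argmax([arm.ucb(x) for arm in arms]))
    r = 1.0 if (a == 1 and x[0] > 0) else 0.0  # hypothetical feedback rule
    arms[a].update(x, r)
    counts[a] += 1
```

With this reward rule, LinUCB concentrates its choices on destination 1 whenever the predictive feature is positive, illustrating how the bonus term drives exploration early and fades as each arm's statistics accumulate.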
31st IEEE Int. Conf. on Tools with AI (ICTAI 2019), Nov 4-6, Portland, Oregon, 2019.
Matt McNeill, Damian Lyons, "A Comparison of Contextual Bandit Approaches to Human-in-the-Loop Robot Task Completion with Infrequent Feedback." To appear: 31st IEEE Int. Conf. on Tools with AI (ICTAI 2019), Nov 4-6, Portland, Oregon, 2019.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.