Degree of Contribution

Lead

Document Type

Conference Proceeding

Keywords

Robotics, Machine Learning, Reinforcement Learning

Disciplines

Computer Engineering | Robotics

Abstract

Artificially intelligent assistive agents are playing an increasing role in our work and homes. In contrast with currently predominant conversational agents, whose intelligence derives from dialogue trees and external modules, a fully autonomous domestic or workplace robot must carry out more complex reasoning. Such a robot must make good decisions as soon as possible, learn from experience, respond to feedback, and rely on feedback only as much as necessary. In this research, we narrow the focus of a hypothetical robot assistant to a room-tidying task in a simulated domestic environment. Given an item, the robot chooses where to put it among many destinations, then optionally receives feedback from a human operator. We frame the problem as a contextual bandit, a reinforcement learning approach frequently used in Web recommendation systems. We evaluate e-greedy and LinUCB action-selection methods under a variety of infrequent feedback scenarios, with several methods for managing the lack of feedback. Our empirical results show that, while early-episode performance and overall accuracy of e-greedy action selection can be improved through learning from no-response feedback and careful management of remembered training episodes, a baseline LinUCB approach outperforms e-greedy action selection in early-episode performance, overall accuracy, and simplicity.
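
For readers unfamiliar with the two action-selection methods compared in the abstract, the following is a minimal sketch in Python. It is illustrative only, not the authors' implementation: the arm count, context dimension, exploration parameters, and the synthetic contexts and rewards are hypothetical stand-ins for the tidying task's destinations, item features, and operator feedback.

    import numpy as np

    rng = np.random.default_rng(0)
    n_arms, dim, alpha, epsilon = 5, 8, 1.0, 0.1  # hypothetical sizes/parameters

    class LinearBandit:
        """Per-arm ridge-regression statistics shared by both strategies."""
        def __init__(self, n_arms, dim):
            self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
            self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted contexts

        def update(self, arm, x, reward):
            self.A[arm] += np.outer(x, x)
            self.b[arm] += reward * x

    def linucb_select(model, x, alpha):
        # LinUCB: estimated reward plus an upper-confidence exploration bonus.
        scores = []
        for A, b in zip(model.A, model.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def egreedy_select(model, x, epsilon):
        # e-greedy: random arm with probability epsilon, otherwise greedy.
        if rng.random() < epsilon:
            return int(rng.integers(len(model.A)))
        return int(np.argmax([np.linalg.solve(A, b) @ x
                              for A, b in zip(model.A, model.b)]))

    # Toy interaction loop with synthetic placeholders.
    model = LinearBandit(n_arms, dim)
    for _ in range(100):
        x = rng.standard_normal(dim)          # stand-in for item features
        arm = linucb_select(model, x, alpha)  # or egreedy_select(model, x, epsilon)
        reward = float(arm == 0)              # placeholder for operator feedback
        model.update(arm, x, reward)

Note that LinUCB folds exploration into its confidence bonus, while e-greedy explores uniformly at random; the paper's comparison under infrequent feedback hinges on this difference.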

Publication Title

31st IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2019), November 4-6, 2019, Portland, Oregon.

Issue

1

Article Number

1075

Publication Date

11-2019

Language

English

Peer Reviewed

Yes

Version

Published

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Included in

Robotics Commons
