Automated Shielding using Kinematic and Dynamic Models for Feasible and Safe Reinforcement Learning
Reinforcement learning is quickly becoming popular in the robotics and autonomous systems community, where autonomous systems are used to learn increasingly complex tasks in increasingly complex environments. Reinforcement learning works by exposing an agent to a series of trial-and-error interactions with the environment, which, over time, allow the agent to estimate the outcome of an action before it is taken. During the learning phase, this trial-and-error method carries the risk that infeasible or unsafe actions might be attempted. Moreover, if any of these infeasible or unsafe actions are incorporated into the final learned model, they could lead to fatal consequences for both the autonomous system and the humans involved. Recent work proposed a technique that places a shield between the agent and the environment to block any undesired actions before they are taken. An advantage of these shields is that they can be formally verified and used in both the learning and deployment phases. However, these shields require expert domain knowledge and are tedious to configure and implement. For example, consider the number of unique cases one must enumerate when writing formal specifications for a self-driving car. In this work, we propose a technique that uses the kinematic and dynamic models of existing systems to automatically create shields that block not only unsafe actions but also physically infeasible ones. The proposed technique is then applied to two different self-driving car environments, and the tradeoffs are analyzed. We find that our automatically generated shields can decrease training times by up to 57.2%, reduce the number of fatal collisions by 98.0% in simple environments and 59.4% in complex environments, and reduce the physical stress placed on the vehicle.
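To make the shielding idea concrete, the following is a minimal sketch (not the thesis implementation; all names and parameters are hypothetical) of a shield that filters an agent's proposed acceleration using a one-dimensional kinematic model, clipping physically infeasible commands to the actuator limits and overriding unsafe ones that would exceed a speed limit:

```python
class KinematicShield:
    """Illustrative shield: blocks accelerations a simple kinematic
    model deems physically infeasible or unsafe."""

    def __init__(self, v_max=30.0, a_max=3.0, dt=0.1):
        self.v_max = v_max   # speed limit (m/s) -- safety constraint
        self.a_max = a_max   # actuator limit (m/s^2) -- feasibility constraint
        self.dt = dt         # control timestep (s)

    def filter(self, v, accel):
        """Return the proposed acceleration if allowed, else a safe fallback."""
        # Physically infeasible: command exceeds the actuator's dynamic limits.
        accel = max(-self.a_max, min(self.a_max, accel))
        # Unsafe: forward-simulate one step of the kinematic model v' = v + a*dt.
        if v + accel * self.dt > self.v_max:
            # Fallback: the largest acceleration that keeps speed within limits.
            accel = (self.v_max - v) / self.dt
        return accel


shield = KinematicShield()
safe_accel = shield.filter(v=29.9, accel=5.0)  # clipped, then capped by speed limit
```

In a learning loop, the shield would sit between the agent's policy output and the environment's step function, so the agent only ever executes filtered actions; because the filter is derived from the vehicle model rather than hand-written specifications, it is the kind of shield the proposed technique generates automatically.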
- Matthew Dwyer (Chair)
- Sebastian Elbaum (Advisor)
- Nicola Bezzo
- Madhur Behl