Uncovering Strategies and Commitment Through Machine Learning System Introspection
Journal article, Peer reviewed
Published version
Permanent link: https://hdl.handle.net/11250/3120038
Publication date: 2023

Abstract
Deep neural networks are naturally “black boxes”, offering little insight into how or why they make decisions. These limitations reduce the likelihood that such systems will be adopted for important tasks or trusted as teammates. We design and employ an introspective method that abstracts neural activation patterns into human-interpretable strategies and identifies relationships between environmental conditions (why), strategies (how), and performance (result) in a deep reinforcement learning application: a two-dimensional pursuit game. For example, we found that activation patterns abstracted into “head-on” or “L-shaped” maneuver strategies were successful and intuitively corresponded to favorable initial conditions. Moreover, we characterize machine commitment by introducing a novel measure based on analysis of time-series neural activation patterns over the course of a game, and we reveal significant correlations between machine commitment and performance. By uncovering temporally dependent machine “thought processes” and commitment through introspection, we contribute to the larger explainable artificial intelligence initiative, increasing transparency and trust in machine learning systems.
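To make the two ideas in the abstract concrete, the sketch below illustrates one plausible pipeline: per-timestep activation vectors are clustered into discrete "strategy" labels, and commitment is scored as how rarely the agent switches its assigned strategy during a game. This is an illustrative assumption only — the function names (`abstract_strategies`, `commitment`), the use of k-means, and the switch-rate definition of commitment are hypothetical stand-ins, not the paper's actual method or measure.

```python
import numpy as np

def abstract_strategies(activations, n_clusters=4, n_iter=50):
    """Cluster per-timestep activation vectors into discrete strategy labels.

    A minimal k-means sketch with deterministic initialization; the paper's
    actual abstraction procedure may differ substantially.
    """
    X = np.asarray(activations, dtype=float)
    # Deterministic init: spread initial centers evenly across the sequence.
    idx = np.linspace(0, len(X) - 1, n_clusters).astype(int)
    centers = X[idx].copy()
    for _ in range(n_iter):
        # Assign each timestep to the nearest center (squared Euclidean).
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned points.
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def commitment(labels):
    """Fraction of consecutive timesteps with an unchanged strategy label.

    1.0 means the agent never switches strategies during the game; values
    near 0 indicate constant switching. An illustrative proxy measure only.
    """
    labels = np.asarray(labels)
    if len(labels) < 2:
        return 1.0
    return float(np.mean(labels[1:] == labels[:-1]))
```

For example, a game whose timesteps all map to one cluster would score a commitment of 1.0, while a game that alternates between two strategies every step would score near 0.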