Simulating Dynamic Pricing Algorithm Performance in Heterogeneous Markets

This thesis investigates how e-commerce sellers can maximize revenue by delegating dynamic pricing decisions to machine learning algorithms in a market with multiple learning agents and a heterogeneous consumer base. Using a novel approach, we show how a population of agents adapts and performs in the presence of other adaptive agents when facing a mix of myopic and strategic consumers in a finite market.

By analyzing similarities and differences in current research, we find that a variety of machine learning approaches have been applied to dynamic pricing problems, but no single algorithm appears to be best for solving these complex problems. Furthermore, we find that all the reviewed literature evaluates the performance of machine learning algorithms in simple simulated environments. We also highlight the literature's failure to compare different machine learning approaches under equal conditions. Perhaps our most important finding is that there seems to be no empirical evidence justifying the added value of dynamic pricing by machine learning algorithms for real-world sellers.

We study dynamic pricing decisions made by Q-learning and neural network algorithms that learn simultaneously, and find that, despite the non-stationary nature of the resulting environment, the algorithms perform robustly in our moderately realistic markets. Furthermore, the agents show tendencies to collude implicitly, keeping average prices above marginal cost without any means of communication. Neither algorithm proves to be the best overall performer, as their performance depends on the underlying market's consumer and seller composition and on the extent to which the agents are trained. We find that Q-learning provides the most reliable approximation of optimal future price paths, but at the cost of a long and tedious training period. The suggested neural networks show promising results and offer a better balance between training time and performance. However, their limited network size prevents them from comprehending the consequences of their actions, and they consequently show less implicit cooperation than Q-learning.
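As a rough illustration of what simultaneous learning means here, the following minimal sketch lets two tabular Q-learning sellers set prices against each other, each observing only last period's prices. It is not the market model or the implementation used in the thesis: the price grid, marginal cost, learning parameters, and the single-consumer toy demand in `profits` are all hypothetical assumptions chosen for brevity.

```python
import random

# All values below are hypothetical and chosen only for illustration.
PRICES = [1.0, 1.5, 2.0, 2.5, 3.0]     # discrete price grid
COST = 1.0                              # marginal cost
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration


def profits(p0, p1, valuation=3.0):
    """Toy demand: one consumer buys from the cheaper seller if price <= valuation."""
    if min(p0, p1) > valuation:
        return 0.0, 0.0
    if p0 < p1:
        return p0 - COST, 0.0
    if p1 < p0:
        return 0.0, p1 - COST
    return (p0 - COST) / 2, (p1 - COST) / 2  # tie: the sale is split


def choose(q, state, rng):
    """Epsilon-greedy choice of a price index."""
    if rng.random() < EPSILON:
        return rng.randrange(len(PRICES))
    row = q[state]
    return max(range(len(PRICES)), key=lambda a: row[a])


def simulate(steps=20000, seed=0):
    rng = random.Random(seed)
    n = len(PRICES)
    states = [(i, j) for i in range(n) for j in range(n)]
    # One Q-table per seller; the state is the pair of last-period price indices.
    q = [{s: [0.0] * n for s in states} for _ in range(2)]
    state = (0, 0)
    for _ in range(steps):
        # Both sellers act at the same time, so each faces a moving target.
        actions = tuple(choose(q[k], state, rng) for k in range(2))
        rewards = profits(PRICES[actions[0]], PRICES[actions[1]])
        for k in range(2):
            best_next = max(q[k][actions])
            old = q[k][state][actions[k]]
            # Standard Q-learning update, applied independently by each seller.
            q[k][state][actions[k]] = old + ALPHA * (rewards[k] + GAMMA * best_next - old)
        state = actions
    return q, state
```

Calling `simulate()` returns both Q-tables and the final pair of price indices. Because each seller's reward depends on the other's changing policy, neither faces a stationary environment, which is the difficulty the paragraph above refers to.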

We find that the agents' ability to capitalize on fluctuations in consumer valuations not only improves the sellers' profitability but may also increase consumer surplus compared to a fixed-price policy. However, the dynamics between the sellers' price policies appear to have the greatest impact on revenues, as the algorithms are collectively incapable of exploiting increases in consumer valuations. Furthermore, in the presence of strategic and myopic consumers, lower price paths may be beneficial for maximizing revenues, depending on the discrepancy in the consumers' willingness to pay. Consequently, we find that competition can increase output and seller surplus because it induces an earlier, lower price path. In a monopoly, the algorithms learn more gradually that a lower price path can generate more revenue.

Publisher

NTNU