Be Careful when Learning to Price: A Theory of Learning from Strategically Revealed Preferences
The problem of learning to price is ubiquitous in the widely spreading e-commerce today. Recent works show that a seller can effectively learn the optimal price from the revealed preferences of a buyer (i.e., optimal responses to seller prices), but assume that the buyer will always honestly reveal his true preferences. However, in reality, the buyer has incentives to mislead the seller’s learning algorithm by manipulating his revealed preferences. This paper studies price learning from such a strategic buyer and seeks to understand: (1) how the buyer will strategically respond to learning algorithms; (2) what the seller can expect to learn; and (3) how such strategically revealed preferences affect both agents’ objectives.
In a general model of learning through repeatedly interacting with a strategic buyer, we show that the optimal buyer response strategy is simply to imitate a different and carefully crafted value function (which we will characterize) consistently throughout the learning process. We also characterize both agents’ utilities in this strategic setup. When the seller has a convex production cost function as widely assumed in economics, the seller utility will be its Bregman divergence between no production and a certain production amount. Our results hinge on answers to a fundamental question which we term first-order concave fitting: given any two vector sequences x_1, ..., x_T and p_1, ..., p_T in R^d, when is it possible to construct a concave function u(x) such that p_t equals precisely the gradient of u(x) at x_t for all t=1,…, T? We prove that this is possible if and only if the two vector sequences are permutation stable: p_1 * x_1 + ... + p_T * x_T <= p_1 * x’_1 + ... + p_T * x’_T where x’_1, ..., x’_T is any permutation of x_1, ..., x_T.
- David Evans (Chair)
- Haifeng Xu (co-Advisor)
- Michael Albert (co-Advisor)
- Hongning Wang