Log Gambler Model Solution

 Dynamic Programming Examples - Log Gambler: Model

This example comes from a book Introduction to Stochastic Programming by Sheldon M. Ross, (Academic Press, 1982). In the book the problem is described with a continuous nonnegative state unbounded from above. The action variable is also continuous. For our purposes we will discretize both state and action.

 A gambler is playing a game that has two results, win or lose. He can bet any amount up to his entire wealth. If he wins the amount of the bet is added to his wealth, and if he loses the amount of the bet is subtracted from his wealth. The gambler's goal is to maximize the log of his wealth after a fixed number of plays.

To make analysis possible, we define the gambler's wealth as the state variable measured in whole dollars. We restrict the gambler's wealth to be less than the quantity M. Bets are measured in dollars and in this model the amount of the bet is the action variable. Bets are restricted so that the gambler can never bet an amount that would reduce his wealth to less than \$1 or more than M. There are N plays in the game and the objective is to maximize the log of the wealth after play N.

States

The states are defined with a single state variable. The state definition region on the model worksheet is shown below. Cell F14 holds the value of the current state, 10. For the example we choose to limit the wealth to \$20. The lower bound on the state is in F18, and the upper bound is in F19.

The example illustrates a new feature of the DP Models add-in, the ability to define the values for the final state. This is important for finite time horizon problems. For the Gambler problem, the only objective concerns the value at the final state. The goal is to maximize the log of the wealth at the end of the time horizon, 20 plays for the example shown above.

After enumeration the state list appears as below. The Final Dollars column is computed as ln(final wealth).

Actions

The action is the amount the gambler bets on a given play. The range of bets is defined in the action element. The goal of the DP is to select bets at each state to maximize the log of the wealth at the end of the game.

Events

The event indicates whether the bet wins or loses. The probability of winning is a parameter of the system, stored in Q24. The example assumes the probability of winning is 0.6 with a corresponding probability of losing of 0.4. The summary starting in column S shows that the combination of elements, S(10)/Bet(5)/Win, is feasible.

Decisions

Not all combinations of state and action are admissible. We reject all actions that might lead to a state less than 1 or more than 20. This is accomplished in the single decision block of the model. The Max entry in column K holds a formula that limits the size of the bet. The maximum bet with a wealth of 10 is a bet of 9. any larger bet allows the possibility that the wealth falls below 1 or above 20.
Enumeration of the elements reveals that there are 110 feasible combinations of state and action. Near the top of the list the actions are limited by the risk of falling below 1. Near the bottom of the list the actions are limited by the possibility of falling above the maximum wealth of 20.

.

Transitions

The transition equation is particularly simple for this model. The change state cell, F64, contains a formula that evaluates the positive of the bet for a win event, and evaluates as the negative of the bet for the lose event. The example shows that betting 5 when the wealth is 10 increases the wealth to 15 because of the win event.
There are two events for every decision so the transition list has 220 entries.

.

The gambler problem is solved on the next page.

Operations Research Models and Methods
Internet
by Paul A. Jensen