Dynamic Programming Examples - Baseball Model

This example comes from a book Dynamic Programming and Markov Processes by Ronald A. Howard, (MIT Press, Cambridge, Massachusetts, 1960). It is interesting because it describes a decision process familiar to baseball fans. It embodies difficult transition relations that reflect the rules of baseball. The process describes a single inning of the game. The states describe the number of outs and the occupation of the bases. The process starts with no outs and no-one on base. Decisions are instructions from a coach to the current batter and base runners. The events are the result of the play. The events are probabilistic. The inning passes through a series of states and finally finishes with a state with three outs.

The model is constructed directly by the DP Models add-in and solved with the DP Solver add-in.

Creating the Model Form

The model is constructed by choosing Add Model from the DP Models add-in menu. The baseball problem is a stochastic dynamic program.

The next dialog accepts data describing the form of the model. Once the model is constructed, the Name, and the numbers of State Variables, Action Variables and Event Variables are fixed. The numbers of State Blocks, Decision Blocks and Transition Blocks may be changed during the modeling process.

On clicking OK, the entire worksheet is prepared. You should download and open the baseball workbook to observe the various parts discussed below.

The upper-left corner of the worksheet is shown below. The parameters describing the form are in columns A and B. The white fields in rows 11 through 14 can be changed. The yellow cells should not be changed. The Change button in column D allows changes in the numbers of blocks of the various types.

The other buttons cause the states and other model elements to be listed on a worksheet labeled BB, Lists. The Transfer to DP Solver creates a solver model and translates the data to the solver.

States

 The state definition is the basis for all models. The enumeration form on the left shows the state definition for the baseball problem. There are four state variables. The first indicates the number of outs.The other three are the numbers of players on third, second and first base respectively. The lower and upper bounds in rows 18 and 19 determine the range of the state variables. The last three variables are binary, indicating that the base is either occupied or not. Cell E17 holds a logical expression that restricts the states to be considered. For the baseball problem, no logical conditions are specified in this model, so cell E17 holds the default value TRUE. The state space consists of all integer values within the ranges. In this case there are 32 states in the state space. Row 20 computes the name of the state variable and row 21 computes the component of the objective value that depends on the state. The objective of this problem is to maximize the number of runs scored in the inning. The runs always occur because of transitions, so the state does not affect the objective directly. Row 21 has all zero entries.

The add-in enumerates all states by advancing the index in E14 through all integer values from 0 to 31. We will call this the state count index. Formulas in row 14 use the MOD function to compute the state values associated with the index. The current state is in the range F14:I14. During the modeling process it is often useful to manually change the state. This is accomplished by entering a nonnegative integer in cell E14 or clicking on the spinner control in column K. Clicking the top arrow advances the state index by 1 and clicking the bottom arrow reduces the state count index by 1.

In addition to the state vector enumeration structure a collection of state subsets can be defined. The example uses two state subsets. The current state is repeated in row 33 in columns F through I for easy reference. Summary information is shown at the right. Some columns of the worksheet are hidden to simplify viewing.
State subset 1, rows 36 through 41, identifies all the states with three outs. The lower and upper bounds in F39 and F40 are both equal to 3. With three outs, the inning terminates. The Exit Model cell in X39 indicates a terminal state. For this subset, the model has the value TRUE in X39 to indicate that the process terminates when one of the states with three outs is reached. Cell X37 indicates whether the current state is in this subset, and since the current state has 0 outs, the value FALSE is returned.
State subset 2, rows 42 through 47, identifies all states with 2 or fewer outs. The current state (0, 0, 0, 0) satisfies this condition so the result is TRUE in cell X43. The state summary at the top of the figure indicates that the current state fulfills one of the state subset conditions (X33). The number in X34 shows the subset index that contains the current state. When a state falls into more than one subset, the nearest subset to the top is reported.

 Clicking the List States button at the top of the page, creates the list to the left. States listed must satisfy the enumeration bounds and be in at least one of the state subsets. All combinations of outs and base runners are listed. The Final state at the bottom of the list is added to accept terminating transitions. When determining the transitions for a model, a transition is accepted only when it leads to a feasible state. The list at the left shows all feasible states. This list and all the other lists described on this page on the worksheet named BB Lists.

Actions

 The action for the baseball problem is the advice given to the batter or base runner. A batter may instructed to hit or bunt. The base runner can be asked to steal second, third or home base. The actions are enumerated with the form at the left. The name in N20 is provided by a lookup formula referring to the data table, described below. An important restriction of this program is that the actions and events for the model must be defined independent of the state. Although all states do not allow all actions, the action list must include all actions possible under any state,

 Clicking the List Elements button at the top of the page creates the Action List. The table shows an index that varies from 1 to the number of states, the count value that leads to each action, the name, the action variables (only Bat for this problem), and the number of runs contributed by the action. No runs are caused directly by the action, so this column is 0.

Events

 The baseball problem allows 14 events. The name and probability entries on the enumeration form are obtained by reference to the data table. Again the events must be enumerable without reference to the states and actions. The objective contribution and the event probabilities may depend on the state and action. In this case probability depends on the action chosen.

 The complete event list is shown to the right. 1B, 2B, 3B and HR indicate possible hits that a batter can make to advance the runners and possibly score runs. BB is a base on balls. SO, FO and GO describe kinds of outs: strike out, fly out and ground out. SAC is a sacrifice that advances the runner, but the batter is out. FC means fielder's choice. A fielder's choice causes the lead runner to be out, but the batter takes first base. ST-Suc means a successful steal, ST-Fail means that the steal fails and the runner is out. ST-Stay means that the runner neither advances nor is out and remains on the base. Complex transitions occur from one state to another based approximately on the rules of baseball. These are described in the Transitions section of the worksheet.

Data Table

 The table to the left is constructed to hold names and probability data. This is the same data used by Howard in his description of the baseball problem. The model has equations that point to this data, so it is easy to generate models for alternative data. Notice that the probabilities depend both on the action and event.

Decisions

A decision is a combination of a state and an action. The Decision Blocks of the worksheet define the set of decisions for the dynamic programming model. The current decision shown at the top of the display below. The current decision combines the state (0, 0, 0, 0) and the action Hit. At the right of the display we see the name of the decision S_0_0_0_0/Hit in X50. Cell X51 indicates that this decision is feasible (TRUE), and X52 indicates that subset 5 of the decision subsets is feasible. Notice block 5 starts in row 78 and represents all states with 0 to 2 outs and the decision Hit. Each block defines a subset of the possible decisions. For example, block 1 starting in row 54 holds the states that allow the action Steal 2. These states all have 2 or fewer outs, second base is empty, and first base is occupied. In like manner, I have represented all feasible decision subsets.

When generating the model for dynamic programming, the add-in will enumerate all combinations of the state and action lists. If a decision is feasible (at least one of the blocks is satisfied) it is added to the decision list. If a state has no feasible decisions, the NA action is assigned to the decision and the combination penalized with a large cost.

For this example all state and action costs are zero, so we see zero values in the Runs cell of column X.

In general, several actions may be feasible for a state and the collection of all feasible decisions forms the decision list. Part of the decision list for the example is shown below. There are 78 feasible decisions, so only the first few and last few decisions are shown. For each decision there is a state index and an action index. Notice that there is only decision for state (0, 0, 0, 0) and that is the Hit action. Three decisions are available in state (0, 0, 0, 1), no outs and a man on first base, and the associated actions are Hit, Bunt and Steal 2. The decision blocks the resultant decision list depend on assumptions regarding appropriate actions for certain states. For example, when no one is on base, the Hit action is the only possibility,

We see at the end of the list, the states with three outs. The Null action with 0 cost is appended for states that go directly the final state.

Transitions

Transitions show the movement from one state to the next, and is governed by the state, action and event. Some of the transition blocks are shown below. There are 26 blocks in my model for the baseball problem. Each block describes a subset of the combinations of states, actions and events. It is useful to divide the transitions into subsets because it is impossible to write a single transition function that conveys the complex logic of the baseball inning. Transitions describe the function that leads the system from one state to another. Each transition has an objective term and a probability.

The current state, action, and event combination is given at the top of the display. The state is (0, 0, 0, 0), the bases are empty and there are no outs. The action is to Hit. The event is 1B, a one base hit. The name is in X92. X93 indicates that some transition is feasible, and X94 shows the index of the transition subset.

Block 1 describes a one base hit. The ranges on the state variables in rows 99 and 100 show that this transition can occur in any state with two or fewer outs. The logic and bounds in columns M, N and O indicate that the action can be either hit or bunt. Columns R, S and T show that only the 1B event is modeled by this block.

The transition equation is given in the range F112: I112, labeled Change State. The next state is in the range labeled Next State. It is the sum of the current state vector and the change state vector. The change state vector for this case is (0, 0, 0, 1) and the new state is (0, 0, 0, 1). A base hit with no-one on base and no outs results in a runner at first and no outs. The equations in the Change State range are fairly complicated. They must represent the change in state regardless of the pattern of runners on the bases. For the !B event, all runners advance and the batter goes to first base. When third base is occupied, a run scores. The Run cell in K102 has a formula that evaluates to 1 when third base is occupied. This value is transferred to the summary column with a formula. The probability, 0.15, comes directly from the event probability data.

The formulas in the change state vector compute the next state. For feasibility, the next state must appear in the state list. When it does, the index number of the state is in cell J104, The index is transferred to X100.

to generate the complete list of transitions, the program enumerates all combinations of decisions (states and actions) and events. The result is the Transition List shown in part below. Each transition has an objective function contribution, shown as the Transition Runs column AH. The Transition Probabilities are in column AJ. Only transitions with nonzero probabilities are listed. The next state is described by the index in AJ and by name in AK. There are 402 transitions for the baseball problem.

There are 26 transition blocks. Different models are possible but we have attempted to divide the transitions to make the transition functions as simple as possible. Still, in some cases the logic is complex.

The Complete DP Model

The State, Action, Event, Decision and Transition lists comprise the complete DP model for this problem. Clicking the Transfer to DP Solver button creates the lists. builds a DP Solver form, and transfers the model lists to the DP Solver form. The DP Solver model for this problem is on the next page.

Building and Debugging the Model

The form for a model is easily constructed with the Add Model from the DP Models add-in menu. The structure dialog accepts the numbers of state, action and event variables. Once the model is constructed these values are fixed. The dialog also specifies the numbers of state, decision and transition blocks. The numbers of blocks can be changed through the Change button, however, there must be at least one block created in the initial model in order to add additional ones.

The hard part of the baseball model was creating the transition equations. Rules for advancing runners in the event of a hit and adjusting runners for a fielder's choice or double play require fairly complex logical expressions in the Change State region of the transition definitions. While debugging the model it is useful to create two panes on the worksheet. Then the spinners on the element definitions can be easily changed to verify the equations in the various blocks. Careful review of the decision and transition lists may reveal errors.

Operations Research Models and Methods
Internet
by Paul A. Jensen