Dynamic Programming Models - Models: States/State Subsets
 The states of the system are described by the following notation. In our models there is a finite number of states and each state is represented by a vector of n elements called state variables. Each state variable is discrete and has a finite number of possibilities. In all our examples the state variables are integer. The State Element The model is described on a range of the models worksheet, called the state element. It is illustrated for a particular case for the states of the Baseball problem. Each state variable has a column of cells in the state element. For the example there are four state variables. The first state variable in column F indicates the number of outs in the inning. The remaining three state variables, B1, B2 and B3, indicate the status of the bases where each base has a runner or not. In the example, third base is in column G, second in column H and first base in column I. Lower and upper bounds on the state variables are in rows 18 and 19 respectively. The interval between one state variable value and the next is the Step size in cell E15. A step of 1 means that each state variable takes on consecutive integer values between the lower and upper bounds. The bases are listed in the order shown because it more convenient for reviewing the lists produced by the enumeration. It is the same order used by Howard who originally proposed the problem. The value of the state variables are in the range (F14:I14). The figure shows the state (0,0,0,0). In terms of baseball, this is the situation with no outs and the bases empty. Rows 15 and 16 hold formulas that are used by the enumeration process. Cell E14 holds an integer value that corresponds to a particular state. The value 0 indicates (0,0,0,0). The value 31 indicates (3,1,1,1). The number of state vectors is equal to the number in E16, 32 in this case. The complete set of states is enumerated by allowing the number in E14 to range from 0 to 31. The spinner control in column K changes the number in E14 by integer amounts. This is useful for manually debugging the model. The conditions for a feasible state are in row 17. Cells in the range (F17:I17) hold logical expressions that return TRUE if the state variable is within its lower and upper bounds. Cell E17 has a default value of TRUE, but a user defined logical expression can be inserted in cell E17 that imposes additional conditions on feasibility. Cell J17 indicates the feasibility of the current state. It returns TRUE if all cells in the range (E17:I17) are TRUE. Otherwise it returns FALSE and the state is not feasible. Rows 18 and 19 hold lower and upper bounds on the state variables. All the examples use integer values of the bounds. Row 20 holds string formulas and constants that define the name assigned to the states. Cell J20 is the specific name for the state. In the figure it is partially hidden because the width for column J is too small, but the value is S_0_0_0_0. Cells F20 through I20 hold formulas that combine the underline symbol with the state variable value. Cell J20 concatenates the values in the range (E20:I20). Names are useful for interpreting and debugging the model. In later versions of the DP Models add-in, the name uses commas rather than the underline symbol and begins and ends the name with parentheses. The example state has the name (0,0,0,0). Cell J21 holds the state contribution to the objective function. Initially the cell holds a formula that is a linear expression of the cell values plus a term computed in cell E21. The coefficients of the linear expression are in the range (F21:I21). The default values are 0. Cell E21 also is initially 0. Cell J21 is the sumproduct of the ranges (F20:I20) and (F14:I14) plus the value in E21. The contents of (E21:I21) and the formula in J21 can be manipulated by the user to represent any single valued function of the state variables. Dynamic programming places no restrictions on the form of the objective function components except that they be computable, well defined and single valued. This flexibility is one the strengths of the dynamic programming model. For the baseball problem, the objective is to maximize the number of runs scored in the inning. Thus the objective row takes the name Runs rather than Cost. We often use the term cost to indicate the general objective function terms, but most problems have a specific measure that is more relevant. The red arrows in the figure indicate functional dependence. The feasibility, name and cost calculated in column J all depend functionally on the state vector. Entries in the other columns are provided to accept terms that are combined, usually by addition, to yield the result in column J. The values in column J and the state specified in row 14 are used in other parts of the worksheet. State Subsets For many situations it is difficult to completely define the features of the states with the state element alone. Some states may be infeasible and the objective value may be a complicated function of the state variables that differs between different sets of states. To simplify modeling, we provide the option of state subsets. State subsets allow modifications to the feasibility and objective function values computed in the state element. They also allow states to be identified as exit states. Models do not require state subset definitions, but if one or more are created, the collection of subsets must include all feasible states. States left out by the subset definitions are judged infeasible and not included in the state set. The subsets are not required to be mutually exclusive. If a state is included in more than one subset, the one with the smallest index (or nearest the top of the worksheet) is used. The baseball model defines the two state subsets shown below. The subsets begin with a repetition of the current state in the range (F33:I33). The state vector is transferred directly from row 14 of the state element. It is repeated in row 33 for easy reference. When formulas in the state subsets refer to the state, they should reference the values in row 33 rather than row 14. Information in column X of rows 32 through 34 provide summary information regarding the subsets. When X33 is TRUE, the current state falls within one of the subsets. When it is FALSE no subsets hold the current state and it is not included in the model. For a feasible state, the values in both cell J17 and X33 are TRUE. The subsets are defined by bounds on the state variables and a logical condition that must be TRUE for states to be included in the subset. For example, the first state subset represents all states that have three outs, or the value 3 as the first state variable. The lower and upper bounds in cells F33 and F34 are both 3. Any state that does not fall within the bounds is not a member of this subset. The other state variable bounds are 0 and 1, the same bounds as in the state element. So these new bounds are not restrictive. Cell E38 allows a logical expression that must be TRUE for all elements of the subset. The default value is the constant TRUE, but the cell can be changed to represent other logical conditions. Cell J37 is TRUE if all cells in the range (E38:I38) are TRUE. The state (0,0,0,0) does not satisfy the logical conditions for the first subset so this state is not in the subset. Features of the subset are expressed in column X. X35 returns the same value as J37, so it indicates whether the current state is in the subset. Cell X36 has a formula that points to the objective value formula in the state element, cell J21. The formula in X38 can be changed to represent the objective value contribution specific to the subset. When a state falls within the subset, the subset objective value takes precedence. Cell X37 holds a logical constant that when TRUE indicates an exit state. An Exit state completes the sequence of states and there are no actions or transitions to be taken from that state. Cell X38 returns the subset index when the current state is in the subset. Otherwise it returns 9999. Here it is 9999 because the current state does not have three outs. Cell X34 is determined by the minimum of the subset indices. If it is less than 9999, cell X34 indicates the subset with the smallest index. Cell X33 returns TRUE when X34 is less than 9999. The second subset in the example represents all states with 2 or fewer outs. This is expressed by the bounds 0 and 2 in cells F45 and F46. The current state (0,0,0,0) does meet the logical requirements of the second subset, so cell X43 returns TRUE. State (0,0,0,0) is in the second subset. The state subsets are shown below for (3,1,1,1). This state is in the first subset, so it is identified as an Exit state. The objective contribution for an exit state is the value in the objective cell of the associated subset. For the example, the contribution to the objective for states with three outs is in cell X38. That value is zero. Exit states can play a variety of roles in a model. The exit state will indicate an infeasible state if the objective contribution is set to a very high value for a minimization problem or a very low value for a maximization problem. Enumerating the States The DP Models add-in enumerates all states that satisfy the bounds in the state element and creates the list of all feasible states. The list for the baseball problem is below. Thirty two states are identified by the enumeration. For models that have exit states, one additional state called the Final state is added to the model. For the first 32 states, each row designates the index, the count value, name, state variables and objective contribution for a single state. The state indices in column B are used to refer to the state in other parts of the model. The Final state has the greatest index and a non-numerical value for the first state variable. For all states except the Final state, the count value is the unique integer that when placed in cell E14 determines the current state on the model worksheet. The states are sorted according to the state variable values. Other Examples Each model has a state definition that is appropriate to the situation. There are a number of examples within this Dynamic Programming section. Some are shown below. The Doors problem has two state variables representing the coordinate points on a grid. The problem is a random walk with trapping states. The state subsets identify two exit states. The third state subset includes all states including those identified by the first two subsets. When a state falls within one of the two subsets, the results of the highest (smallest index) are used. Note that exit state (0,0) wins a prize of 1, while exit state (3,3) wins no prize. The optimization maximizes the total prize for the random walk so it encourages actions that lead to state (0,0). The Queue MDP has two state variables, the first represents the number of customers in the queuing system and the second represents the number of service channels. No state subsets are required. The model allows the number of servers to vary between 2 and 6. The state cost reflects a cost of 0.06 for each customer waiting or in service and a cost of 0.1 for each server. The time interval is minutes, so both coefficients represent the cost per minute. The objective contribution is a linear function of the state variables and the coefficients in row 21. The Sequencing model describes the oil well sequencing problem. It has states for each well location. The state variables indicate whether the well has been not drilled, drilled and found dry, or drilled and found wet.

Operations Research Models and Methods
Internet
by Paul A. Jensen