DP Solver Example Build Model Value/Fixed Value Iterations Policy Iterations Linear Programming Worksheet Details

 Dynamic Programming Solver Worksheet Details

During the discussion of the DP Solver features we have deliberately hidden worksheet columns to simplify descriptions. These columns contain a variety of formulas for implementing the solver functions. The student using the add-ins need not be concerned about the computational columns except to follow the rule to leave these columns alone. The DP Solver worksheet has a button labeled Equations. The purpose of this button is to rewrite the formulas after changes have been made.

There is a difference between worksheets constructed through the DP Models add-in and those constructed using the Build Model command on the DP Solver menu. The worksheets created by the DP Models add-in colors fewer columns yellow because some of the computations are accomplished during the model building. The Build Model command fills some of the columns with formulas to simplify building small problems. In some limited cases, described on this page, these yellow columns can be changed (with care).

The worksheet depends on the problem being solved and on the option selected for the display of data. The problem being solved can be a discrete time Markov Chain, a MDP problem or a deterministic DP model. The example on this page shows the MDP problem. The data option on this page shows the transition probabilities using a matrix format. The matrix format allows the transition costs to be shown as a matrix or represented by a column of expected transition costs. The latter format is illustrated here.

The DP Model

 We illustrate the DP worksheet with several figures. The cells of the model are colored yellow, green or white. Only the white cells hold data that can be directly controlled by the user. The yellow cells hold formulas and values that should not be replaced. The green cells hold values computed by the VBA code. The upper left cells on the page show the problem parameters and intermediate results. The content of the green fields are filled by the solution algorithms. The bottom three fields can be changed directly by the user. The values are reflected on the worksheet. Notice that this example uses a discount rate of 1% per month.

Columns to the right of the parameters show the state, action, and event definitions. The state definition is shown below with the solver buttons. Columns D through N have state information. Each row, starting in row 14 holds the information for a single state. Column D holds the state indices, consecutive integers from 1 through the number of states. Column E holds the state names assigned by the user. The State Cost column holds the direct cost of residing in each state. The value of 2 in F14 is the cost of the new bulb. It is incurred whenever the system is in the New state.

As indicated by their yellow color, columns G through M hold quantities computed with Excel formulas. They should not be disturbed. Column G holds the current decision index. The index points to a row in the Decision table of the worksheet that is discussed below. Column H indicates the bottom of the range of the Decision table that contains the decisions for the state. Column I holds the action indices of the current policy. Column J holds the corresponding action names. Column K holds the value of the decision taken from the Decision table. The Step Values in column L are the immediate costs including state, action, and expected transition costs. The State Value in column M is the step cost plus the expected NPW of costs of the remaining steps in the time horizon. Column N holds the probabilities computed for the states. It is green to indicate that the computer fills this column using an algorithm rather than a formula. The figure shows the initial values before any optimization. The goal of the solver algorithm is to find the policy that minimizes the values in column M.

As the solution algorithm progresses, the computer automatically adjusts the contents of the state table. No interaction is required by the user except to select the iteration method and initial conditions through the Solve dialog.

Columns P through W hold action and event information. The Inspect action expends the cost of inspection, \$0.50. The Replace action results in the cost savings of \$0.30. An additional Null action is provided and may be assigned as an option for some states. The cost of the Null action is zero. The NA is assigned when an action for some state is unavailable. The cost for the NA is very large. Except for the names, the event information plays no role for this example. The probabilities for events are provided in the transition probability table.

Decision List

The rows on the worksheet beginning in column Y and progressing to the right show the decision list. A decision is a combination of state and action. The table shows the data and computed results describing states and actions. Each row shows one state and one action. For example, row 14 describes state 1 (New) and action 1 (Inspect). Column Y holds the indices by which the rows are addressed. Column Z and AA show the state and action indices respectively.

Column AB shows the names of the state/action combinations. Columns AC through AG hold computed values necessary for the algorithm. Column AC shows the action costs. Column AD holds the expected cost of the transition. When an action/reward matrix is provided, this column is computed. Otherwise, as in this example, it is entered as data. Column AE holds the expected next value cost. It is computed as the Next Value in row 8 multiplied by the row of the transition probability matrix. Column AF holds the decision value computed as the sum of the decision reward, the expected transition reward and the discounted future value. Column AG holds the probability of the decision. It is the Last Probability for the state associated with the decision when the action is optimum for the decision.

Observe that each state has two actions, to inspect or replace. Columns AH through AL hold transition probabilities. These are important data inputs for any stochastic MDP. Even-numbered rows from 14 through 22 hold the transition probabilities associated with the inspection option. Odd numbered rows from 15 through 23 hold the transition probabilities associated with the replace option. This option always brings the system to the New state. The last column (AM in this case) holds the sum of the transition probabilities for each row. The sum of transition probabilities must be 1 for each row and all probabilities must be nonnegative.

The green range in row 8 holds the next-state value computed by the iterative solution process. The yellow range in row 9 holds formulas that compute the state probabilities. The two cells at the bottom of column AM compute the minimum and maximum values of the probability sums. When these numbers are different than 1, the probabilities are in error and a solution algorithm will not continue. The Minimum Probability cell holds the minimum value of the all the numbers entered in the probability range. If this number is negative, a negative probability has been entered. This is not allowed.

Operations Research Models and Methods
Internet
by Paul A. Jensen