Return to Index
Operations Research Models and Methods
 
Computation Section
Subunit Markov Decision Process

A Markov Decision Process (MDP) add-in adds actions to Markov analysis . At each state there are two or more decision options that affect the costs and transition probabilities of the Markov chain. Each state has an optimum decision. The optimum decision minimizes the discounted cost of operation, given that the system is currently in that state. The collection of the decisions for all state determines a policy. The system will operate the system in the least costly way if the operator chooses decisions indicated by the optimum policy.

The optimum policy is determined by a dynamic programming algorithm. The data for the problem is entirely described on the worksheet created by the add-in. Most of the computations are performed by equations directly on the worksheet. An iterative process performed by a VBA program implements the dynamic programming recursive procedure.



To illustrate the MDP we use an example similar to the one used in the description of the Markov chain model. Now however, we will have two maintenance options.

We are concerned about the maintenance policy for light bulbs in a large lighted sign. The manager wants a policy that produces the smallest discounted net present worth for operating the sign.

The manager can inspect each bulb at the end of every month of its life at a cost of $0.50 per inspection. If the bulb is found to be failed, it is replaced at the beginning of the net month for a cost of $2.

Alternatively, the manager may simply replace the bulb without inspection based on its age. Non-failed bulbs may be discard by this process, but the price of a new bulb is reduced to $0.30 if it is replaced by this alternative maintenance procedure. The cost reduction is due to the larger number of bulbs that will be purchased, thus obtaining an economy of scale. The $0.50 inspection cost is not expended with the replace option.

 

The Model with Inspection

With the inspection option, the Markov chain transition matrix is shown below. At the end of each month, the bulb is inspected. If the bulb is failed it is replaced at the beginning of the next month and the system moves to the New state. If the bulb survives, it becomes one month older. The probabilities show the effects of aging. The new bulb exhibits the "infant mortality" characteristic. Perhaps due to manufacturing defects the probability that a new bulb fails during its first month is greater than the probability that a one month old bulb fails during its second month of life. As the bulb ages further, the probability of failure grows. In the fourth month, the likelihood of failure reaches its highest value. The bulb may survive beyond four months, but the probability of failure given that it survives after the fourth month remains constant.


 

Each row of the matrix describes what might occur for a bulb of a particular age. For example, the row labeled New indicates that if a new bulb is inserted, the probability that it will fail during the month and be replaced with a new bulb in the next month is 0.4. The probability that it will not fail and survive to an age of one month is also 0.6. The remaining states, 1-month, 2-month, 3-month, and 4-month, indicate the failure and survival probabilities of different aged bulbs. The 4-month state represents ages 4 months or older.

The economics of the situation are shown in the Economic Data worksheet. At each age the bulb is inspected at the cost of $0.50. The cost of a new bulb, $2, is assigned to state New. Every time the system enters the New state a bulb is purchased.

  The steady state analysis reveals interesting information regarding the bulb. Rounding values to the nearest whole percentages we see that a new bulb is purchased in 43% of the months. The average cost of maintaining the bulb is $1.37 per month. Starting from the New state, the discounted cost (net present worth) using the interest rate of 5% per month is $29.61. The discounted cost is important in this context because it is the measure that will be minimized with the MDP model.
 

 

An Alternative to Inspection

 

Rather than inspect each month, we could adopt a policy of replacing the bulb without inspection. This is the replace decision. For example, if we adopt the replace option for the new state, the bulb will be replaced at the end of the first month without inspection. A working bulb may be discarded by the replace option, but the inspection cost is not incurred. Also we assume that new bulbs will cost $0.30 less with the replace option, resulting a unit cost of $1.70.

With the replace option, the system moves immediately to the new-bulb state. A row in the transition matrix will have a single 1 in the first column. All other entries will be zero.

 

The Markov Decision Model

 

To incorporate decisions into the model we select Build Model from the menu or click the corresponding button on the Start worksheet.

The dialog below is presented.

The fields on the dialog accept the name, number of states, number of decisions and number of decisions per state. The Random Problem checkbox places random data on the worksheet. This is useful for illustrating the add-in without extensive data entry. When the Maximize Objective checkbox is checked, the add-in constructs a maximization model. Otherwise the objective is minimized. For the current example, there are five states with two decisions per state. The goal is to minimize the expect discounted cost.

We illustrate the MDP worksheet with several figures showing selected rows and columns. The cells of the model are colored yellow, green and white. Only the white cells hold data that can be directly controlled by the user. The yellow cells hold formulas and values that should not be replaced. The green cells hold values computed by the VBA code. In the following we focus on the white cells because these hold data describing the model.

 
The upper left cells show the problem parameters and intermediate results. The bottom three fields can be changed directly by the user. The values are reflected on the worksheet.
 

Columns to the right of the parameters show the state and decision definitions. The three buttons at the top-left control the iterations of the MDP algorithm. The Make Markov Chain button constructs a DTMC model with the current decisions determining the transition matrix and economic parameters. The Change MDP allows changes in some features of the MDP model.

Column E holds the state names assigned by the user. The State Cost column holds the cost of being in each state. The value of 2 in F12 is the cost of the new bulb. Column G holds the current decision index. The green color indicates that it is computed by the algorithm. Column H holds the corresponding decision name. The State Value is computed by a formula which evaluates the minimum cost decision associated with each state. Columns H and I are yellow to indicate that they hold Excel formulas and should not be changed. When the worksheet is first constructed, the State Dec. column is not related to the State Value column. Once the algorithm begins the State Dec. column shows the decision that results in the State Value.

 

Columns L and M define the decisions and their associated costs. The Inspect decision expends the cost of inspection, $0.50. The Replace decision results in the cost savings for the replace option. An additional Null decision is provided and may be assigned as an option for some states. The cost has a large value so this option will be not be chosen for a minimization.

The rows on the worksheet beginning in column O and progressing to the right show data and computed results describing state and decision combinations. Each row shows a state and a decision. For example, row 12 describes state 1 (New) and decision 1 (Inspect). Column R shows the names of the state/decision combination. Columns S through U hold computed values necessary for the algorithm. Columns V through Z hold transition probabilities. These are important data inputs for any MDP. For row 12 the transition probabilities are the same as used for with the inspection option model described above. Row 13 holds the transition probabilities associated with the replace option. This option always brings the system to the New state.

 

Observe that each state has two decisions, to inspect or replace. The transition probabilities for the inspect option are the same as in the Markov chain model at the beginning of this page. The transition probabilities for the replace option always have a single value of 1 for the New state and 0 values for the other states. The last column (AA in this case) is the sum of the transition probabilities for each row. The sum of transition probabilities should be 1 for a valid model.

Column S computes the sum of the state and decision costs. Column T holds the expected cost of the next values given the current state and decision. Column U holds the sum of column S and and the discounted value of column T.

The data for this portion of the model can be entered in any order, but the add-in sorts the data in the order of the state indices. This makes it easier to enter the data for large problems.

 

Dynamic Programming Iterations

  The optimum decisions and the corresponding expected state values are determined by a dynamic programming algorithm. To describe the process we show a series of results obtained for the example problem. The tables are a slightly revised version of the worksheet so that the results are concisely presented. To start the process the next values are assumed to be 0 as shown in the green range at the top. The most important column is the one labeled State/Dec. Value. This is column U for the example. The column labeled State Value computes the minimum State/Dec. Value for each state. Thus we see that for state 1, the minimum cost decision has the value 1.7. Although not explicitly shown, the decision leading to that value is the Replace option. Similarly the state values for each state are computed as the minimum of the values for the decisions associated with each state. The First Step button at the top of the page sets the Next Values to zero.
 

Clicking the Next Step button performs an iteration. The iteration process copies the state values and pastes them into the next value range. The cut/paste operation is accomplished by the VBA code of the add-in. The results are shown below. The minimum for state 1 (the New state) is now obtained with the Inspect decision.

  A second click of the Next Step button repeats the copy/paste operation to obtain the results below. As the iteration steps increase, the values in the next value range approach the values in the state value range. Each step represents an advancement in time of one month.
 

Clicking the Many Steps button at the top of the page presents a dialog accepting an integer number of steps.

After 100 steps the results of the example are below. The next values are converging to the state values. It can be shown that this result will always be true for a positive interest rate.

  The upper left portion of the worksheet shows the number of steps and the maximum difference between the state values and the next values. The optimum decisions are shown in columns G and H. The optimum decision is to inspect the new and 1-month old bulbs. A 2-month old bulb is replaced without inspection at the end of the month. This is equivalent to replacing the bulb automatically when it is three months old. The replace decision is also optimum for 3 and 4 month bulbs. We will see however, that these states can never be reached if the bulb is replaced at the end of its second month of life.
 

 

Changing the MDP

 

The Change MDP presents the dialog below. States, decisions, and state-decision combinations can be added or deleted from the model. A requirement of the model is that the number of state-decision combinations remain two or greater for each state. For states that allow only one decision, the Null decision can be used as the second option. More complex problems can have more decisions and more decisions per state.

 

Make Markov Chain

  The Make Markov Chain button creates a DTMC model using the current decisions. The transition matrix for the example, is shown below. For the optimum policy, the 3 and 4-month states are transient. State names, time period and transition probabilities are transferred to the matrix worksheet of the DTMC model.
  The economic measure (Cost), discount rate and state costs are transferred to the DTMC on the Economic Data worksheet.
  The steady-state results for this model can be compared with the results for the inspection only results. New bulbs are purchased in almost 50% of the months as compared to about 43% with inspection. The Expected NPW is smaller for every state using the optimum policy than the inspect-always policy. This should be the case because the optimum policy minimizes this measure for every state.

Steady-State Analysis for the optimum MDP policy

Steady-State Analysis when inspections are performed in each state

 

Usefulness of MDP

 

A number of interesting decision problems can be addressed using MDP analysis. Because the model is entirely constructed on a single Excel worksheet, the model size is limited by the size of the worksheet. The number of rows of the model is equal to the number of states multiplied by the number of decisions and the number of columns is equal to the number of states. Practical MDP models may be very large with respect to the number of states and therefore the number of rows and columns. For this add-in, the model size is limited by the rows and columns of the Excel worksheet. Excel 2007 allows quite large models, but certainly very large problems would be better handled by stand-alone programs.

 

  
Return to Top

tree roots

Operations Research Models and Methods
Internet
by Paul A. Jensen
Copyright 2004 - All rights reserved

Next Page