|
|
|
To illustrate the MDP we use an example similar
to the one used in the description of the Markov chain model.
Now however, we will have two maintenance options.
We are concerned about
the maintenance policy for light bulbs in a large lighted
sign. The manager wants a policy that produces the
smallest discounted net present worth for operating
the sign.
The manager can inspect each bulb at the end of
every month of its life at a cost of $0.50 per inspection.
If the bulb is found to be failed, it is replaced
at the beginning of the net month for a cost of $2.
Alternatively, the manager may simply replace the
bulb without inspection based on its age. Non-failed
bulbs may be discard by this process, but the price
of a new bulb is reduced to $0.30 if it is replaced
by this alternative maintenance procedure. The cost
reduction is due to the larger number of bulbs that
will be purchased, thus obtaining an economy of scale.
The $0.50 inspection cost is not expended with the
replace option. |

|
The Model with
Inspection |
|
With
the inspection option, the Markov chain transition matrix
is shown below. At the end of each month, the bulb
is inspected. If the bulb is failed it is replaced at the
beginning of the next month and the system moves to the New state.
If the bulb survives, it becomes one month older. The
probabilities show the effects of aging. The new bulb exhibits
the "infant mortality" characteristic. Perhaps due to manufacturing
defects the probability that a new bulb fails during its
first month is greater than the probability that a one month
old bulb fails during its second month of life. As the bulb
ages further, the probability of failure grows. In the fourth
month, the likelihood of failure reaches its highest value.
The bulb may survive beyond four months, but the probability
of failure given that it survives after the fourth month
remains constant.
|
|
| |
Each
row of the matrix describes what might occur for a bulb of
a particular age. For example, the row labeled New indicates
that if a new bulb is inserted, the probability
that it will fail during the month and be replaced with a
new bulb in the next month is 0.4. The probability that
it will not fail and survive to an age of one month is also
0.6. The remaining states, 1-month,
2-month,
3-month, and 4-month, indicate the failure and survival probabilities
of different aged bulbs. The 4-month state represents ages
4 months or older.
The
economics of the situation are shown in the Economic
Data worksheet. At each age the bulb is inspected at
the cost of $0.50. The cost of a new bulb, $2, is assigned
to state New. Every time the system enters the New state
a bulb is purchased. |
|
| |
The steady state analysis
reveals interesting information regarding the bulb. Rounding
values to the nearest whole percentages we see that a new bulb
is purchased in 43% of the months. The average cost of maintaining
the bulb is $1.37 per month. Starting from the New state,
the discounted cost (net present worth) using the interest rate
of 5% per month is $29.61. The discounted cost is important in
this context because it is the measure that will be minimized
with the MDP model. |
| |
|
An Alternative to Inspection |
| |
Rather than inspect
each month, we could adopt a policy of replacing the bulb without
inspection. This is the replace decision. For example,
if we adopt the replace option for the new state, the
bulb will be replaced at the end of the first month without inspection.
A working bulb may be discarded by the replace option, but the
inspection cost is not incurred. Also we assume
that new bulbs will cost $0.30 less with the replace option,
resulting a unit cost of $1.70.
With the replace option, the system moves immediately to
the new-bulb state. A row in the transition matrix
will have a single 1 in the first column. All other entries
will be zero. |
The Markov Decision Model |
| |
To incorporate decisions
into the model we select Build Model from
the menu or click the corresponding button on the Start worksheet.

The dialog below is presented.
The fields on the dialog accept the name, number
of states, number of decisions and number of decisions per
state. The Random Problem checkbox places random data
on the worksheet. This is useful for illustrating the add-in
without extensive data entry. When the Maximize Objective checkbox
is checked, the add-in constructs a maximization model. Otherwise
the objective is minimized. For the current example, there are
five states with two decisions per state. The goal is to minimize
the expect discounted cost.
We illustrate the MDP worksheet with several
figures showing selected rows and columns. The cells of the
model are colored yellow, green and white. Only the white cells
hold data that can be directly controlled by the user. The
yellow cells hold formulas and values that should not be replaced.
The green cells hold values computed by the VBA code. In the
following we focus on the white cells because these hold data
describing the model. |
| |
 |
The upper left cells show the problem parameters
and intermediate results. The bottom three fields can be
changed directly by the user. The values are reflected on
the worksheet. |
|
| |
Columns to the right
of the parameters show the state and decision definitions. The
three buttons at the top-left control the iterations of the MDP
algorithm. The Make Markov Chain button constructs a
DTMC model with the current decisions determining the transition
matrix and economic parameters. The Change
MDP allows
changes in some features of the MDP model.
Column E holds the state names assigned by the user. The State
Cost column holds the cost of being in each state. The value
of 2 in F12 is the cost of the new bulb. Column G holds the current
decision index. The green color indicates that it is computed
by the algorithm. Column H holds the corresponding decision name.
The State Value is computed by a formula which evaluates
the minimum cost decision associated with each state. Columns
H and I are yellow to indicate that they hold Excel formulas
and should not be changed. When the worksheet is first constructed,
the State Dec. column is not related to the State
Value column. Once the algorithm begins the State Dec. column
shows the decision that results in the State Value. |
|
| |
Columns L and M define
the decisions and their associated costs. The Inspect decision
expends the cost of inspection, $0.50. The Replace decision
results in the cost savings for the replace
option. An additional Null decision is provided
and may be assigned as an option for some states. The cost has
a large value so this option will be not be chosen for a minimization.
The rows on the worksheet beginning in column O and progressing
to the right show data and computed results describing state
and decision combinations. Each row shows a state and a decision.
For example, row 12 describes state 1 (New) and decision
1 (Inspect). Column R shows the names of the state/decision
combination. Columns S through U hold computed values necessary
for the algorithm. Columns V through Z hold transition probabilities.
These are important data inputs for any MDP. For row 12 the
transition probabilities are the same as used for
with the inspection option model described above. Row 13 holds
the transition probabilities associated with the replace option.
This option always brings the system to the New state. |
|
| |
Observe that each state
has two decisions, to inspect or replace. The transition probabilities
for the inspect option are the same as in the Markov chain model
at the beginning of this page. The transition probabilities for
the replace option always have a single value of 1 for the New state
and 0 values for the other states. The last column (AA in this
case) is the sum of the transition probabilities for each row.
The sum of transition probabilities should be 1 for a valid model.
Column S computes the sum of the state and decision costs.
Column T holds the expected cost of the next values given the
current state and decision. Column U holds the sum of column
S and and the discounted value of column T.
The data for this portion of the model can be entered in any
order, but the add-in sorts the data in the order of the state
indices. This makes it easier to enter the data for large problems. |
Dynamic Programming Iterations |
| |
The optimum decisions
and the corresponding expected state values are determined by
a dynamic programming algorithm. To describe the process we show
a series of results obtained for the example problem. The tables
are a slightly revised version of the worksheet so that the results
are concisely presented. To start the process the next values
are assumed to be 0 as shown in the green range at the top. The
most important column is the one labeled State/Dec. Value.
This is column U for the example. The column labeled State
Value computes the minimum State/Dec.
Value for each state. Thus we see that for
state 1, the minimum cost decision has the value 1.7. Although
not explicitly shown, the decision leading to that value is the Replace option.
Similarly the state values for each state are computed as the
minimum of the values for the decisions associated with each
state. The First
Step button at the top of the page sets the Next Values to
zero. |
|
| |
Clicking the Next
Step button performs an iteration. The iteration process
copies the state values and pastes them into the
next value range. The cut/paste operation
is accomplished by the VBA code of the add-in. The results are
shown below. The minimum for state 1 (the New state)
is now obtained with the Inspect decision. |
|
| |
A second click of the Next
Step button
repeats the copy/paste operation to obtain the
results below. As the iteration steps increase, the values in
the next
value range
approach the values in the state value range. Each step
represents an advancement in time of one month. |
|
| |
Clicking the Many
Steps button at the top of the page presents a dialog
accepting an integer number of steps.
After 100 steps the results of the example are
below. The next values are converging to the state values.
It can be shown that this result will always be true for a
positive interest rate. |
|
| |
The upper left portion
of the worksheet shows the number of steps and the maximum difference
between the state values and the next values.
The optimum decisions are shown in columns G and H. The optimum
decision is to inspect the new and 1-month old bulbs. A 2-month
old bulb is replaced without inspection at the end of the month.
This is equivalent to replacing the bulb automatically when it
is three months old. The replace decision is also optimum
for 3 and 4 month bulbs. We will see however, that these states
can never be reached if the bulb is replaced at the end of its
second month of life. |
| |
 |
Changing the MDP |
| |
The Change MDP presents
the dialog below. States, decisions, and state-decision combinations
can be added or deleted from the model. A requirement of the
model is that the number of state-decision combinations remain
two or greater for each state. For states that allow
only one decision, the Null decision can be used as
the second option. More complex problems can have more decisions
and more decisions per state.

|
Make Markov Chain |
| |
The Make Markov Chain button
creates a DTMC model using the current decisions. The transition
matrix for the example, is shown below. For the optimum policy,
the 3 and 4-month states are transient. State names, time period
and transition probabilities are transferred to the matrix worksheet
of the DTMC model. |
|
| |
The economic measure
(Cost), discount rate and state costs are transferred to the
DTMC on the Economic Data worksheet. |
|
| |
The steady-state results
for this model can be compared with the results for the inspection
only results. New bulbs are purchased in almost 50% of the months
as compared to about 43% with inspection. The Expected NPW is
smaller for every state using the optimum policy than the
inspect-always policy. This should be the case because the optimum
policy minimizes this measure for every state. |
Steady-State Analysis for the optimum MDP
policy

Steady-State Analysis when inspections are performed
in each state

|
Usefulness of MDP |
| |
A number of interesting
decision problems can be addressed using MDP analysis. Because
the model is entirely constructed on a single Excel worksheet,
the model size is limited by the size of the worksheet. The
number of rows of the model is equal to the number of states
multiplied by the number of decisions and the number of columns
is equal to the number of states. Practical MDP models may
be very large with respect to the number of states and therefore
the number of rows and columns. For this add-in, the model
size is limited by the rows and columns of the Excel worksheet.
Excel 2007 allows quite large models, but certainly very large
problems would be better handled by stand-alone programs. |
| |
|