The SMPS format for stochastic linear programs
Last modified 06
October 2005.
H.I.
Gassmann,
This site explains the complete SMPS format used to describe stochastic
linear programs. The format is based on the well-known MPS
format for deterministic linear programs and derives from previous work
by Edwards et
al., Birge et al., and Gassmann and Schweitzer.
Examples of stochastic programs are given elsewhere. (See, for instance, the books by Birge and Louveaux and Kall and Wallace.)
Please send comments and feedback to the author.
Contents
Core file sections: NAME,
ROWS, COLUMNS, RHS, RANGES, BOUNDS
Fine points: MPS record format, Network format, Mixed LP and networks, Integer variables, Quadratic objectives, Probabilistic objectives, Quantile objectives
Time file sections: TIME,
PERIODS, ROWS, COLUMNS
Fine points: Implicitly named columns
Stoch file sections:
STOCH, SIMPLE, ROBUST, PLINQUAD, CHANCE, ICC, SCENARIOS, NODES, INDEP, BLOCKS, DISTRIB
Fine points: Linking constraints
Solver capabilities and error recovery
The SMPS format was designed to make it easy to convert existing deterministic linear programs into stochastic ones by adding information about the dynamic and stochastic structure. This is done using three text files, the core file, time file and stochastics file (or stoch file, for short). Each file consists of several sections, which must appear in predefined order. The first record of each section is a header record, which is followed by zero or more data records. Empty sections (containing no data) may be omitted. The records in each file are organized into fields according to the MPS record layout.
The core file lists all the deterministic
information of the problem, in either MPS
format for linear programs or in NETGEN format
for problems in network format. Core file information includes the name and
type of each constraint, the names of rows and columns, a column-ordered list
of coefficients in the constraint matrix, right-hand side coefficients and
any bounds and ranges on the variables and slacks. The core file also provides
placeholders for all the stochastic elements (which must be mentioned in the
core file and given a preliminary value that may or may not be meaningful).
The core file thus represents a deterministic problem, which may be thought
of as a typical scenario, an average case, a baseline case or similar.
Sections in the core file
|
Section |
Purpose |
|
Gives a name to the problem in name field
2 |
|
|
Gives the name and type of each constraint
and objective |
|
|
Gives the nonzero coefficients in column
order |
|
|
For non-zero entries of the right-hand
side vector (optional) |
|
|
For range constraints (optional) |
|
|
For bounds on decision variables. Different
codes identify different types of bounds (optional) |
|
|
ENDATA |
End of problem data |
The small print:
Network format and mixed LP-network
The name of the problem is given in the second name field. This name is
verified against the name in the time and stoch
files.
Example
*23456789 123456789 123456789
NAME
Example
The type of each row is given in the code field (‘L’ indicates that the
constraint is of the form
,
‘G’ is for constraints of the form
,
‘E’ is for equality constraints, ‘N’ indicates a free row (no constraint)
or potential objective row. By default the first ‘N’-type row is used as the
objective. Each row is given a name in the first name field.
Example
*23456789 123456789 123456789
ROWS
N OBJECT
G ROW2
The nonzero coefficients of the constraint matrix are given in column
order. The code field is empty. The first name field contains the name of
the column, the second name field must match the name of a row, and the first
numeric field gives the value of the corresponding coefficient in the constraint
matrix. A second coefficient may appear on the same record using the third
name field for the name of another row and the second numeric field for the
corresponding coefficient.
Example
*23456789 123456789 123456789
123456789 123456789 123456789
COLS
COL1
ROW2 2.0 OBJECT 100.
The first name field on a data record contains an identifying name (that
must match information given externally), the second
name field contains the name of a constraint mentioned in the ROWS section. The first numeric field contains the value
of the right-hand side in this constraint. The third name field and second
numeric field may contain the information corresponding to one further constraint.
Only nonzero right-hand side values require an entry in this section, and
the order does not matter.
Example
*23456789 123456789 123456789
RHS
RHS1 ROW2 50.
The RANGES section allows the user to set up constraints that act as both
L and G-type constraints. The first name field on a data record contains an
identifying name (that must match information given externally), the second name field contains the name of a constraint mentioned
in the ROWS section. The first numeric field contains
a range for the value of the right-hand side in this constraint. The third
name field and second numeric field may contain the information corresponding
to one further constraint. The order of constraints does not matter.
Assume that the right-hand side value for the row is bi and the entry in the ranges
section is ri. Then the action of the range command
is as follows:
|
Row type |
|
Effect |
|
|
|
G |
|
|
|
|
|
L |
|
|
|
|
|
E |
|
|
|
if ri
> 0 |
|
E |
|
|
|
if ri
< 0 |
Example
*23456789 123456789 123456789
RANGES
RANGE1 ROW2 10.
If ROW2 is a G-type row with previously defined right-hand side 50, then
this declaration defines a range of [50,60].
The first name field on a data record contains an identifying name (that
must match information given externally), the second
name field contains the name of a variable mentioned in the COLUMNS section. The first numeric field contains a value
associated with this bound. The type of bound is clarified in the code field.
The third name field and second numeric field are not used. The order of the
columns does not matter.
|
Code |
Interpretation |
|
LO |
Lower bound |
|
UP |
Upper bound |
|
FX |
Fixed value |
|
FR |
Free variable |
|
MI |
Nonpositive variable. (Lower bound is
|
|
PL |
Nonnegative variable (Upper bound is |
|
BV |
Binary variable (0-1 variable) |
|
UI |
Upper bound. If the variable was declared as integer in the COLUMNS
section, the upper bound is rounded down to the nearest integer. |
If the code field contains the values FX, MI, PL or BV, the value in the
first numeric field is ignored. Default bounds are 0 and
for continuous variables and 0 and 1 for integer
variables.
Example
*23456789 123456789 123456789
BOUNDS
UP BOUND1 COL1 20.
This example defines an explicit upper bound of 20 for decision variable
COL1.
One must distinguish between header records and data records.
Header records contain up to three name fields, normally in these positions:
|
columns 1–14 |
first word field |
|
columns 15–24 |
second word field |
|
columns 40–49 |
third word field |
Data records contain up to three name fields, two numeric fields and a code field, normally in these positions:
|
columns 2 and 3 |
code field |
|
columns 5–12 |
first name field |
|
columns 15–22 |
second name field |
|
columns 25–36 |
first numeric field |
|
columns 40–47 |
third name field |
|
columns 50–61 |
second numeric field |
Names are normally restricted to eight characters and may contain any ASCII symbol; numerical input is limited to twelve characters (including digits, signs, exponents and a decimal point). Free format input can be used if these limits are too restrictive. However, in order to allow for correct parsing of data records, some fairly non-restrictive rules must be followed:
1. Each field is surrounded by white space (one or more blanks) and does not contain an embedded blank.:
2. The first name field of a header record must start in column 1.
3. Names and numeric information on data records start in column 5 (or further to the right). The first four columns must be reserved for the code field (which is to be placed into columns 2 and 3).
4. Numbers (numerical strings) must start with a digit (or a decimal point immediately followed by a digit). Exponential notation (e.g., 1.E–5 or .3e+6) is allowed.
5. Names (character strings) have no embedded blanks and must start with a non-numeric character. (‘plus2’, ‘an_extremely_long_and_tedious_name’, ‘.cost’ and ‘$1000’ are all valid names, but ‘sp ace’, ‘2B’ and ‘.5off’ are not.)
The format used for network sections is modified somewhat from the proposal by Klingman et al. As in the MPS format there are header records and data records. The data records are organized in the following manner:
Data record for networks
|
columns 2–4 |
code field |
|
columns 7–12 |
first name field (from-node) |
|
columns 13–18 |
second name field (to-node) |
|
columns 21–30 |
first numeric field (unit flow cost) |
|
columns 31–40 |
second numeric field (optional: maximum flow) |
|
columns 41–50 |
third numeric field (optional: minimum flow) |
|
columns 51–60 |
fourth numeric field (optional: multiplier) |
|
columns 63–70 |
third name field (optional: see below) |
Note that in this format the name fields are only six characters wide and the numeric fields hold ten characters each.
The following modifications to the NETGEN format are made in SMPS: For consistency, the BEGIN record is changed to a NAME record, and the END record is replaced by ENDATA. More importantly, since arcs are named only implicitly, it is important that each node is associated unequivocally with its time period. This requires a separate NODES section, which follows the same rules as the ROWS section. The two sections can be used interchangeably.
Even this is not enough to ensure that implicitly named arcs are assigned to the correct time period. The convention is that an arc belongs to the earlier of the periods containing the origin and destination nodes, unless otherwise specified. This convention can be overridden by placing the name of the period to which the arc belongs into the third name field.
Core file sections for networks
|
Section
|
Purpose |
|
NAME |
Gives
a name to the problem in name field 2 |
|
NODES |
Gives
the name of each node |
|
ARCS |
Specifies
the cost, flow limits and multiplier of each arc |
|
SUPPLY |
Gives
supply at each supply node |
|
DEMAND |
Gives
demand at each demand node |
|
ENDATA |
End
of problem data |
Example
NAME Network format
ARCS
NODE1 NODE2 cost upper lower multiplier
1 NODE1 NODE3 cost upper lower multiplier
2 NODE1 NODE3 cost upper lower multiplier
...
NODEk NODEn cost upper lower multiplier
SUPPLY
...
NODE1 amount
DEMAND
...
ENDATA
Note the two parallel arcs connecting node1 and node3.
Network and LP sections can be mixed. It might be useful, for example to set up a two-stage problem in which the first stage is an LP model concerned with facility location and capacity planning, while the second stage is a network flow problem that allocates random demands once the facilities have been opened. Another example is a network with side constraints, which can easily arise in financial applications.
Mixed LP and network problems use all the sections described previously under LP and network problems. The following sections serve the same purpose and may be alternated freely:
ROWS and NODES sections,
COLUMNS and ARCS sections,
SUPPLY, DEMAND and RHS sections.
The general order of sections of mixed LP and network problems is as follows:
|
Section
|
Purpose |
|
NAME |
Gives
a name to the problem in name field 2 |
|
ROWS |
Give the name of each row/node. |
|
NODES |
|
|
COLUMNS |
Give
the name of each variable/arc. ARCS sections also specify cost coefficients
and variable bounds. |
|
ARCS |
|
|
RHS |
Give
right-hand side values for the constraints |
|
SUPPLY |
|
|
DEMAND |
|
|
RANGES |
(optional) |
|
BOUNDS |
(optional) |
|
ENDATA |
End
of problem data |
The remaining challenge is to specify constraint coefficients to the various constraints. In mixed LP and networks problems, one has to be prepared to deal with four distinct situations:
|
LP columns and constraint coefficients in LP rows |
This is easily handled with an MPS-style data record |
|
LP columns that bring flow to a network node. |
The only difficulty here is that nodes are not named in a separate ROWS section |
|
Network arcs in node constraints |
This is easily handled in a NETGEN-style data record. |
|
Network arcs in LP constraints |
This is the most difficult situation and requires careful attention. |
The challenge with network arcs appearing in LP constraints is that arcs are not named explicitly, so that three name fields are needed (two for the two nodes incident to the arc and a third for the constraint row), whereas the original NETGEN standard allows only two.
The SMPS format extends the NETGEN data record by allowing a third name field to be placed in columns 63–70. If not empty, the third name field refers to an LP row name, and the first numeric field is taken as the coefficient value. There should be no further numerical data on such records. If the third name field is empty, the record is a regular NETGEN data record.
Example
Time file:
*23456789 123456789 123456789
123456789 123456789 123456789
TIME Mixed LP and network example
PERIODS
COL1
ROW1 PERIOD1
COL6
ROW3 PERIOD2
'NETWRK'
NODE1 PERIOD3
'NETWRK'
NODE3 PERIOD4
COL18
ROW5 PERIOD5
ENDATA
Core file
*23456789 123456789 123456789
123456789 123456789 123456789
NAME Mixed LP and network example
ROWS
...
L
NODE1
G
NODE2
...
COLUMNS
...
COL16
ROW3 value NODE1 value
COL17
ROW4 value NODE2 value
ARCS
NODE1
NODE2 cost upper lower
multiplier
NODE1
NODE7 cost upper lower
multiplier
...
NODE8
ROW5 cost upper
lower multiplier
NODE8
ROW6 cost upper
lower multiplier
COLUMNS
COL18
ROW5 value ROW7 value
COL19
ROW6 value ROW8 value
...
RHS
...
SUPPLY
...
ENDATA
The first COLUMNS section lists the deterministic coefficients of the constraint matrices belonging to periods 1 and 2. Two entries of this last matrix are illustrated in the last two data records in this section. These entries will take values from the second to the third stage. The amount will be determined by the values of COL16 and COL17 and the corresponding entries in this COLUMNS section.
The last two entries in the ARCS section will bring “flow” from NODE8 to the right hand side of ROW5 and ROW6 by entering a number in those rows. The number will be the “multiplier” from the input, and the value ending up on the right hand side will be the product of this multiplier and the flow running out of NODE8 to these rows (which are nodes when viewed from the network).
Integer variables can be handled in the same way as in MPS: Special data records called indicator records are inserted before the start of a group of integer variables and again at the end. These data records define the type of integer variables used through special markers in the name fields.
For general integer variables, the code field of the starting indicator record should be empty, the second name field should contain the marker 'MARKER', and the third name field should contain the marker 'INTORG'. The first name field may contain any value that differs from the name of the previous column. The ending indicator record is identical to the opening indicator, except that the third name field contains the marker 'INTEND'.
Example
*23456789 123456789 123456789
123456789 123456789 123456789 123456789
INT
'MARKER' 'INTORG'
Col1
Row1 1.0 Row2 -1.0
Col2
Row1 -1.0 Row3 1.0
INT
'MARKER' 'INTEND'
This example sets up two integer columns, Col1 and Col2.
Special ordered sets can be used to further restrict the possible values of a set of integer variables. At most one of the variables in a special ordered set of type 1 may take a nonzero value. In a special ordered set of type 2 at most two variables may take nonzero values, and they must be adjacent within the set. Special ordered sets of type 3 are used to model decisions, as they force all of the integer variables in the set to be 0-1 variables.
Special ordered sets of type 1 are denoted by the following indicator records: The starting indicator record contains a code value of S1, an arbitrary name in the first name field, the keyword 'MARKER' in the second name field and the keyword 'SOSORG' in the third name field. The ending indicator record is similar, but it contains the keyword 'SOSEND' in the third name field.
*23456789 123456789 123456789
123456789 123456789 123456789 123456789
S1 INT1 'MARKER'
'SOSORG'
Col1
Row1 1.0 Row2 -1.0
Col2
Row1 -1.0 Row3 1.0
S1 INT1 'MARKER'
'SOSEND'
In this example the two integer columns Col1 and Col2 form a special ordered set of type 1.
Special ordered sets of type 3
Special ordered sets of type 3 are useful in the following situation:
1. There exists an equality constraint whose right-hand side is 1.
2. The coefficients of all variables in this row are either 0 or 1.
3. The variables whose coefficients are non-zero are integer variables.
4. At most one of these integer variables can be non-zero.
In this case
it can be deduced that exactly one of the integer variables takes a value
of 1, and that all of them are 0-1 variables. These variables form a special
ordered set of type 3. Special ordered sets of type 3 are declared in SMPS
by a construct similar to other special ordered sets. The starting indicator
record has an empty code field and the name of a valid equality row in the
first name field. The second name field contains the keyword 'MARKER' and
the third name field contains the keyword 'SOSORG'. In the ending indicator
record, the keyword 'SOSEND' replaces the keyword 'SOSORG'.
Example:
*23456789 123456789 123456789
123456789 123456789 123456789
Row3
'MARKER' 'SOSORG'
Col1
Row1 1.0 Row3 1.0
Col2
Row1 1.0 Row3 1.0
Row3
'MARKER' 'SOSEND'
A quadratic objective can be set up following the linear part of the core file. It uses the following sections:
|
Section
|
Purpose |
|
NAME |
Gives
a name to the problem in name field 2 |
|
QSECTION |
Gives
the nonzero coefficients in the quadratic objective |
|
ENDATA |
End
of problem data |
The NAME section gives a name in the second name field. This name must match the name given in the linear portion of the core file. The second name field in the QSECTION header record contains no information. The first two name fields on a data record contain the column names of a quadratic term, the first name field contains the coefficient. A common factor of ½ is assumed throughout.
The QSECTION can be read from a separate file if necessary.
Example
*23456789 123456789 123456789
123456789
NAME Example
QSECTION
Col1
Col1 1.0
Col1
Col2 3.0
Col2
Col2 2.0
ENDATA
sets up the objective 0.5*(Col1)2 + 1.5*Col1*Col2 + (Col2)2.
Probabilistic objectives take the form
![]()
where the optimization could be either minimization or maximization and the relation ~ could be ³, £ or =.
This is very similar to a single chance constraint, and thus it can be represented in SMPS format in the same way. If ROW1 is a constraint row specified in the core file, then the following example defines a probabilistic objective:
*23456789 123456789
CHANCE INDIV
The numeric fields are not used.
Probabilistic objectives involving multiple (joint) constraints can be set up in the same manner:
*23456789
123456789 123456789 123456789
CHANCE
JN CC1 CON1
ROW2
ROW3
Quantile objectives take the form
.
This is the same as optimizing c subject to the simple constraint
along with the other constraints of the problem.
Purpose of the time file is to allow breaking down the core file scenario
into nodes corresponding to the individual stages.
Sections in the time file
|
Section |
Purpose |
|
Gives a name to the problem in name field
2 |
|
|
Gives the name of each time period (in
order) |
|
|
(optional) Specifies for each row the
period to which it belongs |
|
|
(optional) Specifies for each column the
period to which it belongs |
|
|
ENDATA |
End of problem data |
The small print
The second name field on the TIME header record contains the name of the
problem. This name is verified against the name in the core and stoch files.
Example
*23456789 123456789 123456789
TIME
Example
The header record contains the value PERIODS in the first name field.
If the core file is in time order, then the second name field can be left
blank or given the value IMPLICIT. The ROWS and COLUMNS sections are not needed in this case.
If the core file is not given in temporal order (for instance if it was
generated by an algebraic modelling language), then the keyword EXPLICIT must
be placed in the second name field of the header record and the ROWS and COLUMNS
sections must be present.
Each data record contains the name of a period in the third name field.
If the time file is in implicit format, then the first name field contains
the name of the first column in this period, the second name field contains
the name of the first row. If the time file is in explicit format, the first
two name fields are not used. The objective row must always belong to the
first period.
Example
*23456789 123456789 123456789
123456789 123456789 123456789
PERIODS
IMPLICIT
COL1
ROW1 PER1
COL5
ROW3 PER2
In this example all columns in the core file that appear between COL1
(inclusive) and COL5 (exclusive) make up the first period, named PER1, while
COL5 and all the remaining columns belong to period PER2. Similarly, ROW1
and ROW2 (along with the objective) are the rows belonging to the first period;
all the others are second-period rows.
The data records in this section contain the name of a row mentioned in the core file in the first name field; the second name field contains the name of a time period set up in the preceding PERIODS section. The ROWS section is optional if the core file is given in temporal order.
Example
*23456789 123456789 123456789
ROWS
ROW1
PER1
ROW3
PER2
ROW2
PER1
...
The data records in this section contain the name of a column mentioned
in the core file in the first name field; the second name field contains the
name of a time period set up in the preceding PERIODS section.
The COLUMNS section is optional if the core file is given in temporal order.
Example
*23456789 123456789 123456789
COLS
COL1
PER1
COL5
PER2
COL2
PER1
...
If the core file is given in temporal order and the time file only provides markers for the first row and column of each period, then there exists a problem with columns that do not have explicit names given in the core file. This situation can arise in two circumstances: if the core file has network components, or if there are penalties (linear, quadratic or mixed) for violating a constraint. In the first instance, the first name field of a data record in the PERIODS section should contain the marker 'NETWRK', in the second instance it should contain the marker 'PENLTY'. For an example see the section on mixing LP and network formats.
Stages help many stochastic programming solvers
to decompose the problem into smaller pieces. They denote the times when new
information becomes available. The stages form a subset of the time structure
and are characterized by an alternating sequence of decision points and random
events. It is assumed that at the time of the first decision all explicit
constraints are known, so that the sequence always starts with decision
- event - decision - event ....
The end of the sequence for recourse problems is ...decision - event -
decision, while most problems with chance constraints contain only the
sequence decision - event. (Multistage problems with probabilistic
constraints have been investigated by Prékopa.)
The usual convention for problems with probabilistic
constraints is to consider a stage as (decision – event), while for
recourse problems it is more common to use the opposite view. However, to
allow mixing of recourse and chance constraints, consistency is essential.
This format uses the convention that a stage is made up of an event and a
decision (in that order), with the very last event possibly irrelevant.
This has the advantage that the events and
the induced decisions lie in the same stage, simplifying the indexing notation
for finite distributions and making it easier to set up problems with incomplete
information (where some of the first stage coefficients are unknown). The
only drawback of this method is that ordinary chance-constrained problems
must be thought of as two-stage problems with an unusual second stage
that contains no decisions.
Assignment of problem elements to the various
stages must then be consistent with the following rules:
1. A column belongs to the stage in which
the decision is made.
2. A row belongs to the stage of the latest
decision incident with the row.
3. A random variable belongs to the earliest
stage that has a stochastic element depending on this random variable.
4. For finite distributions all columns (and
with them the rows) must be replicated according to the random variables of
the same stage. In particular, this means that for chance-constrained problems
(where the final stage contains no decisions) replication is not needed
in the final stage.
The purpose of the stoch file is to allow the
solver to build a deterministic equivalent, which requires information about
the random variables. If all the random variables are finitely distributed,
it amounts to the construction of an event tree. The event tree can be described
in three different ways: scenario by scenario, node by node, or implicitly
using marginal distributions. Implicit descriptions are also available if
the random variables are continuously distributed, where explicit descriptions
would obviously fail.
The stoch file may contain the following sections:
|
Section |
Purpose |
|
Gives a name to the problem in name field
2 |
|
|
For problems with simple recourse (optional) |
|
|
For quadratic penalties (optional) |
|
|
For piecewise linear-quadratic penalties
(optional) |
|
|
For chance constraints (optional) |
|
|
For integrated chance constraints (optional) |
|
|
Gives the values of stochastic elements
one scenario at a time |
|
|
Gives the values of stochastic elements
one node at a time |
|
|
Marginal distribution of independent random
variables |
|
|
Marginal distribution of a random vector |
|
|
Distribution of auxiliary random variables |
|
|
ENDATA |
End of problem data |
The three ways of describing the event tree are mutually exclusive. That
is, a stoch file cannot contain SCENARIO, NODE and
INDEP/BLOCK/DISTRIB sections simultaneously. INDEP, BLOCKS and DISTRIB sections
may be mixed freely, although the order must be such that enough information
is available to construct the event tree during a single pass through the
file.
The small print
The second name field on the STOCH header record contains the name of
the problem. This name is verified against the name in the core and time files.
Example
*23456789 123456789
STOCH
Example
SIMPLE section for simple recourse
Simple recourse problems use very special recourse matrices, which should
not have to be set up by the user explicitly. In simple recourse problems
a subset of the rows is of the form
,
where the auxiliary variables
and
are nonnegative and are used to balance any shortage
(
)
or surplus (
)
allocation of the decision variables after the random variable bi has been observed. Any shortage
incurs a penalty of
,
surplus is penalized by
.
We set up a separate section named SIMPLE as part of the stoch file. Each data record in this section contains the
name of one row in the second name field (the first name field contains an
identifier and can be used to set up multiple sets of simple recourse coefficients),
and the cost coefficients are placed in the two numeric fields;
is first, followed by
. The penalty costs
and
must satisfy the condition
in order for the problem to be bounded; this
constraint is enforced by the system.
Each occurrence of a row name in this section triggers the generation
of additional decision variables, identified by appending the suffix .minus
and .plus after the name of the row. If the row represents an equality constraint,
both variables are generated, along with their coefficients in the constraint
matrix. If the row is of type L, only the shortage variable <rowname>.minus is created; if the row is of type G, only
the surplus variable <rowname>.plus is added.
The penalty terms in a simple recourse row will most likely be deterministic,
but in a pinch the user can specify stochastic coefficients by using these
system-generated variable names.
The recourse variables may be integer-valued. This fact is signalled by
placing integer markers into the stoch file.
Example
*23456789 123456789 123456789
123456789 123456789 123456789
SIMPLE
SIMPLE1
ROW1 10.0 5.0
SIMPLE1
ROW2 50.0
Assume that ROW1 is an E-type row and ROW2 has type G. Then each unit
of shortage in ROW1 is penalized by 10.0 units in the objective; each unit
of surplus has a penalty of 5. Surplus in ROW2 is penalized by 50 units.
In order to allow stochastic recourse costs, the SIMPLE section precedes the NODES,
SCENARIOS, INDEP, BLOCKS and DISTRIB sections.
ROBUST section for robust optimization (quadratic penalties)
Quadratic penalties for surplus and shortage have a similar function as
the piecewise linear penalties in the section on simple
recourse. The most common application of this feature is in robust programming
(see, e.g., Mulvey et al.).
We provide a section named ROBUST with syntax identical to the SIMPLE section to provide quadratic penalty coefficients
on
and
. The ROBUST section
follows the SIMPLE section and, like it, precedes the distribution sections.
In keeping with the conventions for the quadratic objective, the penalty term
includes an implicit factor of 2, that is, if the entry is of the form
*23456789 123456789 123456789
123456789 123456789 123456789
ROBUST1 ROW1
0.5 1.5
then negative deviations from the target x0 are penalized by 0.25*(x–x0)2, positive deviations
by 0.75*(x–x0)2.
In many applications the quadratic penalty term is uniform, i.e., it is
equal to some value r/2 for all constraints. For simplicity this coefficient
can be specified directly in the first numeric field of the header record.
PLINQUAD section for piecewise linear-quadratic penalties
Another type of penalties has received attention in the literature, the
piecewise linear-quadratic penalties described by King.
These are penalties of the form
.
Because of continuity and differentiability requirements at 0 and b/a,
two parameters suffice to describe these one-sided penalties, the curvature
a of the quadratic
part and the slope b of the linear
part. We propose to express these parameters in a PLINQUAD section, to follow
the SIMPLE and ROBUST
sections. Each data record in the PLINQUAD section lists an identifying name
in the first name field, a valid row name in the second, the quadratic parameter
in the first numeric field, and the linear parameter in the second. (If the
linear parameter is nonpositive, the penalty is
assumed to be a pure quadratic.)
If the mentioned row is of type L, the penalty is applied to positive
deviations, if the row is of type G, the penalty applies to negative deviations.
In the case of equality constraints, the range on which to apply the penalty
is ambiguous but can be clarified in the code field. If the code field is
empty, the penalty is applied to both positive and negative deviations; the
code PL denotes positive deviations, the code MI is used to specify negative
deviations.
Example
*23456789 123456789 123456789
123456789 123456789 123456789
PLINQUAD
PLQ1
ROW3 1.0 10.0
Assume that ROW3 is of type L. Then the first 10 units of shortage are penalized at the progressive rate of 0.5*(short)2, all subsequent units are penalized at a rate of 10.
CHANCE section for chance-constrained problems
This section describes the modelling of chance-constrained problems. There
are individual (one-dimensional) and joint (multidimensional) chance constraints,
with slightly differing syntax.
For the purpose of this exposition we distinguish four different forms
of chance constraints:
![]()
![]()
![]()
![]()
The first two of these are one-dimensional chance constraints, the last
two are multidimensional; the symbol ~ indicates an arbitrary constraint type
(as set up in the core file), Ak is a matrix,
bk
a conformal subset of the right-hand side vector b.
The CHANCE section deals only with supplying the reliability levels a and fault tolerances
e, the constraints themselves are defined in the core
file, along with their deterministic components; the distributions of the
stochastic elements are described in later sections of the stoch
file.
Data records for individual chance constraints list the name of the row
in the second name field and the reliability level in the first numeric field.
The code field can be used to indicate whether we are dealing with a reliability
level (Pr{…} ³ ak — using G as the
code value) or a fault tolerance (Pr{…} £ ek — using code value
L). Other valid codes are N and E. The first of these can be used to set up
a probabilistic objective,
the latter is here mostly for completeness. For N-type records, the numeric
fields are ignored.
The first name field is used as a label for this group of constraints.
This permits selecting different chance-constraints or perhaps different probability
levels, similar to the device of specifying multiple objective rows and/or
right-hand side sections in an MPS file.
Example:
*23456789
123456789 123456789 123456789
CHANCE
G CC1 ROW1
0.95
L CC1 ROW2
0.10
This sets up two probabilistic constraints that should be interpreted
as
![]()
![]()
assuming that ROW1 and ROW2 were both
defined as G-type rows in the core file.
For joint chance constraints we use two types of data records. The first,
denoted with a code in the code field, marks the start of the next block,
the direction of the probabilistic inequality (JG for reliability levels,
JL for fault tolerances), the label for this group of chance constraints in
name field 1, a distinguishing name in name field 2 (for the somewhat esoteric
situation of a multidimensional chance constraint with a probability level
that itself depends on a particular realization of a random variable in the
past), and the value of ak in the first numeric
field. The following data records list the names of the rows participating
in this constraint, as in the following example:
*23456789
123456789 123456789 123456789
CHANCE
JG CC1 CON1
0.98
ROW3
ROW4
JL CC1 CON2
0.001
ROW5
ROW6
ROW7
This example shows a mixed chance-constrained problem with two multidimensional
probabilistic constraints. The first one requires a reliability level of at
least 0.98 on ROW3 and ROW4, i.e.,
![]()
the second prescribes a fault tolerance of no more
than 0.001 on ROW5, ROW6 and ROW7, i.e.,
![]()
assuming that all rows were originally
defined as G-type rows in the core file.
ICC section for integrated chance constraints
This type of constraint was introduced by Klein
Haneveld in 1986. Assume that a deterministic
version of the problem features the constraint
![]()
Then an integrated chance constraint is of the form
![]()
where ~ is an arbitrary relation
(³, £ or =).
The SMPS format defines an ICC section to set up such integrated chance
constraints.
The header record contains the value ICC in the first name field. Each
data record contains a code of L or G, an identifying name in the first name
field, the name of a row in the second name field, and a numeric value for
d in the first numeric field.
Example:
*23456789
123456789 123456789 123456789
ICC
L ICC1 ROW8
10.
Assuming that ROW8 was set up as an L-type row in the core file with right-hand
side 100, this stoch file snippet specifies the
integrated chance constraint
![]()
This section describes an explicit event tree scenario by scenario. There
is a unique node in the event tree that belongs to the first period. This
node is sometimes called the root node. Scenarios are identified with the
leaf nodes of the tree. One could think of the scenarios as paths all the
way from the root node to the leaves, but it is easier to deal with them in
this simplified manner: One scenario (the root
scenario) represents a path from the root node to one of the leaves. All
other scenarios branch from a parent
scenario at some time prior to the final period. Information up to the branch
period is shared between a scenario and its parent scenario, only after the
branch occurred will there be duplicate information. This point of view is
advantageous because it allows for a reduction of redundancy in the tree,
which compresses the size of the stoch file.
To code the event tree information into the SCENARIOS section, we use
two types of data records. The first, denoted by SC in the code field, signals
the start of a new scenario. It gives the name of the scenario in the first
name field and its path probability
in the first numeric field. The name of the scenario from which it branches
is written in the second name field, and the name of the period in which the
branch occurred—i.e., the first period in which the two scenarios differ—in
the third name field. The root scenario originates in stage one and has no
parent scenario. This should be indicated by using the keyword 'ROOT' (including
the quotes) in the second name field.
The next data records give the column/row values assumed by the scenario
in the same format used in the corresponding section of the core file.
Example
*23456789
123456789 123456789 123456789
123456789
SCENARIOS
SC SCEN_A 'ROOT'
0.3 PERIOD1
COL2 ROW3
1.0
COL3 ROW4
1.0
UP BOUND1 COL5
1.0
SC SCEN_B SCEN_A
0.2 PERIOD3
COL3 ROW4
2.0
UP BOUND1 COL5
2.0
SC SCEN_C SCEN_A
0.1 PERIOD2
COL2 ROW3
2.0
COL3 ROW4
2.0
UP BOUND1 COL5
2.0
SC SCEN_D SCEN_C
0.1 PERIOD3
COL3 ROW4
1.0
UP BOUND1 COL5
1.0
SC SCEN_E SCEN_D
0.1 PERIOD4
UP BOUND1 COL5
2.0
SC SCEN_F SCEN_A
0.1 PERIOD2
COL2 ROW3
3.0
COL3 ROW4
3.0
UP BOUND1 COL5
3.0
SC SCEN_G SCEN_F
0.1 PERIOD4
COL4 ROW5
2.0
This is a description of the distribution of three entries: matrix entries
in positions COL2/ROW3 and COL3/ROW4, and the upper bound on variable COL5,
which we assume to occur in stages 2, 3 and 4, respectively. (The event tree
is the tree of figure 1.)
Figure 1. An event tree with seven scenarios
A minor modification of the preceding format is needed for network parameters. Since arcs are defined by pairs of nodes, to say that the cost of
the arc from node NODE1 to node NODE2 is random requires three parameters.
The same goes for bounds and multipliers. In addition to the two name fields,
we use the code field as in the following example. (The data record format
used in the stoch file for the networks case is
the same as that for the LP case.)
*23456789
123456789 123456789 123456789
123456789
SC SCEN_1 'ROOT'
0.3 PERIOD_1
C1 NODE1 NODE3
6.0
U2 NODE6 NODE8
7.0
SU NODE6 6.0
We use C1 for “cost of first arc from node NODE1 to node NODE2” and use
higher numbers, such as C2, for the second parallel arc, etc. (The second
character in the code field corresponds to the marker placed in the ARCS section
in the core file.) Similarly, M = multiplier, U = upper bound, L = lower bound,
X = upper and lower bound simultaneously.
Random supply and demand are accommodated by use of SU and DE, respectively,
in the code fields as illustrated. The code SI can be used to provide values
for a stochastic coefficient in a side constraint.
Any data items not specifically mentioned in the stoch
file are inherited by each scenario from its parent scenario, which in turn
may inherit them from its parent scenario, and so on. The base scenario inherits
its data from the core file. Two things about this scheme of multiple inheritance are noteworthy. First, it is essential that all
problems at a stage have the same problem dimensions and nonzero patterns.
Second, the values entered for a particular scenario replace any values inherited from the ancestors.
To see why additive or multiplicative perturbations are impractical, consider scenario E in the decision tree of figure 1. Scenario E inherits its values from scenario D, which inherits its third-stage data from scenario C, which in turn inherits them from scenario A, and possibly from the core file. There could therefore be as many as five data items involved before the value of a particular coefficient in scenario E is settled, and the danger of forgetting or misspecifying an addition or multiplication is considerable.
This section describes the construction of an event tree one node at a
time, which is very convenient when the depth of the tree is not uniform (containing
so-called trap states) or when the
number of rows or columns associated with a particular period depends on the
history of the data process. It is possible to treat such stochastic problem
dimensions in a scenario-wise description by introducing a number of empty
rows and columns, but these phantom
elements (phantom rows, phantom columns, and phantom nodes) introduce
irrelevant overhead into the problem formulation. The NODES section provides
an alternate way to describe the event tree, without the use of phantom elements.
There are two types of data records. The start of each node is marked
by a special code in the code field (see below). The first name field on this
record supplies a name to the node and the second name field gives the name
of the parent node (in the previous period). The first name field contains
the conditional probability of reaching this
node, given the parent node. The remaining data records for each node follow
the core file layout.
The code field in the node identifier indicates how the information for the node is to be obtained and influences the appearance of the remaining rows. There are two possibilities: one could copy the information from one node to another, allowing for modification of data, although the nonzero structure must remain the same. This is signaled by the value CP in the code field. The alternative is to make up the information on the spot by reading an MPS file. The code MK is used for this purpose.
Under normal circumstance, the MK option is easier to handle, since it
makes no assumptions about variable names of different nodes in the same time
stage. Individual MPS files can thus be prepared for each node and strung
together into one large file fairly easily. When the information is to be
created on the spot, we insert an ordinary MPS file after the node identifier
record. (We substitute an ENDNODE record to mark the end of the data for this
particular node, and the NAME section is optional.) The user should be careful
to note that the objective must have the same
name in every node. Quadratic objectives are consistent with the MK option;
however, the QSECTION must be incorporated into the stoch
file.
On the other hand, if there are two or more nodes that do share problem
dimensions and variable names, then it is possible to decrease the redundancy
and shorten the file length by using the CP option. If information is to be
copied, then the third name field on the node identifier record gives the
reference node where the information is
to be found. The third name field may contain the keyword 'CORFIL'. In this
case the information is retrieved from the core file.
Following the node identifier record one lists the modifications to the
constraint matrix, right-hand side, bounds, etc., in standard MPS format.
There should not be any header records in this section. Any data element not
explicitly mentioned is assumed to be identical to the corresponding data
element in the reference node.
In this node-by-node description it is much harder to convey the dynamic
structure of the problem. The user must therefore exercise discipline when
placing the off-diagonal matrix coefficients. Such entries must appear in
the block that defines the row,
so that the columns will belong to nodes that have been processed before and
can therefore be identified. This forces two further restrictions: Special
ordered sets of Type 3 cannot have subdiagonal entries
(unless they are specified explicitly as binary variables with explicit matrix
coefficients), and network records are not allowed. This is because the originating
and terminal nodes may belong to different time periods, creating ambiguities
as to where to place the data record. (Of course the user is free to supply
the network information on MPS style records.)
Every node in the event tree, even the deterministic ones, must be represented
by at least the NODE header. The node corresponding to the first period has
no predecessor; in this case the second name field should contain the keyword
'ROOT'.
Since the length of a scenario path and the problem dimensions are variable,
the contents of the time and core files have to be adjusted as well.
The time file must be present, but it does not necessarily describe the
full temporal structure. It must contain enough information to separate those
nodes of the core file that are referenced explicitly. The core file must
also be present. It lists the deterministic parts of the problem, which would
normally include the first stage.
Here is an example to illustrate the above rules.
Time file:
*23456789
123456789 123456789 123456789
123456789
TIME ProdDemo
PERIODS
Prod0 Cost
Period_0
Prod1 Demd1
Period_1
Prod2 Demd2
Period_2
ENDATA
Core file:
*23456789
123456789 123456789 123456789
123456789
NAME ProdDemo
ROWS
N Cost
L Stor0
L Stor1
L Stor2
COLUMNS
Prod0 Demd0 1.
Hold0 Demd0
-1. Demd1 1.
Hold0 Cost
1. Stor0 1.
Prod1 Demd1 1.
Hold1 Demd1
-1. Demd2 1.
Hold1 Cost
1. Stor1 1.
Prod2 Demd2 1.
Hold2 Demd2
-1.
Hold2 Cost
1. Stor2 1.
RHS
RHS Demd0 1.
RHS Demd1 4.
RHS Demd2 4.
RHS Stor0 4.
RHS Stor1 4.
RHS Stor2 4.
BOUNDS
UP BOUND Prod0 4.
UP BOUND Prod1 4.
UP BOUND Prod2 4.
ENDATA
Stoch file:
*23456789
123456789 123456789 123456789
123456789
STOCH ProdDemo
NODES
CP NODE0 'ROOT'
1.0 'CORFIL'
CP NODE1 NODE0
0.3 'CORFIL'
RHS Demd1
2.0
CP NODE2 NODE0
0.5 'CORFIL'
CP NODE3 NODE0
0.2 'CORFIL'
RHS DEMD1
6.0
CP NODE4 NODE2
0.3 'CORFIL'
RHS Demd2
4.0
CP NODE5 NODE2
0.4 'CORFIL'
RHS Demd2
5.0
CP NODE6 NODE2
0.3 'CORFIL'
RHS Demd2
6.0
*-----------------------
MK NODE7 NODE3
0.25
ROWS
L Stor2
COLUMNS
Hold1 Demd2W
1.0
Prod2W Demd2W
1.0
Prod2S Demd2S
1.0
Hold2W Demd2W
-1.0
Hold2W Cost
1.0 Stor2 1.0
Hold2S Demd2S
-1.0
Hold2S Cost
1.0 Stor2 1.0
RHS
RHS Demd2W
6.0
RHS Demd2S
4.0
RHS Stor2 4.0
BOUNDS
UP BOUND Prod2W
8.0
UP BOUND Prod2S
8.0
ENDNODE
*-----------------------
CP NODE8 NODE3
0.06 NODE7
RHS Demd2W
6.0
RHS Demd2S
6.0
....
ENDATA
This example describes a three-stage production problem. At the start only a single item is produced. If the demand in period 1 is low, production ceases entirely (the company goes out of business); if the demand in period 1 is medium, production continues; if demand in period 1 is high, a second item is introduced. The event tree with trap state is given in figure 2.
Figure 2. An event tree with a trap state
Penalties and moment constraints in NODES format
Variable names do not have to be shared between different nodes at the same stage. Hence it may be necessary to define penalties (SIMPLE, ROBUST, PLQ) as well as moment constraints (CHANCE and ICC) following the ROWS, COLUMNS and other sections of a node if the ‘MK’ option is used. Their order is arbitrary, and the format in each section is identical to the information given earlier in the respective sections. The ‘CP’ option provides inheritance of all data items from the reference node, including the penalty and moment sections. Any parameter in one of these constraints can be adjusted for the current node, if this should prove necessary.
This section describes the modelling of independent random variables.
The SMPS format allows for three different forms of the distribution: discretely
distributed, possessing a special distribution with no more than two parameters,
or by accessing a user-defined subroutine. The header record distinguishes
between these forms by placing appropriate keywords into the second name field
(see table below). The third keyword may contain the value ADD, MULTIPLY or
REPLACE to indicate how the value of the random variable is to modify the
information found in the core file. The default is REPLACE.
The data records have the following structure:
|
first name field |
column name of the random element |
|
second name field |
row name of the element |
|
first numeric field |
value |
|
second numeric field |
value |
|
third name field |
name of the period in which a
particular realization is observed. |
If the period can be inferred from the row/column name of the random element,
the third name field may be left empty.
The interpretation of the values
in the two numeric fields depends on the type of the distribution, as follows:
|
Keyword in second name field of the header record |
Distribution |
First numeric field |
Second numeric field |
|
DISCRETE |
Discrete distribution |
Value |
Probability* |
|
UNIFORM |
Continuous uniform on [a,b] |
a |
b |
|
|
Normal distribution |
mean m |
variance s2 |
|
GAMMA |
Gamma distribution (on [0,¥))
|
scale b |
shape c |
|
BETA |
Beta distribution (on [0,1])
|
n |
m |
|
LOGNORM |
Lognormal distribution |
m = E(ln L) |
s2 = Var(ln L) |
|
(any other) |
User-defined subroutine |
(not used) |
(not used) |
*Possible values for each random element must be listed in sequence, and
the probabilities must sum to 1.
The SMPS format provides for a mechanism by which the user can generate
the realizations by accessing a user-defined subroutine. Any modifier on the
header record different from the keywords for univariate
or multivariate distributions, that is, other than DISCRETE, UNIFORM, NORMAL,
GAMMA, BETA, LOGNORM, MVNORMAL, LINTRAN, ARMA, is taken to be the name of
a subroutine, which must be provided by the user. The subroutine can be given
calling parameters, and it must return one realization of the random variable
it represents. Each parameter value is listed on a separate record following
an indicator record containing only the code PR. Two types of values can be
passed in this way, integers and reals, assumed
to take up, respectively, four and eight bytes. The parameter type is inferred
from the appearance of the number, but the user can override this convention
by placing one of the two tags 'INTEGR' and 'DOUBLE' into the third name field;
their meaning should be self-evident.
Parameters can also be passed by location,
making it possible to access any problem coefficient or even current values
of the decision variables in order to model any dependence structure required.
Locations are identified by placing the column and row names into the first
two name fields. Current decision variables are identified by placing the
tag 'PRIMAL' into the first name field, current dual variables use the tag
'DUAL'.
Example 1
*23456789
123456789 123456789 123456789
123456789 123456789
INDEP DISCRETE
COL1 ROW8
6.0 PERIOD2 0.5
COL1 ROW8
8.0 PERIOD2 0.5
RHS ROW8
1.0 PERIOD2 0.1
RHS ROW8
2.0 PERIOD2 0.5
RHS ROW8
3.0 PERIOD2 0.4
In this example the entry COL1/ROW8 takes value 6.0 with probability 0.5
and 8.0 with probability 0.5; and the right-hand side of ROW8 takes values
{1.0, 2.0, 3.0} with probabilities {0.1,0.5,0.4} respectively. The example
shows random matrix coefficients only, other data items (including network data) can be rendered in the manner described in the
SCENARIOS section.
Example 2
*23456789
123456789 123456789 123456789
123456789 123456789
INDEP DISCRETE ADD
COL1 ROW8
-1.0 PERIOD2 0.5
COL1 ROW8
1.0 PERIOD2 0.5
Assuming that the core file contains a value of 7.0 for the coefficient COL1/ROW8,
the entry COL1/ROW8 will again take value 6.0 with probability 0.5 and 8.0
with probability 0.5.
Example 3
*23456789
123456789 123456789 123456789
123456789 123456789
INDEP UNIFORM
COL1 ROW8
8.0 PERIOD2 9.0
INDEP
RHS ROW9
10.0 PERIOD2 4.0
In this example the random entry COL1/ROW8 is uniformly distributed over
the interval [8.0,9.0], and the right-hand side in
ROW9 is normally distributed with mean 10.0 and variance 4.0.
Example 4
*23456789
123456789 123456789 123456789
123456789 123456789
INDEP TRIANG
COL5
ROW8 PERIOD2
PR
10.0
20.0
COL1 ROW1
This example shows how to set up a non-symmetric triangular distribution defined over an interval. This requires the specification of the mode along with the two endpoints of the interval. The mode is the third parameter, which is assumed equal to the (possibly random) realization of the coefficient in location COL1, ROW1. (It is left to the user to check that its value does indeed fall between the two endpoints 10 and 20.) When this example is distributed, the user should make sure to include the subroutine TRIANG(min,max,mode).
This section shows how to set up multidimensional random vectors that
may have correlated components but are independent of other random elements
in the problem.
The header record is similar to the INDEP header: The second name field
specifies the type of the block (DISCRETE, MVNORMAL, LINTRAN, ARMA or user defined subroutine),
the third name field describes the way the random information interacts with
the core file data (ADD, MULTIPLY or REPLACE—the default is REPLACE).
DISCRETE blocks use two types of data records.
The first record of each realization is distinguished by the code BL; the
first name field gives a name to the block, and the second name field pinpoints
the period in which the realizations become known; finally, the first numeric
field contains the probability with which this particular set of values is
attained. Following this marker, the values are listed on separate data records.
The first name field gives the column name, the second name field gives the
row name, and the first numeric field gives the value itself. The other fields
are not used.
Example
*23456789
123456789 123456789 123456789
BLOCKS DISCRETE
BL BLOCK1 PERIOD2
0.5
COL1 ROW6
83.0
COL2 ROW8
1.2
BL BLOCK1 PERIOD2
0.2
COL2 ROW8
1.3
BL BLOCK1 PERIOD2
0.2
COL1 ROW6
84.0
BL BLOCK1 PERIOD2
0.1
COL1 ROW6
84.0
COL2 ROW8
0.0
To save space, only values that change must be recorded. The SMPS format
adopts the convention that the first statement of the block is the basis from
which all changes are computed. (This implies that zero values in subsequent
realizations must be stated explicitly,
as shown in the last data record of the example above.)
In this example the block, called BLOCK1, is the 2-vector made up of the
entries COL1/ROW6 and COL2/ROW8. It takes values (83.0,1.2)
with probability 0.5, (83.0,1.3) with probability 0.2, (84.0,1.2) with probability 0.2, and (84.0,0.0) with probability
0.1.
It is possible to model additive or multiplicative perturbations, but
the user must be careful since each block modifies the core file values and
at the same time replaces values in the first occurrence of the block. If
the core file records values of 83.0 and 1.2 for the locations
COL1/ROW6 and COL2/ROW8, respectively, then the following specification is
equivalent to the previous example:
*23456789
123456789 123456789 123456789
123456789
BLOCKS DISCRETE ADD
BL BLOCK1 PERIOD2
0.5
COL1 ROW6
0.0
COL2 ROW8
0.0
BL BLOCK1 PERIOD2
0.2
COL2 ROW8
0.1
BL BLOCK1 PERIOD2
0.2
COL1 ROW6
1.0
BL BLOCK1 PERIOD2
0.1
COL1 ROW6
1.0
COL2 ROW8
-1.2
Multivariate normal distributions can be set up
using the MVNORMAL keyword on the header record. The first data record contains
a BL in the code field and has the same appearance as the first data record
for DISCRETE blocks. Following this is a list of locations
that make up this block. Each location is identified by two names in the first
two name fields, which leaves two numeric fields and a third name field free.
We place the mean value of each marginal distribution in the first numeric
field and the variance in the second. An alias
is placed in the third name field so that future occurrences of this random
variable can be uniquely deduced from a single name field.
It remains to specify the correlations between pairs of random variables.
This is effected in a separate subsection whose start
is marked with the code CR in the code field. On each data record, the first
two name fields list the aliases of two locations and the first numeric field
contains the correlation between the two random variables.
Alternatively the user may want to specify the covariances
between the components of the random vector. This is easily set up using the
code CV.
Example:
BLOCKS MVNORMAL
BL BLOCK1 PERIOD2
COL1 ROW6
5.0 ALIAS1 1.0
COL2 ROW8
8.0 ALIAS2 2.0
COL3 ROW9
11.0 ALIAS3 3.0
CR
ALIAS1 ALIAS2
0.6
ALIAS1 ALIAS3
0.6
ALIAS2 ALIAS3
0.6
This example sets up a trivariate normal distribution
of the three locations COL1/ROW6, COL2/ROW8, COL3/ROW9, with mean vector (5,8,11),
and variances 1, 2, and 3, respectively. The three components are equicorrelated
with correlation coefficient 0.6.
Random vectors can also be generated by a user-defined
subroutine. This is indicated by a modifier value that is not part of the
existing standard, that is, not equal to DISCRETE, UNIFORM, NORMAL, GAMMA,
BETA, LOGNORM, MVNORMAL, LINTRAN, ARMA, similar to the INDEP section.
The BL record once again marks the beginning of the block, giving a name
and fixing the stage for the block. On subsequent data records the locations
making up the random vector are listed in the first two name fields. Parameters
that must be passed to the subroutine can be specified following a record
containing the code PR, with all the conventions and possibilities discussed
in the INDEP section.
Example
BLOCKS USER1
BL BLOCK1 PERIOD2
COL1 ROW6
COL2 ROW8
COL3 ROW9
PR
12.0
3.0
RHS ROW1
'PRIMAL' COL5
The user-defined subroutine USER1 returns values for the locations COL1/ROW6,
COL2/ROW8, COL3/ROW9. This routine takes four parameters, whose values are
given on the records following the marker PR. The third parameter is the value
of the right-hand side of ROW1, the fourth parameter
is the current value of the decision variable COL5.
Sometimes the user may want to specify stochastic
coefficients determined by underlying random phenomena (factors) of lower
dimension. A simple form of this relationship is an affine transformation,
which can be written mathematically as v = H u + c, where H and c are deterministic
and u is a random vector. Even though this device may blur the boundary
between modelling and recording of an instance in ways not intended by the
original MPS
format, it is nonetheless useful because in many cases the dimension of
the vector u is much smaller than
that of the vector v. Certain approximation
algorithms may be able to take advantage of working in the original space,
so that there is considerable merit in associating the original random factors
u with the problem instance.
The use of this format is signalled by the keyword LINTR on the BLOCKS
header record. Once more the start of each block is marked with a record containing
the code BL in the code field, followed by a name for the block in the first
name field. The second name field identifies the period in which the information
becomes known. Following this header record, the locations of the elements
of the random vector v are identified
on separate data records using the first two name fields; the additive constant
(component of the vector c) can
be entered in the first numeric field.
Then the coefficients of the transformation matrix H are given column by column. Each column is tied to one component
of the vector u, and there are two
mechanisms for supplying its distribution. Any components of u that is independent of the other components
can be specified directly on a record with the code RV. The identifier of
the distribution (DISCRETE, UNIFORM, etc.) is
placed in the second name field,
the distribution parameters are placed in the two numeric fields. The first name field is not used. Alternatively, the name of a random variable
defined in a preceding DISTRIB section may be placed
into the first name field.
Which of these two mechanisms is used depends on the appearance of the
RV data record. If the second name field is empty, the random variable named
in the first name field must have been defined in a preceding DISTRIB
section; if the second name field is not empty, its value must describe
a valid two-parameter distribution family, which
is then taken to describe the distribution of the current random variable.
In this case the first name field is ignored (but must be present for correct
parsing of free-format input).
This generic reference to a distribution is normally sufficient,
since appropriate copies can be made of the random variable whenever it is
referenced. However, if a random variable is lagged (e.g., to be used in subsequent
ARMA sections), a specific name (that is, a unique
identifier) may be given in the third name field.
Correct parsing of free-format input mandates a placeholder
'LAGGED' to be put in the second name field.
We will illustrate these rules with the following example:
*23456789
123456789 123456789 123456789
123456789 123456789
DISTRIB MVNORMAL
BL BL1
U1 0.0 1.0
U2 0.0 1.0
CV
U1 U2
0.5
BLOCKS LINTR
BL BLOCK1 PERIOD1
COL1 ROW1
5.0
COL2 ROW2
10.0
COL3 ROW3
0.0
COL4 ROW4
0.0
*23412345678 12345678 123456789012
12345678 123456789012
RV U1 'LAGGED'
U11
COL1 ROW1
-1.0
COL2 ROW2
2.0
COL3 ROW3
1.0
RV U2 'LAGGED'
U21
COL1 ROW1
1.0
COL2 ROW2
2.0
COL3 ROW3 3.0
COL4 ROW4
4.0
RV U3
COL1 ROW1
1.0
This example defines a joint probability distribution for the four random
elements COL1/ROW1, COL2/ROW2, COL3/ROW3, COL4/ROW4 making up the random vector
v, which depends on a three-dimensional
random vector u as follows: The
first two components of u are defined
in a DISTRIB section as bivariate normal with mean (0,0)' and covariance matrix
.
The third component is defined directly in the block itself and is normal
with mean 10 and variance 3. It is independent of the other components, so
that u will have a trivariate
normal distribution with mean (0,0,10) T
and covariance matrix
.
The affine transformation is given by 
By the rules governing linear transformations of normal random variables,
v=Hu+c
is therefore also normally distributed with mean vector mv = H
mu + c = (15, 10, 0, 0)T
and covariance matrix
.
More general linear dependencies can be established as well. Processes of the form
,
where the random variables uj are independent from stage to stage and
Mi and Nj
are fixed matrices of appropriate dimensions, are called ARMA(m,n) processes and can be defined using the
following conventions:
The following example may serve to illustrate these rules. It is a continuation
of the previous example.
BLOCKS
BL BLOCK2 STAGE2
COL5 ROW5
0.0
COL6 ROW6
0.0
RV U1 'LAGGED'
U12
COL5 ROW5
5.0
RV U2 'LAGGED'
U22
COL6 ROW6
3.0
RV U3 'LAGGED'
U32
COL5 ROW5
1.0
COL6 ROW6
2.0
HV COL3 ROW3
COL5 ROW5
1.0
HV COL4 ROW4
COL6 ROW6
1.0
COL5 ROW5
1.0
This example sets up the ARMA(1,1) process yt = M1yt-1
+ N0ut + N1ut-1
with
![]()
DISTRIB section
This section describes the use of random variables (univariate
and multivariate) that do not correspond directly to stochastic elements within
the current problem. This device is needed to adequately model certain multivariate
distributions, since the MPS data record does not allow sufficient fields
to specify correlations between random elements. The main purpose of a DISTRIB
section is to provide an alias (a
single name) by which the random variable can be referenced in a later BLOCKS section.
There may be many DISTRIB sections; it is assumed that each of them defines
random variables independently of the others. Conversely, each new group of
random variables must be introduced by a new DISTRIB header.
The rules for the DISTRIB sections are similar to those for other distributions
(INDEP, BLOCKS) except
that only one name field (the first) is used to provide the name for the random
variable. Discrete random variables can be defined, as can two-parameter families and the multivariate
normal distribution. In the latter case, distinct names must be used for
the distribution and for each of its components. The user can even specify
univariate or multivariate random variables by accessing a
custom-made subroutine.
Each entry thus defined establishes a generic distribution which can be referenced more than once in subsequent
BLOCKS.
Example
*23456789
123456789 123456789 123456789
123456789 123456789
DISTRIB UNIFORM
U1 0.0 1.0
DISTRIB MVNORMAL
BL BL1
U2 0.0 1.0
U3 0.0 1.0
CV
U2 U3
0.5
This example defines the auxiliary random variable U1, which is uniformly
distributed on [0,1], and a bivariate normal random vector whose components have mean
0, variance 1 and covariance 0.5. The vector can be referred to as BL1, its
first component as U2 and its second component as U3.
Most constraints in stochastic programming are of the “replicating” kind. This means that each constraint contains variables from at most one node in each period. The constraint is thought to belong to the latest period in which it has variables and is replicated for each realization of the random elements.
In some special instances (for example, to model risk constraints in financial applications) it may be useful to link together variables belonging to different nodes in the same period. The easiest way to include such linking or global constraints in a stochastic program is to assign the constraint to the first period. (This is best done using the explicit form of the time file.) While the more common decomposition solvers will not be able to deal with linking constraints, specially developed software (see e.g., Steinbach 2003) is available.
The following is an example of a linking constraint:
Core file entry:
*23456789 123456789 123456789
123456789
ROWS
G GLOBCON
...
COLS
...
FWEALTH
GLOBCON 1.20
...
RHS
RHS1
GLOBCON 100.
Time file entry:
*23456789 123456789 123456789
123456789 123456789
PERIODS EXPLICIT
Period_1
Period_2
…
ROWS
...
GLOBCON
Period_1
COLS
...
FWEALTH
Period_2
Stoch file entry:
*23456789 123456789 123456789
123456789 123456789
SCENARIOS
SC Scen_1 'ROOT'
0.4 Period_1
SC Scen_2 Scen_1
0.6 Period_2
FWEALTH
GLOBCON 1.30
...
This construction sets up the global constraint
1.20*FWEALTH(Scen_1) + 1.30*FWEALTH(Scen_2) ³100.
To make sure that the many variations can all be parsed correctly it is
important to read the three files in the correct order, as follows:
Time file: PERIODS section only
Core file
Time file: Remaining sections (if any)
Stoch file
Solver capabilities and error recovery
The SMPS format is extremely flexible and can define a wide variety of
problems, far exceeding the capability of any existing stochastic programming
solver. It should not be assumed, therefore, that a given problem can be solved
with a particular solver nor that in fact there is any solver capable of solving the problem. However, any SMPS parser
should recover gracefully when it encounters a section or construct its solver
is not prepared to deal with.
[1]. J.R. Birge, M.A.H. Dempster, H.I. Gassmann, E.A. Gunn, A.J. King and S.W. Wallace, “A standard
input format for multiperiod stochastic linear programs”,
COAL Newsletter #17 (1987) pp. 1–19.
[2]. J.R. Birge and
F. Louveaux, Introduction
to Stochastic Programming, Springer Series in Operations Research, Springer
Verlag,
[3]. A. Charnes and
W.W. Cooper, “Chance-constrained programming”, Management Science 5 (1959)
73–79.
[4]. J. Edwards, J. Birge,
A. King and L. Nazareth, “A standard input format for computer codes which
solve stochastic programs with recourse and a library of utilities to simplify
its use”, Working Paper WP-85-03, International Institute for Applied Systems
Analysis, Laxenburg, Austria, 1985.
[5]. H.I. Gassmann and E.
Schweitzer, “A comprehensive input format for stochastic linear programs”,
Annals of Operations Research 104
(2001) 89-125.
[6].
International Business Machines, Inc., "Passing your model to OSL using
Mathematical Programming System (MPS) format", world-wide web document,
http://www-306.ibm.com/software/data/bi/osl/pubs/Library/featur11.htm.
[7]. P. Kall
and S.W. Wallace, Stochastic Programming,
John Wiley and Sons,
[8]. A.J. King, “An implementation of the Lagrangian finite-generation method”, in: Yu. Ermoliev and R.J-B Wets (eds.), Numerical Techniques for Stochastic Optimization, Springer Series
in Computational Mathematics, Vol. 10, Springer Verlag,
Berlin, 1988.
[9]. W.K. Klein Haneveld,
Duality in Stochastic Linear and Dynamic
Programming, Lecture Notes in Economics and Mathematical Systems, Vol.
274, Springer-Verlag, Berlin, 1986.
[10]. D. Klingman,
A. Napier and J. Stutz, “NETGEN: A program for generating large scale capacitated
assignment, transportation, and minimum cost flow network problems”, Management
Science 20 (1974) 814–821.
[11]. J.M. Mulvey,
R.J. Vanderbei and S.A. Zenios,
“Robust optimization of large-scale systems”, Operations Research 43
(1993) 264–281.
[12]. A. Prékopa,
Stochastic Programming, Kluwer Academic Publishers,
[13]. M.C. Steinbach, "Tree-sparse convex
programs", Mathematical Methods of Operations Research 56
(2003) 347-376.
Last modified
11-Jul-2006
.
Please send comments and errata to the author.