The SMPS format for stochastic linear programs

 

Last modified 06 October 2005.

 

H.I. Gassmann,

School of Business Administration

Dalhousie University

Halifax, Canada B3H 3J5

 

 

This site explains the complete SMPS format used to describe stochastic linear programs. The format is based on the well-known MPS format for deterministic linear programs and derives from previous work by Edwards et al., Birge et al., and Gassmann and Schweitzer.

 

Examples of stochastic programs are given elsewhere. (See, for instance, the books by Birge and Louveaux and Kall and Wallace.)

 

 

Please send comments and feedback to the author.


Contents

 

Design philosophy

Core file

Core file sections: NAME, ROWS, COLUMNS, RHS, RANGES, BOUNDS

Fine points: MPS record format, Network format, Mixed LP and networks, Integer variables, Quadratic objectives, Probabilistic objectives, Quantile objectives

Time file

Time file sections: TIME, PERIODS, ROWS, COLUMNS

Fine points: Implicitly named columns

Stoch file

Stoch file sections: STOCH, SIMPLE, ROBUST, PLINQUAD, CHANCE, ICC, SCENARIOS, NODES, INDEP, BLOCKS, DISTRIB

Fine points: Linking constraints

Order of processing

Solver capabilities and error recovery

References

 


Design philosophy

 

The SMPS format was designed to make it easy to convert existing deterministic linear programs into stochastic ones by adding information about the dynamic and stochastic structure. This is done using three text files, the core file, time file and stochastics file (or stoch file, for short). Each file consists of several sections, which must appear in predefined order. The first record of each section is a header record, which is followed by zero or more data records. Empty sections (containing no data) may be omitted. The records in each file are organized into fields according to the MPS record layout.

 


Core file

 

The core file lists all the deterministic information of the problem, in either MPS format for linear programs or in NETGEN format for problems in network format. Core file information includes the name and type of each constraint, the names of rows and columns, a column-ordered list of coefficients in the constraint matrix, right-hand side coefficients and any bounds and ranges on the variables and slacks. The core file also provides placeholders for all the stochastic elements (which must be mentioned in the core file and given a preliminary value that may or may not be meaningful). The core file thus represents a deterministic problem, which may be thought of as a typical scenario, an average case, a baseline case or similar.

 

Sections in the core file

 

Section

Purpose

NAME

Gives a name to the problem in name field 2

ROWS

Gives the name and type of each constraint and objective

COLS

Gives the nonzero coefficients in column order

RHS

For non-zero entries of the right-hand side vector (optional)

RANGES

For range constraints (optional)

BOUNDS

For bounds on decision variables. Different codes identify different types of bounds (optional)

ENDATA

End of problem data

 

The small print:

 

Integer variables

Network format and mixed LP-network

Quadratic objectives

Probabilistic objectives

Quantile objectives

 


NAME section

 

The name of the problem is given in the second name field. This name is verified against the name in the time and stoch files.

 

 

Example

 

*23456789 123456789 123456789

NAME          Example

 


ROWS Section

 

The type of each row is given in the code field: ‘L’ indicates a constraint of the form ax ≤ b, ‘G’ is for constraints of the form ax ≥ b, ‘E’ is for equality constraints, and ‘N’ indicates a free row (no constraint) or a potential objective row. By default the first ‘N’-type row is used as the objective. Each row is given a name in the first name field.

 

 

Example

 

*23456789 123456789 123456789

ROWS

 N  OBJECT

 G  ROW2

 


COLUMNS section

 

The nonzero coefficients of the constraint matrix are given in column order. The code field is empty. The first name field contains the name of the column, the second name field must match the name of a row, and the first numeric field gives the value of the corresponding coefficient in the constraint matrix. A second coefficient may appear on the same record using the third name field for the name of another row and the second numeric field for the corresponding coefficient.

 

 

Example

 

*23456789 123456789 123456789 123456789 123456789 123456789

COLS

    COL1      ROW2      2.0            OBJECT    100.

 


RHS section

 

The first name field on a data record contains an identifying name (that must match information given externally), the second name field contains the name of a constraint mentioned in the ROWS section. The first numeric field contains the value of the right-hand side in this constraint. The third name field and second numeric field may contain the information corresponding to one further constraint. Only nonzero right-hand side values require an entry in this section, and the order does not matter.

 

 

Example

 

*23456789 123456789 123456789

RHS

    RHS1      ROW2      50.

 


RANGES section

 

The RANGES section allows the user to set up constraints that act as both L and G-type constraints. The first name field on a data record contains an identifying name (that must match information given externally), the second name field contains the name of a constraint mentioned in the ROWS section. The first numeric field contains a range for the value of the right-hand side in this constraint. The third name field and second numeric field may contain the information corresponding to one further constraint. The order of constraints does not matter.

 

Assume that the right-hand side value for the row is bi and the entry in the ranges section is ri. Then the action of the range command is as follows:

 

Row type

Effect

G

bi ≤ row activity ≤ bi + |ri|

L

bi - |ri| ≤ row activity ≤ bi

E (ri > 0)

bi ≤ row activity ≤ bi + ri

E (ri < 0)

bi + ri ≤ row activity ≤ bi

 

 

Example

 

*23456789 123456789 123456789

RANGES

    RANGE1    ROW2      10.

 

If ROW2 is a G-type row with previously defined right-hand side 50, then this declaration defines a range of [50,60].
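
The following sketch is not part of the format; it merely illustrates, in Python, the range semantics tabulated above.

def range_bounds(row_type, rhs, r):
    """Return (lower, upper) row-activity bounds implied by a RANGES entry.

    Assumes the standard MPS interpretation summarized in the table above.
    """
    if row_type == 'G':
        return rhs, rhs + abs(r)
    if row_type == 'L':
        return rhs - abs(r), rhs
    if row_type == 'E':
        return (rhs, rhs + r) if r > 0 else (rhs + r, rhs)
    raise ValueError("RANGES entries apply only to G, L and E rows")

# The example above: G-type ROW2 with right-hand side 50 and range 10.
print(range_bounds('G', 50.0, 10.0))   # (50.0, 60.0)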

 


BOUNDS section

 

The first name field on a data record contains an identifying name (that must match information given externally), the second name field contains the name of a variable mentioned in the COLUMNS section. The first numeric field contains a value associated with this bound. The type of bound is clarified in the code field. The third name field and second numeric field are not used. The order of the columns does not matter.

 

Code

Interpretation

LO

Lower bound

UP

Upper bound

FX

Fixed value

FR

Free variable

MI

Nonpositive variable. (Lower bound is −∞)

PL

Nonnegative variable (Upper bound is +∞)

BV

Binary variable (0-1 variable)

UI

Upper bound. If the variable was declared as integer in the COLUMNS section, the upper bound is rounded down to the nearest integer.

 

If the code field contains the values FR, MI, PL or BV, the value in the first numeric field is ignored. Default bounds are 0 and +∞ for continuous variables and 0 and 1 for integer variables.

 

 

Example

 

*23456789 123456789 123456789

BOUNDS

 UP BOUND1    COL1      20.

 

This example defines an explicit upper bound of 20 for decision variable COL1.
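
As an illustration only (not part of the format), a small Python sketch of how the bound codes in the table above could be applied to a variable's default bounds:

import math

def apply_bound(code, lo, up, value, is_integer=False):
    """Update the (lo, up) bounds of one variable according to a BOUNDS record."""
    if code == 'LO':
        return value, up
    if code == 'UP':
        return lo, value
    if code == 'UI':
        return lo, math.floor(value) if is_integer else value
    if code == 'FX':
        return value, value
    if code == 'FR':
        return -math.inf, math.inf
    if code == 'MI':
        return -math.inf, 0.0          # nonpositive variable, as in the table above
    if code == 'PL':
        return lo, math.inf
    if code == 'BV':
        return 0.0, 1.0
    raise ValueError(f"unknown bound code {code}")

# Default bounds, then the example record 'UP BOUND1 COL1 20.'
lo, up = 0.0, math.inf
print(apply_bound('UP', lo, up, 20.0))   # (0.0, 20.0)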

 


MPS record layout

 

One must distinguish between header records and data records.

 

Header records contain up to three word fields, normally in these positions:

 

columns 1–14

first word field

columns 15–24

second word field

columns 40–49

third word field

 

 

Data records contain up to three name fields, two numeric fields and a code field, normally in these positions:

 

columns 2 and 3

code field

columns 5–12

first name field

columns 15–22

second name field

columns 25–36

first numeric field

columns 40–47

third name field

columns 50–61

second numeric field

 

 

Names are normally restricted to eight characters and may contain any ASCII symbol; numerical input is limited to twelve characters (including digits, signs, exponents and a decimal point). Free format input can be used if these limits are too restrictive. However, in order to allow for correct parsing of data records, some fairly non-restrictive rules must be followed:

 

1.        Each field is surrounded by white space (one or more blanks) and does not contain an embedded blank.

2.        The first name field of a header record must start in column 1.

3.        Names and numeric information on data records start in column 5 (or further to the right). The first four columns must be reserved for the code field (which is to be placed into columns 2 and 3).

4.        Numbers (numerical strings) must start with a digit (or a decimal point immediately followed by a digit). Exponential notation (e.g., 1.E-5 or .3e+6) is allowed.

5.        Names (character strings) have no embedded blanks and must start with a non-numeric character. (‘plus2’, ‘an_extremely_long_and_tedious_name’, ‘.cost’ and ‘$1000’ are all valid names, but ‘sp ace’, ‘2B’ and ‘.5off’ are not.)
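
The fixed-column layout can be read with very little code. The following Python sketch is an illustration only (it ignores free-format input and header records); it slices a data record at the column positions listed above.

def parse_data_record(line):
    """Split a fixed-format data record into its fields (1-based columns above)."""
    field = lambda a, b: line[a - 1:b].strip()      # columns a..b, inclusive
    return {
        'code':     field(2, 3),
        'name1':    field(5, 12),
        'name2':    field(15, 22),
        'numeric1': field(25, 36),
        'name3':    field(40, 47),
        'numeric2': field(50, 61),
    }

record = "    COL1      ROW2      2.0            OBJECT    100."
print(parse_data_record(record))
# {'code': '', 'name1': 'COL1', 'name2': 'ROW2', 'numeric1': '2.0',
#  'name3': 'OBJECT', 'numeric2': '100.'}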

 


Network format

 

The format used for network sections is modified somewhat from the proposal by Klingman et al. As in the MPS format there are header records and data records. The data records are organized in the following manner:

 

 

Data record for networks

 

columns 2–4

code field

columns 7–12

first name field (from-node)

columns 13–18

second name field  (to-node)

columns 21–30

first numeric field  (unit flow cost)

columns 31–40

second numeric field (optional: maximum flow)

columns 41–50

third numeric field (optional: minimum flow)

columns 51–60

fourth numeric field (optional: multiplier)

columns 63–70

third name field (optional: see below)

 

 

Note that in this format the name fields are only six characters wide and the numeric fields hold ten characters each.

 

The following modifications to the NETGEN format are made in SMPS: For consistency, the BEGIN record is changed to a NAME record, and the END record is replaced by ENDATA. More importantly, since arcs are named only implicitly, it is important that each node is associated unequivocally with its time period. This requires a separate NODES section, which follows the same rules as the ROWS section. The two sections can be used interchangeably.

 

Even this is not enough to ensure that implicitly named arcs are assigned to the correct time period. The convention is that an arc belongs to the earlier of the periods containing the origin and destination nodes, unless otherwise specified. This convention can be overridden by placing the name of the period to which the arc belongs into the third name field.
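
A minimal Python sketch of this convention (illustration only; the mappings node_period and period_order are hypothetical helpers, not part of the format):

def arc_period(from_node, to_node, node_period, period_order, override=None):
    """Return the period of an implicitly named arc.

    node_period maps node names to period names, and period_order lists the
    period names in temporal order.  The arc belongs to the earlier of the two
    periods containing its endpoints, unless a period name was placed in the
    third name field of the ARCS record (passed here as override).
    """
    if override:
        return override
    p_from, p_to = node_period[from_node], node_period[to_node]
    return min(p_from, p_to, key=period_order.index)

# Example: NODE1 in PERIOD3, NODE3 in PERIOD4 -> the arc belongs to PERIOD3
print(arc_period('NODE1', 'NODE3',
                 {'NODE1': 'PERIOD3', 'NODE3': 'PERIOD4'},
                 ['PERIOD1', 'PERIOD2', 'PERIOD3', 'PERIOD4', 'PERIOD5']))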

 

 

Core file sections for networks

 

Section

Purpose

NAME

Gives a name to the problem in name field 2

NODES

Gives the name of each node

ARCS

Specifies the cost, flow limits and multiplier of each arc

SUPPLY

Gives supply at each supply node

DEMAND

Gives demand at each demand node

ENDATA

End of problem data

 

 

Example

 

NAME      Network format

ARCS

    NODE1 NODE2 cost      upper     lower     multiplier

  1 NODE1 NODE3 cost      upper     lower     multiplier

  2 NODE1 NODE3 cost      upper     lower     multiplier

    ...

    NODEk NODEn cost      upper     lower     multiplier

SUPPLY

    ...

    NODE1  amount

DEMAND

    ...

ENDATA

 

Note the two parallel arcs connecting NODE1 and NODE3.

 


Mixed LP and network problems

 

Network and LP sections can be mixed. It might be useful, for example, to set up a two-stage problem in which the first stage is an LP model concerned with facility location and capacity planning, while the second stage is a network flow problem that allocates random demands once the facilities have been opened. Another example is a network with side constraints, which can easily arise in financial applications.

 

Mixed LP and network problems use all the sections described previously under LP and network problems. The following sections serve the same purpose and may be alternated freely:

 

ROWS and NODES sections,

COLUMNS and ARCS sections,

SUPPLY, DEMAND and RHS sections.

 

The general order of sections of mixed LP and network problems is as follows:

 

Section

Purpose

NAME

Gives a name to the problem in name field 2

ROWS / NODES

Give the name of each row/node.

COLUMNS / ARCS

Give the name of each variable/arc. ARCS sections also specify cost coefficients and variable bounds.

RHS / SUPPLY / DEMAND

Give right-hand side values for the constraints

RANGES

(optional)

BOUNDS

(optional)

ENDATA

End of problem data

 

The remaining challenge is to specify the constraint coefficients for the various constraints. In mixed LP and network problems, one has to be prepared to deal with four distinct situations:

 

LP columns and constraint coefficients in LP rows

This is easily handled with an MPS-style data record

LP columns that bring flow to a network node.

The only difficulty here is that nodes are not named in a separate ROWS section

Network arcs in node constraints

This is easily handled in a NETGEN-style data record.

Network arcs in LP constraints

This is the most difficult situation and requires careful attention.

 

The challenge with network arcs appearing in LP constraints is that arcs are not named explicitly, so that three name fields are needed (two for the two nodes incident to the arc and a third for the constraint row), whereas the original NETGEN standard allows only two.

 

The SMPS format extends the NETGEN data record by allowing a third name field to be placed in columns 63–70. If not empty, the third name field refers to an LP row name, and the first numeric field is taken as the coefficient value. There should be no further numerical data on such records. If the third name field is empty, the record is a regular NETGEN data record.

 

 

Example

 

Time file:

 

*23456789 123456789 123456789 123456789 123456789 123456789

TIME          Mixed LP and network example

PERIODS

    COL1      ROW1                     PERIOD1

    COL6      ROW3                     PERIOD2

    'NETWRK'  NODE1                    PERIOD3

    'NETWRK'  NODE3                    PERIOD4

    COL18     ROW5                     PERIOD5

ENDATA

 

 

Core file

 

*23456789 123456789 123456789 123456789 123456789 123456789

NAME          Mixed LP and network example

ROWS

    ...

  E ROW3

  E ROW4

  L NODE1

  G NODE2

  E ROW5

  E ROW6

    ...

COLUMNS

    ...

    COL16     ROW3     value           NODE1     value

    COL17     ROW4     value           NODE2     value

ARCS

    NODE1 NODE2 cost      upper     lower     multiplier

    NODE1 NODE7 cost      upper     lower     multiplier

    ...

    NODE8 ROW5  cost      upper     lower     multiplier

    NODE8 ROW6  cost      upper     lower     multiplier

COLUMNS

    COL18     ROW5      value          ROW7      value

    COL19     ROW6      value          ROW8      value

    ...

RHS

    ...

SUPPLY

    ...

ENDATA

 

The first COLUMNS section lists the deterministic coefficients of the constraint matrices belonging to periods 1 and 2. The last two data records in this section illustrate LP columns that bring flow into network nodes, linking the second stage to the third. The amount of flow entering NODE1 and NODE2 is determined by the values of COL16 and COL17 and the corresponding coefficients in this COLUMNS section.

 

The last two entries in the ARCS section will bring “flow” from NODE8 to the right hand side of ROW5 and ROW6 by entering a number in those rows. The number will be the “multiplier” from the input, and the value ending up on the right hand side will be the product of this multiplier and the flow running out of NODE8 to these rows (which are nodes when viewed from the network).

 


Integer variables

 

Integer variables can be handled in the same way as in MPS: Special data records called indicator records are inserted before the start of a group of integer variables and again at the end. These data records define the type of integer variables used through special markers in the name fields.

 

For general integer variables, the code field of the starting indicator record should be empty, the second name field should contain the marker 'MARKER', and the third name field should contain the marker 'INTORG'. The first name field may contain any value that differs from the name of the previous column. The ending indicator record is identical to the opening indicator, except that the third name field contains the marker 'INTEND'.

 

Example

 

*23456789 123456789 123456789 123456789 123456789 123456789 123456789

    INT       'MARKER'                  'INTORG'

    Col1      Row1       1.0            Row2              -1.0

    Col2      Row1      -1.0            Row3               1.0

    INT       'MARKER'                  'INTEND'

 

This example sets up two integer columns, Col1 and Col2.

 

Special ordered sets can be used to further restrict the possible values of a set of integer variables. At most one of the variables in a special ordered set of type 1 may take a nonzero value. In a special ordered set of type 2 at most two variables may take nonzero values, and they must be adjacent within the set. Special ordered sets of type 3 are used to model decisions, as they force all of the integer variables in the set to be 0-1 variables.

 

Special ordered sets of type 1 are denoted by the following indicator records: The starting indicator record contains a code value of S1, an arbitrary name in the first name field, the keyword 'MARKER' in the second name field and the keyword 'SOSORG' in the third name field. The ending indicator record is similar, but it contains the keyword 'SOSEND' in the third name field.

 

*23456789 123456789 123456789 123456789 123456789 123456789 123456789

 S1 INT1      'MARKER'                  'SOSORG'

    Col1      Row1       1.0            Row2              -1.0

    Col2      Row1      -1.0            Row3               1.0

 S1 INT1      'MARKER'                  'SOSEND'

 

In this example the two integer columns Col1 and Col2 form a special ordered set of type 1.

 

Special ordered sets of type 3

 

Special ordered sets of type 3 are useful in the following situation:

 

1.        There exists an equality constraint whose right-hand side is 1.

2.        The coefficients of all variables in this row are either 0 or 1.

3.        The variables whose coefficients are non-zero are integer variables.

4.        At most one of these integer variables can be non-zero.

 

In this case it can be deduced that exactly one of the integer variables takes a value of 1, and that all of them are 0-1 variables. These variables form a special ordered set of type 3. Special ordered sets of type 3 are declared in SMPS by a construct similar to other special ordered sets. The starting indicator record has an empty code field and the name of a valid equality row in the first name field. The second name field contains the keyword 'MARKER' and the third name field contains the keyword 'SOSORG'. In the ending indicator record, the keyword 'SOSEND' replaces the keyword 'SOSORG'.

 

Example:

 

*23456789 123456789 123456789 123456789 123456789 123456789

    Row3      'MARKER'                 'SOSORG'

    Col1      Row1      1.0            Row3               1.0

    Col2      Row1      1.0            Row3               1.0

    Row3      'MARKER'                 'SOSEND'

 


Quadratic objectives

 

A quadratic objective can be set up following the linear part of the core file. It uses the following sections:

 

Section

Purpose

NAME

Gives a name to the problem in name field 2

QSECTION

Gives the nonzero coefficients in the quadratic objective

ENDATA

End of problem data

 

The NAME section gives a name in the second name field. This name must match the name given in the linear portion of the core file. The second name field in the QSECTION header record contains no information. The first two name fields on a data record contain the column names of a quadratic term; the first numeric field contains the coefficient. A common factor of ½ is assumed throughout.
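
In other words (a restatement of the convention just stated, not an additional format rule): a data record naming columns i and j with value q contributes the term ½·q·xᵢ·xⱼ to the objective, so that the example below contributes 0.5·Col1², 1.5·Col1·Col2 and 1.0·Col2².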

 

The QSECTION can be read from a separate file if necessary.

 

Example

 

*23456789 123456789 123456789 123456789

NAME          Example

QSECTION

    Col1      Col1          1.0

    Col1      Col2          3.0

    Col2      Col2          2.0

ENDATA

 

sets up the objective 0.5*(Col1)² + 1.5*Col1*Col2 + (Col2)².

 


Probabilistic objectives

 

Probabilistic objectives take the form

min (or max)  Pr{ ax ~ b },

where a and b denote the (possibly stochastic) coefficients and right-hand side of a designated constraint row, the optimization could be either minimization or maximization, and the relation ~ could be ≥, ≤ or =.

 

This is very similar to a single chance constraint, and thus it can be represented in SMPS format in the same way. If ROW1 is a constraint row specified in the core file, then the following example defines a probabilistic objective:

 

*23456789 123456789

CHANCE        INDIV

  N CC1       ROW1

 

The numeric fields are not used.

 

Probabilistic objectives involving multiple (joint) constraints can be set up in the same manner:

 

*23456789 123456789 123456789 123456789

CHANCE

 JN CC1       CON1

    ROW2

    ROW3

 

 


Quantile objectives

 

Quantile objectives take the form

min { c : Pr{ ax ≤ c } ≥ α },

i.e., the α-quantile of the random quantity ax is optimized. This is the same as optimizing c subject to the simple chance constraint Pr{ ax ≤ c } ≥ α along with the other constraints of the problem.

 


The time file

 

The purpose of the time file is to break the core file scenario down into nodes corresponding to the individual stages.

 

Sections in the time file

 

Section

Purpose

TIME

Gives a name to the problem in name field 2

PERIODS

Gives the name of each time period (in order)

ROWS

(optional) Specifies for each row the period to which it belongs

COLUMNS

(optional) Specifies for each column the period to which it belongs

ENDATA

End of problem data

 

The small print

 

Implicitly named columns

Stages

 


TIME section

 

The second name field on the TIME header record contains the name of the problem. This name is verified against the name in the core and stoch files.

 

 

Example

 

*23456789 123456789 123456789

TIME          Example

 


PERIODS section

 

The header record contains the value PERIODS in the first name field. If the core file is in time order, then the second name field can be left blank or given the value IMPLICIT. The ROWS and COLUMNS sections are not needed in this case.

 

If the core file is not given in temporal order (for instance if it was generated by an algebraic modelling language), then the keyword EXPLICIT must be placed in the second name field of the header record and the ROWS and COLUMNS sections must be present.

 

Each data record contains the name of a period in the third name field. If the time file is in implicit format, then the first name field contains the name of the first column in this period, the second name field contains the name of the first row. If the time file is in explicit format, the first two name fields are not used. The objective row must always belong to the first period.

 

Example

 

*23456789 123456789 123456789 123456789 123456789 123456789

PERIODS       IMPLICIT

    COL1      ROW1                     PER1

    COL5      ROW3                     PER2

 

In this example all columns in the core file that appear between COL1 (inclusive) and COL5 (exclusive) make up the first period, named PER1, while COL5 and all the remaining columns belong to period PER2. Similarly, ROW1 and ROW2 (along with the objective) are the rows belonging to the first period; all the others are second-period rows.
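
For illustration only, the following Python sketch assigns names to periods from implicit PERIODS markers, assuming the core file lists rows and columns in temporal order (the function and variable names are hypothetical):

def assign_periods(ordered_names, markers):
    """Assign each row or column name (in core file order) to a period.

    markers is the list of (first_name, period) pairs taken from an implicit
    PERIODS section, in temporal order.
    """
    starts = dict(markers)
    assignment, current = {}, None
    for name in ordered_names:
        current = starts.get(name, current)
        assignment[name] = current
    return assignment

cols = ['COL1', 'COL2', 'COL3', 'COL4', 'COL5', 'COL6']
print(assign_periods(cols, [('COL1', 'PER1'), ('COL5', 'PER2')]))
# COL1 through COL4 belong to PER1, COL5 and COL6 to PER2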

 


ROWS section

 

The data records in this section contain the name of a row mentioned in the core file in the first name field; the second name field contains the name of a time period set up in the preceding PERIODS section. The ROWS section is optional if the core file is given in temporal order.

 

Example

 

*23456789 123456789 123456789

ROWS

    ROW1      PER1

    ROW3      PER2

    ROW2      PER1

    ...

 


COLUMNS section

 

The data records in this section contain the name of a column mentioned in the core file in the first name field; the second name field contains the name of a time period set up in the preceding PERIODS section. The COLUMNS section is optional if the core file is given in temporal order.

 

Example

 

*23456789 123456789 123456789

COLS

    COL1      PER1

    COL5      PER2

    COL2      PER1

    ...

 


Implicitly named columns

 

If the core file is given in temporal order and the time file only provides markers for the first row and column of each period, then there exists a problem with columns that do not have explicit names given in the core file. This situation can arise in two circumstances: if the core file has network components, or if there are penalties (linear, quadratic or mixed) for violating a constraint. In the first instance, the first name field of a data record in the PERIODS section should contain the marker 'NETWRK', in the second instance it should contain the marker 'PENLTY'. For an example see the section on mixing LP and network formats.

 


Stages

 

Stages help many stochastic programming solvers to decompose the problem into smaller pieces. They denote the times when new information becomes available. The stages form a subset of the time structure and are characterized by an alternating sequence of decision points and random events. It is assumed that at the time of the first decision all explicit constraints are known, so that the sequence always starts with decision - event - decision - event .... The end of the sequence for recourse problems is ...decision - event - decision, while most problems with chance constraints contain only the sequence decision - event. (Multistage problems with probabilistic constraints have been investigated by Prékopa.)

 

The usual convention for problems with probabilistic constraints is to consider a stage as (decision – event), while for recourse problems it is more common to use the opposite view. However, to allow mixing of recourse and chance constraints, consistency is essential. This format uses the convention that a stage is made up of an event and a decision (in that order), with the very last event possibly irrelevant.

 

This has the advantage that the events and the induced decisions lie in the same stage, simplifying the indexing notation for finite distributions and making it easier to set up problems with incomplete information (where some of the first stage coefficients are unknown). The only drawback of this method is that ordinary chance-constrained problems must be thought of as two-stage problems with an unusual second stage that contains no decisions.

 

Assignment of problem elements to the various stages must then be consistent with the following rules:

 

1. A column belongs to the stage in which the decision is made.

2. A row belongs to the stage of the latest decision incident with the row.

3. A random variable belongs to the earliest stage that has a stochastic element depending on this random variable.

4. For finite distributions all columns (and with them the rows) must be replicated according to the random variables of the same stage. In particular, this means that for chance-constrained problems (where the final stage contains no decisions) replication is not needed in the final stage.

 


Stoch file

 

The purpose of the stoch file is to allow the solver to build a deterministic equivalent, which requires information about the random variables. If all the random variables are finitely distributed, it amounts to the construction of an event tree. The event tree can be described in three different ways: scenario by scenario, node by node, or implicitly using marginal distributions. Implicit descriptions are also available if the random variables are continuously distributed, where explicit descriptions would obviously fail.

 

The stoch file may contain the following sections:

 

Section

Purpose

STOCH

Gives a name to the problem in name field 2

SIMPLE

For problems with simple recourse (optional)

ROBUST

For quadratic penalties (optional)

PLINQUAD

For piecewise linear-quadratic penalties (optional)

CHANCE

For chance constraints (optional)

ICC

For integrated chance constraints (optional)

SCENARIOS

Gives the values of stochastic elements one scenario at a time

NODES

Gives the values of stochastic elements one node at a time

INDEP

Marginal distribution of independent random variables

BLOCKS

Marginal distribution of a random vector

DISTRIB

Distribution of auxiliary random variables

ENDATA

End of problem data

 

The three ways of describing the event tree are mutually exclusive. That is, a stoch file cannot contain SCENARIOS, NODES and INDEP/BLOCKS/DISTRIB sections simultaneously. INDEP, BLOCKS and DISTRIB sections may be mixed freely, although the order must be such that enough information is available to construct the event tree during a single pass through the file.

 

The small print

 

Linking (global) constraints

 


STOCH section

 

The second name field on the STOCH header record contains the name of the problem. This name is verified against the name in the core and time files.

 

Example

 

*23456789 123456789

STOCH         Example

 


SIMPLE section for simple recourse

 

Simple recourse problems use very special recourse matrices, which should not have to be set up by the user explicitly. In simple recourse problems a subset of the rows is of the form aᵢx + yᵢ⁻ - yᵢ⁺ = bᵢ, where the auxiliary variables yᵢ⁻ and yᵢ⁺ are nonnegative and are used to balance any shortage (yᵢ⁻) or surplus (yᵢ⁺) allocation of the decision variables after the random variable bᵢ has been observed. Any shortage incurs a penalty of qᵢ⁻, surplus is penalized by qᵢ⁺.

We set up a separate section named SIMPLE as part of the stoch file. Each data record in this section contains the name of one row in the second name field (the first name field contains an identifier and can be used to set up multiple sets of simple recourse coefficients), and the cost coefficients are placed in the two numeric fields; qᵢ⁻ is first, followed by qᵢ⁺. The penalty costs qᵢ⁻ and qᵢ⁺ must satisfy the condition qᵢ⁻ + qᵢ⁺ ≥ 0 in order for the problem to be bounded; this constraint is enforced by the system.

 

Each occurrence of a row name in this section triggers the generation of additional decision variables, identified by appending the suffix .minus and .plus after the name of the row. If the row represents an equality constraint, both variables are generated, along with their coefficients in the constraint matrix. If the row is of type L, only the shortage variable <rowname>.minus is created; if the row is of type G, only the surplus variable <rowname>.plus is added. The penalty terms in a simple recourse row will most likely be deterministic, but in a pinch the user can specify stochastic coefficients by using these system-generated variable names.

 

The recourse variables may be integer-valued. This fact is signalled by placing integer markers into the stoch file.

 

Example

 

*23456789 123456789 123456789 123456789 123456789 123456789

SIMPLE

    SIMPLE1   ROW1      10.0                     5.0

    SIMPLE1   ROW2      50.0

 

Assume that ROW1 is an E-type row and ROW2 has type G. Then each unit of shortage in ROW1 is penalized by 10.0 units in the objective; each unit of surplus has a penalty of 5. Surplus in ROW2 is penalized by 50 units.
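
Restated for this example (a sketch only, following the reconstruction above): the first data record turns the E-type constraint ROW1 into a₁x + y⁻ - y⁺ = b₁, with y⁻ ≥ 0 and y⁺ ≥ 0, and adds the penalty term 10·y⁻ + 5·y⁺ to the (expected) objective, where y⁻ and y⁺ stand for the system-generated variables ROW1.minus and ROW1.plus.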

 

In order to allow stochastic recourse costs, the SIMPLE section precedes the NODES, SCENARIOS, INDEP, BLOCKS and DISTRIB sections.

 


ROBUST section for robust optimization (quadratic penalties)

 

Quadratic penalties for surplus and shortage have a similar function as the piecewise linear penalties in the section on simple recourse. The most common application of this feature is in robust programming (see, e.g., Mulvey et al.).

 

We provide a section named ROBUST with syntax identical to the SIMPLE section to provide quadratic penalty coefficients on yᵢ⁻ and yᵢ⁺. The ROBUST section follows the SIMPLE section and, like it, precedes the distribution sections. In keeping with the conventions for the quadratic objective, the penalty term includes an implicit factor of ½, that is, if the entry is of the form

 

*23456789 123456789 123456789 123456789 123456789 123456789

    ROBUST1   ROW1      0.5                      1.5

 

then negative deviations from the target x0 are penalized by 0.25*(x - x0)², positive deviations by 0.75*(x - x0)².

 

In many applications the quadratic penalty term is uniform, i.e., it is equal to some value r/2 for all constraints. For simplicity this coefficient can be specified directly in the first numeric field of the header record.

 


PLINQUAD section for piecewise linear-quadratic penalties

 

Another type of penalty has received attention in the literature: the piecewise linear-quadratic penalties described by King. These are one-sided penalties of the form

ψ(t) = 0 for t ≤ 0,    ψ(t) = (a/2)·t² for 0 ≤ t ≤ b/a,    ψ(t) = b·t - b²/(2a) for t ≥ b/a.

Because of continuity and differentiability requirements at 0 and b/a, two parameters suffice to describe these one-sided penalties: the curvature a of the quadratic part and the slope b of the linear part. We propose to express these parameters in a PLINQUAD section, to follow the SIMPLE and ROBUST sections. Each data record in the PLINQUAD section lists an identifying name in the first name field, a valid row name in the second, the quadratic parameter in the first numeric field, and the linear parameter in the second. (If the linear parameter is nonpositive, the penalty is assumed to be a pure quadratic.)

 

If the mentioned row is of type L, the penalty is applied to positive deviations, if the row is of type G, the penalty applies to negative deviations. In the case of equality constraints, the range on which to apply the penalty is ambiguous but can be clarified in the code field. If the code field is empty, the penalty is applied to both positive and negative deviations; the code PL denotes positive deviations, the code MI is used to specify negative deviations.

 

Example

 

*23456789 123456789 123456789 123456789 123456789 123456789

PLINQUAD

    PLQ1      ROW3      1.0                      10.0

 

Assume that ROW3 is of type L. Then the first 10 units of shortage are penalized at the progressive rate of 0.5*(short)², all subsequent units are penalized at a rate of 10.
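
A minimal Python sketch (illustration only) of the penalty function reconstructed above, evaluated for the example's parameters a = 1.0 and b = 10.0:

def plq_penalty(a, b, t):
    """Piecewise linear-quadratic penalty: quadratic with curvature a up to the
    breakpoint b/a, linear with slope b beyond it (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    if b <= 0 or t <= b / a:      # nonpositive linear parameter: pure quadratic
        return 0.5 * a * t * t
    return b * t - b * b / (2.0 * a)

# Example above: a = 1.0, b = 10.0, so the breakpoint lies at 10 units
print(plq_penalty(1.0, 10.0, 5.0))    # 12.5   (quadratic part)
print(plq_penalty(1.0, 10.0, 15.0))   # 100.0  (linear part: 10*15 - 50)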

 


CHANCE section for chance-constrained problems

 

This section describes the modelling of chance-constrained problems. There are individual (one-dimensional) and joint (multidimensional) chance constraints, with slightly differing syntax.

 

For the purpose of this exposition we distinguish four different forms of chance constraints:

Pr{ aᵢx ~ bᵢ } ≥ αᵢ,    Pr{ aᵢx ~ bᵢ } ≤ εᵢ,    Pr{ Aₖx ~ bₖ } ≥ αₖ,    Pr{ Aₖx ~ bₖ } ≤ εₖ.

The first two of these are one-dimensional chance constraints, the last two are multidimensional; the symbol ~ indicates an arbitrary constraint type (as set up in the core file), aᵢ and bᵢ are a single constraint row and its right-hand side entry, Aₖ is a matrix, and bₖ a conformal subset of the right-hand side vector b.

 

The CHANCE section deals only with supplying the reliability levels α and fault tolerances ε; the constraints themselves are defined in the core file, along with their deterministic components; the distributions of the stochastic elements are described in later sections of the stoch file.

 

Data records for individual chance constraints list the name of the row in the second name field and the reliability level in the first numeric field. The code field can be used to indicate whether we are dealing with a reliability level (Pr{…} ≥ αₖ, using G as the code value) or a fault tolerance (Pr{…} ≤ εₖ, using code value L). Other valid codes are N and E. The first of these can be used to set up a probabilistic objective; the latter is here mostly for completeness. For N-type records, the numeric fields are ignored.

 

The first name field is used as a label for this group of constraints. This permits selecting different chance-constraints or perhaps different probability levels, similar to the device of specifying multiple objective rows and/or right-hand side sections in an MPS file.

 

 

Example:

 

*23456789 123456789 123456789 123456789

CHANCE

  G CC1       ROW1      0.95

  L CC1       ROW2      0.10

 

This sets up two probabilistic constraints that should be interpreted as

Pr{ a₁x ≥ b₁ } ≥ 0.95   and   Pr{ a₂x ≥ b₂ } ≤ 0.10,

where a₁x ≥ b₁ and a₂x ≥ b₂ denote the constraints in ROW1 and ROW2, assuming that ROW1 and ROW2 were both defined as G-type rows in the core file.

 

 

For joint chance constraints we use two types of data records. The first, identified by a code in the code field, marks the start of the next block; it gives the direction of the probabilistic inequality (JG for reliability levels, JL for fault tolerances), the label for this group of chance constraints in name field 1, a distinguishing name in name field 2 (for the somewhat esoteric situation of a multidimensional chance constraint with a probability level that itself depends on a particular realization of a random variable in the past), and the value of αₖ in the first numeric field. The following data records list the names of the rows participating in this constraint, as in the following example:

 

*23456789 123456789 123456789 123456789

CHANCE

 JG CC1      CON1               0.98

    ROW3

    ROW4

 JL CC1      CON2               0.001

    ROW5

    ROW6

    ROW7

 

This example shows a mixed chance-constrained problem with two multidimensional probabilistic constraints. The first one requires a reliability level of at least 0.98 on ROW3 and ROW4, i.e.,

Pr{ a₃x ≥ b₃ and a₄x ≥ b₄ } ≥ 0.98;

the second prescribes a fault tolerance of no more than 0.001 on ROW5, ROW6 and ROW7, i.e.,

Pr{ a₅x ≥ b₅, a₆x ≥ b₆ and a₇x ≥ b₇ } ≤ 0.001,

assuming that all rows were originally defined as G-type rows in the core file.

 


ICC section for integrated chance constraints

 

This type of constraint was introduced by Klein Haneveld in 1986. Assume that a deterministic version of the problem features the constraint

aᵢ(ω)x ~ bᵢ(ω),

where ~ is an arbitrary relation (≥, ≤ or =). Then an integrated chance constraint is of the form

E[ (bᵢ(ω) - aᵢ(ω)x)⁺ ] ≤ d   or   E[ (aᵢ(ω)x - bᵢ(ω))⁺ ] ≤ d,

that is, the expected violation of the constraint (its positive part, denoted (·)⁺) is bounded by the value d; the direction of the violation matches the direction of the original constraint.
 

The SMPS format defines an ICC section to set up such integrated chance constraints.

 

The header record contains the value ICC in the first name field. Each data record contains a code of L or G, an identifying name in the first name field, the name of a row in the second name field, and a numeric value for d in the first numeric field.

 

 

Example:

 

*23456789 123456789 123456789 123456789

ICC

 L  ICC1      ROW8        10.

 

Assuming that ROW8 was set up as an L-type row in the core file with right-hand side 100, this stoch file snippet specifies the integrated chance constraint E[ (a₈(ω)x - 100)⁺ ] ≤ 10, where a₈x denotes the left-hand side of ROW8.

 


SCENARIOS section

 

This section describes an explicit event tree scenario by scenario. There is a unique node in the event tree that belongs to the first period. This node is sometimes called the root node. Scenarios are identified with the leaf nodes of the tree. One could think of the scenarios as paths all the way from the root node to the leaves, but it is easier to deal with them in this simplified manner: One scenario (the root scenario) represents a path from the root node to one of the leaves. All other scenarios branch from a parent scenario at some time prior to the final period. Information up to the branch period is shared between a scenario and its parent scenario; only the information after the branch needs to be given separately. This point of view is advantageous because it allows for a reduction of redundancy in the tree, which compresses the size of the stoch file.

 

To code the event tree information into the SCENARIOS section, we use two types of data records. The first, denoted by SC in the code field, signals the start of a new scenario. It gives the name of the scenario in the first name field and its path probability in the first numeric field. The name of the scenario from which it branches is written in the second name field, and the name of the period in which the branch occurred—i.e., the first period in which the two scenarios differ—in the third name field. The root scenario originates in stage one and has no parent scenario. This should be indicated by using the keyword 'ROOT' (including the quotes) in the second name field.

 

The next data records give the column/row values assumed by the scenario in the same format used in the corresponding section of the core file.

 

 

Example

 

*23456789 123456789 123456789 123456789 123456789

SCENARIOS

 SC SCEN_A    'ROOT'    0.3            PERIOD1

    COL2      ROW3      1.0

    COL3      ROW4      1.0

 UP BOUND1    COL5      1.0

 SC SCEN_B    SCEN_A    0.2            PERIOD3

    COL3      ROW4      2.0

 UP BOUND1    COL5      2.0

 SC SCEN_C    SCEN_A    0.1            PERIOD2

    COL2      ROW3      2.0

    COL3      ROW4      2.0

 UP BOUND1    COL5      2.0

 SC SCEN_D    SCEN_C    0.1            PERIOD3

    COL3      ROW4      1.0

 UP BOUND1    COL5      1.0

 SC SCEN_E    SCEN_D    0.1            PERIOD4

 UP BOUND1    COL5      2.0

 SC SCEN_F    SCEN_A    0.1            PERIOD2

    COL2      ROW3      3.0

    COL3      ROW4      3.0

 UP BOUND1    COL5      3.0

 SC SCEN_G    SCEN_F    0.1            PERIOD4

    COL4      ROW5      2.0

 

This is a description of the distribution of three entries: matrix entries in positions COL2/ROW3 and COL3/ROW4, and the upper bound on variable COL5, which we assume to occur in stages 2, 3 and 4, respectively. (The event tree is the tree of figure 1.)

Figure 1. An event tree with seven scenarios

 

A minor modification of the preceding format is needed for network parameters. Since arcs are defined by pairs of nodes, to say that the cost of the arc from node NODE1 to node NODE2 is random requires three parameters. The same goes for bounds and multipliers. In addition to the two name fields, we use the code field as in the following example. (The data record format used in the stoch file for the networks case is the same as that for the LP case.)

 

*23456789 123456789 123456789 123456789 123456789

 SC SCEN_1    'ROOT'    0.3            PERIOD_1

 C1 NODE1     NODE3     6.0

 U2 NODE6     NODE8     7.0

 SU NODE6               6.0

 

 

We use C1 for the cost of the first arc between the two named nodes and use higher numbers, such as C2, for the second parallel arc, etc. (The second character in the code field corresponds to the marker placed in the ARCS section in the core file.) Similarly, M = multiplier, U = upper bound, L = lower bound, X = upper and lower bound simultaneously.

 

Random supply and demand are accommodated by use of SU and DE, respectively, in the code fields as illustrated. The code SI can be used to provide values for a stochastic coefficient in a side constraint.

 

Any data items not specifically mentioned in the stoch file are inherited by each scenario from its parent scenario, which in turn may inherit them from its parent scenario, and so on. The root scenario inherits its data from the core file. Two things about this scheme of inheritance are noteworthy. First, it is essential that all problems at a stage have the same problem dimensions and nonzero patterns. Second, the values entered for a particular scenario replace any values inherited from the ancestors.

 

To see why additive or multiplicative perturbations are impractical, consider scenario E in the event tree of figure 1. Scenario E inherits its values from scenario D, which inherits its third-stage data from scenario C, which in turn inherits them from scenario A, and possibly from the core file. There could therefore be as many as five data items involved before the value of a particular coefficient in scenario E is settled, and the danger of forgetting or misspecifying an addition or multiplication is considerable.
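
For illustration only, a Python sketch of the replacement-based inheritance: the value of a coefficient under a given scenario is found by walking up the chain of parent scenarios until some scenario (or, finally, the core file) mentions the location. The data structures used here are hypothetical, not part of the format.

def scenario_value(location, scenario, changes, parent, core_values):
    """Resolve one coefficient value under a scenario.

    changes[s] maps locations, e.g. (column, row) pairs, to the values stated
    for scenario s in the stoch file; parent[s] names the parent scenario
    (None for the root scenario); core_values holds the core file data.
    """
    s = scenario
    while s is not None:
        if location in changes[s]:
            return changes[s][location]
        s = parent[s]
    return core_values[location]

# Scenario E of figure 1 resolves values through D, C and A before
# falling back to the core file.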

 


NODES section

 

This section describes the construction of an event tree one node at a time, which is very convenient when the depth of the tree is not uniform (containing so-called trap states) or when the number of rows or columns associated with a particular period depends on the history of the data process. It is possible to treat such stochastic problem dimensions in a scenario-wise description by introducing a number of empty rows and columns, but these phantom elements (phantom rows, phantom columns, and phantom nodes) introduce irrelevant overhead into the problem formulation. The NODES section provides an alternate way to describe the event tree, without the use of phantom elements.

 

There are two types of data records. The start of each node is marked by a special code in the code field (see below). The first name field on this record supplies a name to the node and the second name field gives the name of the parent node (in the previous period). The first numeric field contains the conditional probability of reaching this node, given the parent node. The remaining data records for each node follow the core file layout.

 

The code field in the node identifier indicates how the information for the node is to be obtained and influences the appearance of the remaining rows. There are two possibilities: one could copy the information from one node to another, allowing for modification of data, although the nonzero structure must remain the same. This is signaled by the value CP in the code field. The alternative is to make up the information on the spot by reading an MPS file. The code MK is used for this purpose.

 

Under normal circumstances, the MK option is easier to handle, since it makes no assumptions about variable names of different nodes in the same time stage. Individual MPS files can thus be prepared for each node and strung together into one large file fairly easily. When the information is to be created on the spot, we insert an ordinary MPS file after the node identifier record. (We substitute an ENDNODE record to mark the end of the data for this particular node, and the NAME section is optional.) The user should be careful to note that the objective must have the same name in every node. Quadratic objectives are consistent with the MK option; however, the QSECTION must be incorporated into the stoch file.

 

On the other hand, if there are two or more nodes that do share problem dimensions and variable names, then it is possible to decrease the redundancy and shorten the file length by using the CP option. If information is to be copied, then the third name field on the node identifier record gives the reference node where the information is to be found. The third name field may contain the keyword 'CORFIL'. In this case the information is retrieved from the core file.

 

Following the node identifier record one lists the modifications to the constraint matrix, right-hand side, bounds, etc., in standard MPS format. There should not be any header records in this section. Any data element not explicitly mentioned is assumed to be identical to the corresponding data element in the reference node.

 

In this node-by-node description it is much harder to convey the dynamic structure of the problem. The user must therefore exercise discipline when placing the off-diagonal matrix coefficients. Such entries must appear in the block that defines the row, so that the columns will belong to nodes that have been processed before and can therefore be identified. This forces two further restrictions: Special ordered sets of Type 3 cannot have subdiagonal entries (unless they are specified explicitly as binary variables with explicit matrix coefficients), and network records are not allowed. This is because the originating and terminal nodes may belong to different time periods, creating ambiguities as to where to place the data record. (Of course the user is free to supply the network information on MPS style records.)

 

Every node in the event tree, even the deterministic ones, must be represented by at least the NODE header. The node corresponding to the first period has no predecessor; in this case the second name field should contain the keyword 'ROOT'.

 

Since the length of a scenario path and the problem dimensions are variable, the contents of the time and core files have to be adjusted as well.

 

The time file must be present, but it does not necessarily describe the full temporal structure. It must contain enough information to separate those nodes of the core file that are referenced explicitly. The core file must also be present. It lists the deterministic parts of the problem, which would normally include the first stage.

 

Here is an example to illustrate the above rules.

 

 

Time file:

 

*23456789 123456789 123456789 123456789 123456789

TIME          ProdDemo

PERIODS

    Prod0     Cost                     Period_0

    Prod1     Demd1                    Period_1

    Prod2     Demd2                    Period_2

ENDATA

 

 

Core file:

 

*23456789 123456789 123456789 123456789 123456789

NAME          ProdDemo

ROWS

 N  Cost

 E  Demd0

 E  Demd1

 E  Demd2

 L  Stor0

 L  Stor1

 L  Stor2

COLUMNS

    Prod0     Demd0        1.

    Hold0     Demd0       -1.          Demd1            1.

    Hold0     Cost         1.          Stor0            1.

    Prod1     Demd1        1.

    Hold1     Demd1       -1.          Demd2            1.

    Hold1     Cost         1.          Stor1            1.

    Prod2     Demd2        1.

    Hold2     Demd2       -1.

    Hold2     Cost         1.          Stor2            1.

RHS

    RHS       Demd0        1.

    RHS       Demd1        4.

    RHS       Demd2        4.

    RHS       Stor0        4.

    RHS       Stor1        4.

    RHS       Stor2        4.

BOUNDS

 UP BOUND     Prod0        4.

 UP BOUND     Prod1        4.

 UP BOUND     Prod2        4.

ENDATA

 

 

Stoch file:

 

*23456789 123456789 123456789 123456789 123456789

STOCH         ProdDemo

NODES

 CP NODE0     'ROOT'       1.0         'CORFIL'

 CP NODE1     NODE0        0.3         'CORFIL'

    RHS       Demd1        2.0

 CP NODE2     NODE0        0.5         'CORFIL'

 CP NODE3     NODE0        0.2         'CORFIL'

    RHS       Demd1        6.0

 CP NODE4     NODE2        0.3         'CORFIL'

    RHS       Demd2        4.0

 CP NODE5     NODE2        0.4         'CORFIL'

    RHS       Demd2        5.0

 CP NODE6     NODE2        0.3         'CORFIL'

    RHS       Demd2        6.0

*-----------------------

 MK NODE7     NODE3        0.25

ROWS

 E  Demd2W

 E  Demd2S

 L  Stor2

COLUMNS

    Hold1     Demd2W       1.0

    Prod2W    Demd2W       1.0

    Prod2S    Demd2S       1.0

    Hold2W    Demd2W      -1.0

    Hold2W    Cost         1.0         Stor2           1.0

    Hold2S    Demd2S      -1.0

    Hold2S    Cost         1.0         Stor2           1.0

RHS

    RHS       Demd2W       6.0

    RHS       Demd2S       4.0

    RHS       Stor2        4.0

BOUNDS

 UP BOUND     Prod2W       8.0

 UP BOUND     Prod2S       8.0

ENDNODE

*-----------------------

 CP NODE8     NODE3        0.06        NODE7

    RHS       Demd2W       6.0

    RHS       Demd2S       6.0

    ....

ENDATA

 

This example describes a three-stage production problem. At the start only a single item is produced. If the demand in period 1 is low, production ceases entirely (the company goes out of business); if the demand in period 1 is medium, production continues; if demand in period 1 is high, a second item is introduced. The event tree with trap state is given in figure 2.

Figure 2. An event tree with a trap state

 


Penalties and moment constraints in NODES format

 

Variable names do not have to be shared between different nodes at the same stage. Hence it may be necessary to define penalties (SIMPLE, ROBUST, PLINQUAD) as well as moment constraints (CHANCE and ICC) following the ROWS, COLUMNS and other sections of a node if the ‘MK’ option is used. Their order is arbitrary, and the format in each section is identical to the information given earlier in the respective sections. The ‘CP’ option provides inheritance of all data items from the reference node, including the penalty and moment sections. Any parameter in one of these constraints can be adjusted for the current node, if this should prove necessary.


INDEP section

 

This section describes the modelling of independent random variables. The SMPS format allows for three different forms of the distribution: a discrete distribution, a special (named) distribution with no more than two parameters, or a distribution generated by a user-defined subroutine. The header record distinguishes between these forms by placing appropriate keywords into the second name field (see table below). The third name field may contain the value ADD, MULTIPLY or REPLACE to indicate how the value of the random variable is to modify the information found in the core file. The default is REPLACE.

 

The data records have the following structure:

 

first name field

column name of the random element

second name field

row name of the element

first numeric field

value

second numeric field

value

third name field

name of the period in which a particular realization is observed.

 

If the period can be inferred from the row/column name of the random element, the third name field may be left empty.

 

The interpretation of the values in the two numeric fields depends on the type of the distribution, as follows:

 

Keyword in second name field of the header record

Distribution

First numeric field

Second numeric field

DISCRETE

Discrete distribution

Value

Probability*

UNIFORM

Continuous uniform on [a,b]

a

b

NORMAL

Normal distribution N(μ, σ²)

mean μ

variance σ²

GAMMA

Gamma distribution (on [0, ∞))

scale b

shape c

BETA

Beta distribution (on [0,1])

n

m

LOGNORM

Lognormal distribution

μ = E(ln L)

σ² = Var(ln L)

(any other)

User-defined subroutine

(not used)

(not used)

 

*Possible values for each random element must be listed in sequence, and the probabilities must sum to 1.

 

The SMPS format provides for a mechanism by which the user can generate the realizations by accessing a user-defined subroutine. Any modifier on the header record different from the keywords for univariate or multivariate distributions, that is, other than DISCRETE, UNIFORM, NORMAL, GAMMA, BETA, LOGNORM, MVNORMAL, LINTRAN, ARMA, is taken to be the name of a subroutine, which must be provided by the user. The subroutine can be given calling parameters, and it must return one realization of the random variable it represents. Each parameter value is listed on a separate record following an indicator record containing only the code PR. Two types of values can be passed in this way, integers and reals, assumed to take up, respectively, four and eight bytes. The parameter type is inferred from the appearance of the number, but the user can override this convention by placing one of the two tags 'INTEGR' and 'DOUBLE' into the third name field; their meaning should be self-evident.

 

Parameters can also be passed by location, making it possible to access any problem coefficient or even current values of the decision variables in order to model any dependence structure required. Locations are identified by placing the column and row names into the first two name fields. Current decision variables are identified by placing the tag 'PRIMAL' into the first name field, current dual variables use the tag 'DUAL'.

 

 

Example 1

 

*23456789 123456789 123456789 123456789 123456789 123456789

INDEP         DISCRETE

    COL1      ROW8      6.0            PERIOD2     0.5

    COL1      ROW8      8.0            PERIOD2     0.5

    RHS       ROW8      1.0            PERIOD2     0.1

    RHS       ROW8      2.0            PERIOD2     0.5

    RHS       ROW8      3.0            PERIOD2     0.4

 

 

In this example the entry COL1/ROW8 takes value 6.0 with probability 0.5 and 8.0 with probability 0.5, and the right-hand side of ROW8 takes values {1.0, 2.0, 3.0} with probabilities {0.1, 0.5, 0.4}, respectively. The example shows random matrix and right-hand side coefficients only; other data items (including network data) can be rendered in the manner described in the SCENARIOS section.

 

 

Example 2

 

*23456789 123456789 123456789 123456789 123456789 123456789

INDEP         DISCRETE                 ADD

    COL1      ROW8      -1.0           PERIOD2     0.5

    COL1      ROW8       1.0           PERIOD2     0.5

 

Assuming that the core file contains a value of 7.0 for the coefficient  COL1/ROW8, the entry COL1/ROW8 will again take value 6.0 with probability 0.5 and 8.0 with probability 0.5.
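
A Python sketch (illustration only) of how the ADD, MULTIPLY and REPLACE modifiers combine a realization with the core file value:

def modified_value(core_value, realization, mode="REPLACE"):
    """Combine a realization with the core file value according to the
    modifier on the INDEP/BLOCKS header record (default REPLACE)."""
    if mode == "REPLACE":
        return realization
    if mode == "ADD":
        return core_value + realization
    if mode == "MULTIPLY":
        return core_value * realization
    raise ValueError(f"unknown modifier {mode}")

# Example 2: core value 7.0, realizations -1.0 and +1.0 under ADD
print(modified_value(7.0, -1.0, "ADD"), modified_value(7.0, 1.0, "ADD"))  # 6.0 8.0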

 

 

Example 3

 

*23456789 123456789 123456789 123456789 123456789 123456789

INDEP         UNIFORM

    COL1      ROW8       8.0           PERIOD2     9.0

INDEP         NORMAL

    RHS       ROW9      10.0           PERIOD2     4.0

 

In this example the random entry COL1/ROW8 is uniformly distributed over the interval [8.0,9.0], and the right-hand side in ROW9 is normally distributed with mean 10.0 and variance 4.0.

 

 

Example 4

 

*23456789 123456789 123456789 123456789 123456789 123456789

INDEP         TRIANG 

    COL5      ROW8                     PERIOD2

 PR

                       10.0

                       20.0

    COL1      ROW1

 

This example shows how to set up a non-symmetric triangular distribution over an interval, which requires the mode as well as the two endpoints of the interval. The mode is the third parameter and is taken to equal the (possibly random) realization of the coefficient in location COL1/ROW1. (It is left to the user to check that this value does indeed fall between the two endpoints 10 and 20.) Anyone distributing this example must also supply the subroutine TRIANG(min, max, mode).
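
The standard does not prescribe what TRIANG must look like internally; the following Python fragment is one possible rendering of such a user routine, using the triangular sampler from the standard library.

import random

def TRIANG(min_val, max_val, mode):
    """Illustrative user subroutine: return one realization of a
    triangular distribution on [min_val, max_val] with the given mode."""
    if not (min_val <= mode <= max_val):
        raise ValueError("mode must lie between the two endpoints")
    return random.triangular(min_val, max_val, mode)

# One draw for Example 4, assuming COL1/ROW1 realized as 12.0
print(TRIANG(10.0, 20.0, 12.0))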

 


BLOCKS section

 

This section shows how to set up multidimensional random vectors that may have correlated components but are independent of other random elements in the problem.

 

The header record is similar to the INDEP header: the second name field specifies the type of the block (DISCRETE, MVNORMAL, LINTRAN, ARMA or a user-defined subroutine); the third name field describes the way the random information interacts with the core file data (ADD, MULTIPLY or REPLACE; the default is REPLACE).

 

DISCRETE blocks use two types of data records. The first record of each realization is distinguished by the code BL; the first name field gives a name to the block, and the second name field pinpoints the period in which the realizations become known; finally, the first numeric field contains the probability with which this particular set of values is attained. Following this marker, the values are listed on separate data records. The first name field gives the column name, the second name field gives the row name, and the first numeric field gives the value itself. The other fields are not used.

 

 

Example

 

*23456789 123456789 123456789 123456789

BLOCKS        DISCRETE

 BL BLOCK1    PERIOD2    0.5

    COL1      ROW6      83.0

    COL2      ROW8       1.2

 BL BLOCK1    PERIOD2    0.2

    COL2      ROW8       1.3

 BL BLOCK1    PERIOD2    0.2

    COL1      ROW6      84.0

 BL BLOCK1    PERIOD2    0.1

    COL1      ROW6      84.0

    COL2      ROW8       0.0

 

 

To save space, only values that change need be recorded. The SMPS format adopts the convention that the first realization listed for the block is the base from which all changes are computed. (This implies that zero values in subsequent realizations must be stated explicitly, as shown in the last data record of the example above.)

 

In this example the block, called BLOCK1, is the 2-vector made up of the entries COL1/ROW6 and COL2/ROW8. It takes values (83.0,1.2) with probability 0.5, (83.0,1.3) with probability 0.2, (84.0,1.2) with  probability 0.2, and (84.0,0.0) with probability 0.1.
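
A minimal Python sketch of this carry-forward convention (the data layout is illustrative only): the first realization acts as the base, and every later realization overrides only the locations it lists.

# Realizations of BLOCK1 exactly as recorded in the stoch file:
# (probability, {location: value}) with only the changed locations listed.
recorded = [
    (0.5, {("COL1", "ROW6"): 83.0, ("COL2", "ROW8"): 1.2}),
    (0.2, {("COL2", "ROW8"): 1.3}),
    (0.2, {("COL1", "ROW6"): 84.0}),
    (0.1, {("COL1", "ROW6"): 84.0, ("COL2", "ROW8"): 0.0}),
]

base = recorded[0][1]
expanded = [(p, {**base, **changes}) for p, changes in recorded]
# expanded -> (83.0, 1.2) w.p. 0.5, (83.0, 1.3) w.p. 0.2,
#             (84.0, 1.2) w.p. 0.2, (84.0, 0.0) w.p. 0.1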

 

It is possible to model additive or multiplicative perturbations, but the user must be careful since each block modifies the core file values and at the same time replaces values in the first occurrence of the block. If the core file records values of 83.0 and 1.2 for the locations COL1/ROW6 and COL2/ROW8, respectively, then the following specification is equivalent to the previous example:

 

*23456789 123456789 123456789 123456789 123456789

BLOCKS        DISCRETE                  ADD

 BL BLOCK1    PERIOD2    0.5

    COL1      ROW6       0.0

    COL2      ROW8       0.0

 BL BLOCK1    PERIOD2    0.2

    COL2      ROW8       0.1

 BL BLOCK1    PERIOD2    0.2

    COL1      ROW6       1.0

 BL BLOCK1    PERIOD2    0.1

    COL1      ROW6       1.0

    COL2      ROW8      -1.2

 


Multivariate normal distributions can be set up using the MVNORMAL keyword on the header record. The first data record contains a BL in the code field and has the same appearance as the first data record for DISCRETE blocks. Following this is a list of locations that make up this block. Each location is identified by two names in the first two name fields, which leaves two numeric fields and a third name field free. We place the mean value of each marginal distribution in the first numeric field and the variance in the second. An alias is placed in the third name field so that future occurrences of this random variable can be uniquely deduced from a single name field.

 

It remains to specify the correlations between pairs of random variables. This is effected in a separate subsection whose start is marked with the code CR in the code field. On each data record, the first two name fields list the aliases of two locations and the first numeric field contains the correlation between the two random variables.

 

Alternatively the user may want to specify the covariances between the components of the random vector. This is easily set up using the code CV.

 

 

Example:

 

BLOCKS        MVNORMAL

 BL BLOCK1    PERIOD2

    COL1      ROW6       5.0           ALIAS1      1.0

    COL2      ROW8       8.0           ALIAS2      2.0

    COL3      ROW9      11.0           ALIAS3      3.0

 CR

    ALIAS1    ALIAS2    0.6

    ALIAS1    ALIAS3    0.6

    ALIAS2    ALIAS3    0.6

 

This example sets up a trivariate normal distribution of the three locations COL1/ROW6, COL2/ROW8, COL3/ROW9, with mean vector (5,8,11), and variances 1, 2, and 3, respectively. The three components are equicorrelated with correlation coefficient 0.6.
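
The following numpy sketch (illustrative only) shows how the CR records of this example translate into a covariance matrix, cov(i,j) = r(i,j) * sqrt(var(i) * var(j)), and how one realization of the block could be drawn:

import numpy as np

means = np.array([5.0, 8.0, 11.0])        # first numeric field of each location
variances = np.array([1.0, 2.0, 3.0])     # second numeric field of each location
corr = np.array([[1.0, 0.6, 0.6],
                 [0.6, 1.0, 0.6],
                 [0.6, 0.6, 1.0]])         # CR records (equicorrelated at 0.6)

# Covariance from correlations and variances
stdev = np.sqrt(variances)
cov = corr * np.outer(stdev, stdev)

rng = np.random.default_rng()
realization = rng.multivariate_normal(means, cov)  # one draw for (COL1/ROW6, COL2/ROW8, COL3/ROW9)
print(realization)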

 


Random vectors can also be generated by a user-defined subroutine. As in the INDEP section, this is indicated by a modifier that is not one of the reserved keywords, that is, not equal to DISCRETE, UNIFORM, NORMAL, GAMMA, BETA, LOGNORM, MVNORMAL, LINTRAN or ARMA.

 

The BL record once again marks the beginning of the block, giving a name and fixing the stage for the block. On subsequent data records the locations making up the random vector are listed in the first two name fields. Parameters that must be passed to the subroutine can be specified following a record containing the code PR, with all the conventions and possibilities discussed in the INDEP section.

 

 

Example

 

BLOCKS        USER1

 BL BLOCK1    PERIOD2

    COL1      ROW6

    COL2      ROW8

    COL3      ROW9

 PR

                            12.0

                             3.0

    RHS       ROW1

    'PRIMAL'  COL5

 

 

The user-defined subroutine USER1 returns values for the locations COL1/ROW6, COL2/ROW8, COL3/ROW9. This routine takes four parameters, whose values are given on the records following the marker PR. The third parameter is the value of the right-hand side of ROW1, the fourth parameter is the current value of the decision variable COL5.
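
The calling convention for user routines is left to the implementation. The following Python fragment is purely hypothetical (only the name USER1 comes from the example) and merely illustrates the information flow: two literal parameters, one coefficient passed by location and one current primal value go in, and one value per listed location comes out.

import random

def USER1(param1, param2, rhs_row1, primal_col5):
    """Hypothetical user routine returning one realization for each
    location of the block (COL1/ROW6, COL2/ROW8, COL3/ROW9)."""
    scale = param1 + param2 * primal_col5      # illustrative use of the parameters
    return {
        ("COL1", "ROW6"): random.gauss(rhs_row1, scale),
        ("COL2", "ROW8"): random.gauss(0.0, scale),
        ("COL3", "ROW9"): random.uniform(0.0, scale),
    }

# e.g. USER1(12.0, 3.0, rhs_row1=4.0, primal_col5=0.5)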

 


Sometimes the user may want to specify stochastic coefficients determined by underlying random phenomena (factors) of lower dimension. A simple form of this relationship is an affine transformation, which can be written mathematically as v = H u + c, where H and c are deterministic and u is a random vector. Even though this device may blur the boundary between modelling and recording of an instance in ways not intended by the original MPS format, it is nonetheless useful because in many cases the dimension of the vector u is much smaller than that of the vector v. Certain approximation algorithms may be able to take advantage of working in the original space, so that there is considerable merit in associating the original random factors u with the problem instance.

 

The use of this format is signalled by the keyword LINTR on the BLOCKS header record. Once more the start of each block is marked with a record containing the code BL in the code field, followed by a name for the block in the first name field. The second name field identifies the period in which the information becomes known. Following this header record, the locations of the elements of the random vector v are identified on separate data records using the first two name fields; the additive constant (component of the vector c) can be entered in the first numeric field.

 

Then the coefficients of the transformation matrix H are given column by column. Each column is tied to one component of the vector u, and there are two mechanisms for supplying its distribution. Any component of u that is independent of the other components can be specified directly on a record with the code RV. The identifier of the distribution (DISCRETE, UNIFORM, etc.) is placed in the second name field, and the distribution parameters are placed in the two numeric fields; the first name field is not used. Alternatively, the name of a random variable defined in a preceding DISTRIB section may be placed in the first name field.

 

Which of the two mechanisms is used depends on the appearance of the RV data record. If the second name field is empty, the random variable named in the first name field must have been defined in a preceding DISTRIB section; if the second name field is not empty, it must name a valid two-parameter distribution family, which is then taken as the distribution of the current random variable. In this case the first name field is ignored (but must be present for correct parsing of free-format input).

 

This generic reference to a distribution is normally sufficient, since appropriate copies can be made of the random variable whenever it is referenced. However, if a random variable is lagged (e.g., to be used in subsequent ARMA sections), a specific name (that is, a unique identifier) may be given in the third name field; correct parsing of free-format input then requires the placeholder 'LAGGED' in the second name field.

 

We will illustrate these rules with the following example:

 

*23456789 123456789 123456789 123456789 123456789 123456789

DISTRIB       MVNORMAL

 BL BL1

    U1                    0.0                      1.0

    U2                    0.0                      1.0

 CV

    U1        U2          0.5

BLOCKS        LINTR

 BL BLOCK1    PERIOD1

    COL1      ROW1        5.0

    COL2      ROW2       10.0

    COL3      ROW3        0.0

    COL4      ROW4        0.0

*23412345678  12345678  123456789012   12345678  123456789012

 RV U1        'LAGGED'                 U11

    COL1      ROW1       -1.0

    COL2      ROW2        2.0

    COL3      ROW3        1.0

 RV U2        'LAGGED'                 U21

    COL1      ROW1        1.0

    COL2      ROW2        2.0

    COL3      ROW3        3.0

    COL4      ROW4        4.0

 RV U3        NORMAL     10.0          U31         3.0

    COL1      ROW1        1.0

 

This example defines a joint probability distribution for the four random elements COL1/ROW1, COL2/ROW2, COL3/ROW3, COL4/ROW4 making up the random vector v, which depends on a three-dimensional random vector u as follows: The first two components of u are defined in a DISTRIB section as bivariate normal with mean (0,0)T and covariance matrix

      [ 1.0  0.5 ]
      [ 0.5  1.0 ].

The third component is defined directly in the block itself and is normal with mean 10 and variance 3. It is independent of the other components, so that u has a trivariate normal distribution with mean (0,0,10)T and covariance matrix

           [ 1.0  0.5  0.0 ]
      Su = [ 0.5  1.0  0.0 ]
           [ 0.0  0.0  3.0 ].

The affine transformation is given by

          [ -1.0  1.0  1.0 ]          [  5.0 ]
          [  2.0  2.0  0.0 ]          [ 10.0 ]
      H = [  1.0  3.0  0.0 ],     c = [  0.0 ]
          [  0.0  4.0  0.0 ]          [  0.0 ].

By the rules governing linear transformations of normal random variables, v = Hu + c is therefore also normally distributed, with mean vector mv = H mu + c = (15, 10, 0, 0)T and covariance matrix

                 [ 4.0   0.0   1.0   2.0 ]
      H Su HT =  [ 0.0  12.0  12.0  12.0 ]
                 [ 1.0  12.0  13.0  14.0 ]
                 [ 2.0  12.0  14.0  16.0 ].
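
The following numpy fragment (illustrative only; the matrix H and vector c are simply read off the data records above) reproduces these figures:

import numpy as np

# Rows of H correspond to COL1/ROW1 ... COL4/ROW4, columns to U1, U2, U3.
H = np.array([[-1.0, 1.0, 1.0],
              [ 2.0, 2.0, 0.0],
              [ 1.0, 3.0, 0.0],
              [ 0.0, 4.0, 0.0]])
c = np.array([5.0, 10.0, 0.0, 0.0])

mu_u = np.array([0.0, 0.0, 10.0])
cov_u = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 3.0]])

mu_v = H @ mu_u + c          # -> [15. 10.  0.  0.]
cov_v = H @ cov_u @ H.T      # covariance matrix of v
print(mu_v)
print(cov_v)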

 


More general linear dependencies can be established as well. Processes of the form

      yt = M1 yt-1 + ... + Mm yt-m + N0 ut + N1 ut-1 + ... + Nn ut-n,

where the random variables uj are independent from stage to stage and Mi and Nj are fixed matrices of appropriate dimensions, are called ARMA(m,n) processes and can be defined using the following conventions:

 

  • Columns of the matrix N0 are given exactly as in the LINTRAN section, that is, they are introduced by a record with the code ‘RV’, with nonzero entries in this column listed on subsequent records.

 

  • Columns of the matrices Ni, i > 0, are introduced by a record with the code ‘LV’. To make the association, this record carries in its first name field the specific name that was assigned to the random variable when it was first used.

 

  • Columns of the matrices Mi are introduced by a record with the code ‘HV’. Since this references stochastic elements, two name fields are needed to make the association.

 

The following example may serve to illustrate these rules. It is a continuation of the previous example.

 

BLOCKS

 BL BLOCK2    STAGE2

    COL5      ROW5           0.0

    COL6      ROW6           0.0

 RV U1        'LAGGED'                 U12

    COL5      ROW5           5.0

 RV U2        'LAGGED'                 U22

    COL6      ROW6           3.0

 RV U3        'LAGGED'                 U32

    COL5      ROW5           1.0

    COL6      ROW6           2.0

 HV COL3      ROW3

    COL5      ROW5           1.0

 HV COL4      ROW4

    COL6      ROW6           1.0

 LV U31

    COL5      ROW5           1.0

 

This example sets up the ARMA(1,1) process yt = M1 yt-1 + N0 ut + N1 ut-1 with

           [ 0.0  0.0  1.0  0.0 ]          [ 5.0  0.0  1.0 ]          [ 0.0  0.0  1.0 ]
      M1 = [ 0.0  0.0  0.0  1.0 ],    N0 = [ 0.0  3.0  2.0 ],    N1 = [ 0.0  0.0  0.0 ],

where yt = (COL5/ROW5, COL6/ROW6)T, yt-1 = (COL1/ROW1, COL2/ROW2, COL3/ROW3, COL4/ROW4)T and ut = (U1, U2, U3)T.
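
The following numpy fragment (illustrative; the matrices are those read off the example, with the component ordering stated above) carries out one step of the recursion:

import numpy as np

M1 = np.array([[0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])      # columns: COL1/ROW1 ... COL4/ROW4
N0 = np.array([[5.0, 0.0, 1.0],
               [0.0, 3.0, 2.0]])           # columns: U1, U2, U3 (current period)
N1 = np.array([[0.0, 0.0, 1.0],
               [0.0, 0.0, 0.0]])           # columns: U1, U2, U3 (lagged one period)

y_prev = np.array([15.0, 10.0, 0.0, 0.0])  # e.g. the mean of v from the LINTR block
u_prev = np.array([0.0, 0.0, 10.0])        # e.g. the mean of u in period 1
u_curr = np.array([0.0, 0.0, 10.0])        # one (here: mean) realization for period 2

y_curr = M1 @ y_prev + N0 @ u_curr + N1 @ u_prev   # values for COL5/ROW5 and COL6/ROW6
print(y_curr)   # -> [20. 20.]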

 


DISTRIB section

 

This section describes the use of random variables (univariate and multivariate) that do not correspond directly to stochastic elements within the current problem. This device is needed to adequately model certain multivariate distributions, since the MPS data record does not have enough fields to specify correlations between random elements. The main purpose of a DISTRIB section is to provide an alias (a single name) by which the random variable can be referenced in a later BLOCKS section.

 

There may be many DISTRIB sections; it is assumed that each of them defines random variables independently of the others. Conversely, each new group of random variables must be introduced by a new DISTRIB header.

 

The rules for the DISTRIB sections are similar to those for other distributions (INDEP, BLOCKS) except that only one name field (the first) is used to provide the name for the random variable. Discrete random variables can be defined, as can two-parameter families and the multivariate normal distribution. In the latter case, distinct names must be used for the distribution and for each of its components. The user can even specify univariate or multivariate random variables by accessing a custom-made subroutine.

 

Each entry thus defined establishes a generic distribution which can be referenced more than once in subsequent BLOCKS.

 

Example

 

*23456789 123456789 123456789 123456789 123456789 123456789

DISTRIB       UNIFORM

    U1                  0.0                      1.0

DISTRIB       MVNORMAL

 BL BL1

    U2                  0.0                      1.0

    U3                  0.0                      1.0

 CV

    U2        U3        0.5

 

This example defines the auxiliary random variable U1, which is uniformly distributed on [0,1], and a bivariate normal random vector whose components have mean 0, variance 1 and covariance 0.5. The vector can be referred to as BL1, its first component as U2 and its second component as U3.

 


Linking constraints

 

Most constraints in stochastic programming are of the “replicating” kind. This means that each constraint contains variables from at most one node in each period. The constraint is thought to belong to the latest period in which it has variables and is replicated for each realization of the random elements.

 

In some special instances (for example, to model risk constraints in financial applications) it may be useful to link together variables belonging to different nodes in the same period. The easiest way to include such linking or global constraints in a stochastic program is to assign the constraint to the first period. (This is best done using the explicit form of the time file.) While the more common decomposition solvers will not be able to deal with linking constraints, specially developed software (see e.g., Steinbach 2003) is available.

 

The following is an example of a linking constraint:

 

Core file entry:

 

*23456789 123456789 123456789 123456789

ROWS

 G  GLOBCON

    ...

COLS

    ...

    FWEALTH   GLOBCON  1.20

    ...

RHS

    RHS1      GLOBCON   100.

 

Time file entry:

 

*23456789 123456789 123456789 123456789 123456789

PERIODS       EXPLICIT

                                       Period_1

                                       Period_2

                                        

ROWS

    ...

    GLOBCON   Period_1

COLS

    ...

    FWEALTH   Period_2

 

 

Stoch file entry:

 

*23456789 123456789 123456789 123456789 123456789

SCENARIOS

 SC Scen_1    'ROOT'    0.4            Period_1

 SC Scen_2    Scen_1    0.6            Period_2

    FWEALTH   GLOBCON  1.30

    ...

 

 

This construction sets up the global constraint

 

1.20*FWEALTH(Scen_1) + 1.30*FWEALTH(Scen_2) ≥ 100.
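
A small Python fragment (the data layout is purely illustrative) showing how the scenario-dependent coefficients assemble into this single first-period constraint:

# Coefficient of FWEALTH in GLOBCON for each scenario: the core file value 1.20
# applies under Scen_1, and the stoch file overrides it to 1.30 under Scen_2.
globcon_coeff = {"Scen_1": 1.20, "Scen_2": 1.30}
rhs = 100.0

def globcon_satisfied(fwealth):
    """fwealth maps each scenario to the value of its copy of FWEALTH."""
    return sum(globcon_coeff[s] * fwealth[s] for s in globcon_coeff) >= rhs

print(globcon_satisfied({"Scen_1": 50.0, "Scen_2": 40.0}))  # 1.2*50 + 1.3*40 = 112 >= 100 -> True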

 


Order of processing

 

To make sure that the many variations can all be parsed correctly, it is important to read the three files in the order given below (a minimal reading outline follows the list):

 

Time file: PERIODS section only

Core file

Time file: Remaining sections (if any)

Stoch file
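
The following Python outline merely fixes this order of calls; the parser routine names are placeholders, not part of any standard interface.

# Hypothetical parser routines; real implementations would return parsed data.
def read_periods(time_path): ...
def read_core(core_path, periods): ...
def read_time_remainder(time_path, core): ...
def read_stoch(stoch_path, core, time_info): ...

def read_smps(core_path, time_path, stoch_path):
    periods = read_periods(time_path)                  # 1. time file: PERIODS section only
    core = read_core(core_path, periods)               # 2. core file
    time_info = read_time_remainder(time_path, core)   # 3. time file: remaining sections (if any)
    stoch = read_stoch(stoch_path, core, time_info)    # 4. stoch file
    return core, time_info, stoch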

 


Solver capabilities and error recovery

 

The SMPS format is extremely flexible and can define a wide variety of problems, far exceeding the capabilities of existing stochastic programming solvers. It should not be assumed, therefore, that a given problem can be solved by a particular solver, nor indeed that any solver is capable of solving it. However, any SMPS parser should recover gracefully when it encounters a section or construct that its solver is not prepared to deal with.

 


References:

 

[1]. J.R. Birge, M.A.H. Dempster, H.I. Gassmann, E.A. Gunn, A.J. King and S.W. Wallace, “A standard input format for multiperiod stochastic linear programs”, COAL Newsletter #17 (1987) pp. 1–19.

 

[2]. J.R. Birge and F. Louveaux, Introduction to Stochastic Programming, Springer Series in Operations Research, Springer Verlag, New York, 1997.

 

[3]. A. Charnes and W.W. Cooper, “Chance-constrained programming”, Management Science 5 (1959) 73–79.

 

[4]. J. Edwards, J. Birge, A. King and L. Nazareth, “A standard input format for computer codes which solve stochastic programs with recourse and a library of utilities to simplify its use”, Working Paper WP-85-03, International Institute for Applied Systems Analysis, Laxenburg, Austria, 1985.

 

[5]. H.I. Gassmann and E. Schweitzer, “A comprehensive input format for stochastic linear programs”, Annals of Operations Research 104 (2001) 89–125.

 

[6]. International Business Machines, Inc., "Passing your model to OSL using Mathematical Programming System (MPS) format", world-wide web document, http://www-306.ibm.com/software/data/bi/osl/pubs/Library/featur11.htm.

 

[7]. P. Kall and S.W. Wallace, Stochastic Programming, John Wiley and Sons, New York, 1994.

 

[8]. A.J. King, “An implementation of the Lagrangian finite-generation method”, in: Yu. Ermoliev and R.J-B Wets (eds.), Numerical Techniques for Stochastic Optimization, Springer Series in Computational Mathematics, Vol. 10, Springer Verlag, Berlin, 1988.

 

[9]. W.K. Klein Haneveld, Duality in Stochastic Linear and Dynamic Programming, Lecture Notes in Economics and Mathematical Systems, Vol. 274, Springer-Verlag, Berlin, 1986.

 

[10]. D. Klingman, A. Napier and J. Stutz, “NETGEN: A program for generating large scale capacitated assignment, transportation, and minimum cost flow network problems”, Management Science 20 (1974) 814–821.

 

[11]. J.M. Mulvey, R.J. Vanderbei and S.A. Zenios, “Robust optimization of large-scale systems”, Operations Research 43 (1995) 264–281.

 

[12]. A. Prékopa, Stochastic Programming, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1995.

 

[13]. M.C. Steinbach, "Tree-sparse convex programs", Mathematical Methods of Operations Research 56 (2003) 347–376.

 


Last modified 11 July 2006.

 

Please send comments and errata to the author.