14 Main Kinds of Optimization Modeling Methods
• Decision analysis — This community generally works with discrete actions and possibly discrete random outcomes, but often features complex utility functions and an explicit treatment of risk. Problems are relatively small.
• Stochastic search (derivative based) — This field is centered on the basic problem min_x E F(x, W), where x is continuous (scalar or vector), W is a random variable, and the expectation typically cannot be computed. However, we assume we can compute gradients ∇_x F(x, W) for a known W (a stochastic gradient sketch appears after this list).
• Ranking and selection (derivative free) — This field is also focused on min_x E F(x, W), but now we assume that x can take on one of a finite set of alternatives {x_1, x_2, ..., x_M} (a simple sampling sketch appears after this list).
• Simulation-optimization — This community evolved from within the setting of discrete event simulations, where we need to use a simulator (such as a model of a manufacturing system) to compare different designs. The field of simulation optimization started with the ranking and selection problem, but has evolved to span a wider range of problems.
• Online computation — This field describes methods where decisions simply react to information as it arrives, without considering the impact of decisions now on the future. The field was originally motivated by mobile applications where energy and computing capacity were limited.
• Optimal control — The roots of this community are in engineering, focusing on the control of aircraft, spacecraft, and robots, but the field has expanded into economics and computer science. The original problems were written in continuous time with continuous controls, but they are often formulated in discrete time (typically with discrete controls), since this is how computation is done. Problems are typically deterministic, possibly with uncertain parameters and possibly with additive noise in the transition function, but the tools of this community have been widely adopted elsewhere, especially in finance, where problems are purely stochastic.
• Robust optimization — This is an extension of stochastic search with roots in engineering, where the goal is to find the best design x (of a building, an aircraft, a car) that works under the worst instance of a random variable W (which could be wind, temperature, or crash conditions). Instead of min_x E F(x, W), robust optimization seeks to solve min_x max_w F(x, w). For example, we might want to design a wing to handle the worst possible wind stresses (a grid-search sketch appears after this list).
• Optimal stopping — This is an important problem in finance, where we have to decide when to stop and sell (or buy) an asset. It also arises in engineering when we have to decide when to stop and repair a piece of machinery. The problem is to find a time τ to sell or repair, where τ can depend only on the information available at that time (a threshold-rule sketch appears after this list). The problem is popular within the applied probability community.
• Markov decision processes — This community evolved primarily within applied probability, and describes a system that occupies a discrete state s and transitions to s′ when we take (discrete) action a with (known) probability p(s′|s, a). At the heart of Markov decision processes is calculating the value V(s) of being in a (discrete) state s ∈ S (a value-iteration sketch appears after this list). The problem is that when s is a vector, the number of possible states |S| grows exponentially with the number of dimensions, a behavior widely known as “the curse of dimensionality.”
• Approximate/adaptive dynamic programming — Several communities have developed ways of overcoming the “curse of dimensionality” inherent in the tools of discrete Markov decision processes by using simulation-based methods to develop approximations of the value of being in state s (a fitted-value sketch appears after this list).
• Reinforcement learning — This field started by modeling animal behavior seeking to solve a problem (such as finding a path through a maze), where experiences (in the form of successes or failures) were captured by estimating the value of a state-action pair Q(s, a), using a method known as Q-learning (a sketch appears after this list). In the 1990s, reinforcement learning was a form of approximate dynamic programming, but the field has evolved as researchers found that Q-learning did not always work (the same could be said of approximate dynamic programming). As of this writing, “reinforcement learning” has grown to represent the broad problem class of sequential decision problems, which can be solved using any of the solution approaches presented in this book. In fact, some would describe this entire book as “reinforcement learning.”
• Stochastic programming — This community evolved from math programming with the desire to insert random variables into linear programs. The classical problem is the two-stage problem where we first pick x_0 (e.g., how many Christmas trees to plant), after which we learn random information W_1, and then we make a second decision x_1 (e.g., shipping Christmas trees to customers). A sample-average sketch appears after this list.
• Sequential kriging — This community evolved within the geosciences, where we need to learn the largest value of a continuous function f(x) through expensive experiments (originally field experiments). The vector x was originally a two-dimensional point in space, where f(x) might be the amount of oil or natural gas yielded by drilling a hole at x (a Gaussian process sketch appears after this list).
• Multiarmed bandit problems — The roots of this problem come from applied probability, where the goal is to identify the discrete alternative (known as an “arm” in this community) that yields the highest reward, when we do not know the value of each arm. We learn the values through repeated experimentation, accumulating rewards as we progress (a UCB-style sketch appears after this list).
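The code sketches below are not from Powell's book; they are minimal Python illustrations of several of the formulations above, with toy objective functions and made-up parameters.

A stochastic gradient sketch for the derivative-based problem min_x E F(x, W): we sample W, compute the gradient for that sample, and take a step with a declining stepsize. The objective F(x, W) = (x - W)^2 is an assumed toy example whose minimizer is E[W].

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_F(x, w):
    """Gradient of the toy objective F(x, w) = (x - w)**2 with respect to x."""
    return 2.0 * (x - w)

x = 0.0                       # initial design
for n in range(1, 5001):
    w = rng.normal(3.0, 1.0)  # sample W (here assumed normal with mean 3)
    alpha = 1.0 / n           # declining stepsize alpha_n = 1/n
    x = x - alpha * grad_F(x, w)

print(x)  # should be close to E[W] = 3
```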
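A ranking-and-selection sketch: the sampling budget is spread evenly over a finite set of alternatives, and the one with the smallest sample mean is chosen (we are minimizing). The true means and the noise model are assumptions; practical policies allocate samples adaptively.

```python
import numpy as np

rng = np.random.default_rng(1)

true_means = np.array([1.0, 0.4, 0.7, 0.2, 0.9])   # unknown to the policy
M, budget = len(true_means), 500

def sample(i):
    """Noisy evaluation of alternative i (assumed normal noise)."""
    return rng.normal(true_means[i], 1.0)

counts, sums = np.zeros(M), np.zeros(M)
for n in range(budget):
    i = n % M                  # equal allocation across alternatives
    sums[i] += sample(i)
    counts[i] += 1

estimates = sums / counts
print(int(np.argmin(estimates)), estimates)   # chosen alternative and its estimates
```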
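A robust optimization sketch: brute-force grid search for min_x max_w F(x, w) over a one-dimensional design and a bounded uncertainty set. The cost function and both grids are assumptions.

```python
import numpy as np

def F(x, w):
    """Toy cost of design x under condition w."""
    return (x - 2.0) ** 2 + x * w

x_grid = np.linspace(0.0, 4.0, 81)     # candidate designs
w_grid = np.linspace(-1.0, 1.0, 41)    # uncertainty set for w

# For each design, evaluate the worst case over w, then pick the best design.
worst_case = np.array([max(F(x, w) for w in w_grid) for x in x_grid])
best = int(np.argmin(worst_case))
print(x_grid[best], worst_case[best])
```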
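An optimal stopping sketch: a threshold rule for selling an asset, evaluated by Monte Carlo simulation. The random-walk price model, horizon, and candidate thresholds are assumptions; the point is that the stopping decision uses only the information observed so far.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_sale(threshold, T=50, p0=10.0):
    """Sell the first time the price reaches the threshold, else at the horizon."""
    p = p0
    for t in range(T):
        p += rng.normal(0.0, 0.5)   # assumed random-walk price dynamics
        if p >= threshold:          # decision depends only on the history so far
            return p
    return p                        # forced sale at the horizon

for threshold in (10.5, 11.0, 11.5, 12.0):
    revenue = np.mean([simulate_sale(threshold) for _ in range(2000)])
    print(threshold, round(float(revenue), 3))
```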
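A Markov decision process sketch: value iteration on a tiny problem with known transition probabilities p(s′|s, a); the two states, two actions, rewards, and discount factor are all made up. It computes the value V(s) of being in each discrete state.

```python
import numpy as np

# P[a][s][s'] = probability of moving from s to s' under action a (assumed numbers)
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
# R[a][s] = one-period reward for taking action a in state s (assumed numbers)
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.95

V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V      # Q[a, s] = r(s, a) + gamma * sum over s' of p(s'|s, a) V(s')
    V_new = Q.max(axis=0)      # Bellman update: best action in each state
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print(V, Q.argmax(axis=0))     # values and a greedy policy
```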
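An approximate dynamic programming sketch: instead of storing a value for every discrete state, we approximate the value of being in a (continuous) state by simulating trajectories and regressing sampled discounted returns on simple polynomial features. The dynamics, reward, and features are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
gamma = 0.9

def step(s):
    """Assumed dynamics: the state drifts toward 0 with noise; reward is -s**2."""
    return -s ** 2, 0.8 * s + rng.normal(0.0, 0.1)

states, returns = [], []
for _ in range(200):
    s = rng.uniform(-1.0, 1.0)
    traj = []
    for _ in range(30):
        r, s_next = step(s)
        traj.append((s, r))
        s = s_next
    G = 0.0
    for s_t, r_t in reversed(traj):     # accumulate discounted returns backward
        G = r_t + gamma * G
        states.append(s_t)
        returns.append(G)

# Fit an approximate value function theta0 + theta1*s + theta2*s**2 by least squares.
s_arr = np.array(states)
X = np.vstack([np.ones_like(s_arr), s_arr, s_arr ** 2]).T
theta, *_ = np.linalg.lstsq(X, np.array(returns), rcond=None)
print(theta)   # coefficients of the approximate value function
```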
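A Q-learning sketch: estimating Q(s, a) from simulated transitions without using the transition probabilities inside the update. The environment is the same made-up two-state problem used in the value-iteration sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])   # used only to simulate transitions
R = np.array([[1.0, 0.0], [0.5, 2.0]])
gamma, alpha, eps = 0.95, 0.1, 0.1

Q = np.zeros((2, 2))                        # Q[s, a]
s = 0
for _ in range(200_000):
    # Epsilon-greedy exploration.
    a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next = int(rng.choice(2, p=P[a, s]))  # sampled transition
    r = R[a, s]
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(Q)   # the greedy policy should match the value-iteration sketch above
```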
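A two-stage stochastic programming sketch using a sample average approximation of the Christmas tree example: the first-stage decision x_0 (trees planted) is made before demand W_1 is known, and the second-stage decision x_1 (trees shipped) is made after demand is observed. The costs, price, and demand distribution are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

plant_cost, sale_price = 3.0, 8.0
demand_samples = rng.poisson(100, size=1000)   # sampled scenarios for W_1

def profit(x0, demand):
    """Second stage: ship x1 = min(x0, demand) trees after demand is observed."""
    x1 = min(x0, demand)
    return sale_price * x1 - plant_cost * x0

# Sample average approximation of E[profit(x0, W_1)] over a grid of first-stage decisions.
candidates = range(60, 161)
avg_profit = {x0: np.mean([profit(x0, d) for d in demand_samples]) for x0 in candidates}
best_x0 = max(avg_profit, key=avg_profit.get)
print(best_x0, round(float(avg_profit[best_x0]), 2))
```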
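A sequential kriging sketch: a hand-coded Gaussian process (squared-exponential kernel) is fit to a few expensive measurements of f(x), and the next point to sample maximizes the posterior mean plus an exploration bonus (an upper-confidence-style rule). The test function, kernel parameters, and bonus weight are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def f(x):
    """Expensive black-box function (assumed for illustration), peak near x = 0.6."""
    return -(x - 0.6) ** 2 + 0.05 * rng.normal()

def kernel(a, b, length=0.2):
    """Squared-exponential covariance between points in a and points in b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

X = np.array([0.1, 0.5, 0.9])               # initial design points
y = np.array([f(x) for x in X])
grid = np.linspace(0.0, 1.0, 101)

for _ in range(10):
    K = kernel(X, X) + 1e-4 * np.eye(len(X))             # kernel matrix with noise term
    Ks = kernel(grid, X)
    mean = Ks @ np.linalg.solve(K, y)                    # GP posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    ucb = mean + 2.0 * np.sqrt(np.maximum(var, 0.0))     # mean plus exploration bonus
    x_next = grid[int(np.argmax(ucb))]
    X = np.append(X, x_next)                             # run the "expensive" experiment
    y = np.append(y, f(x_next))

print(X[int(np.argmax(y))])   # best sampled point, expected near 0.6
```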
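A multiarmed bandit sketch using an upper-confidence-bound policy: rewards accumulate while the policy learns which arm is best. The true arm means are made up and hidden from the policy.

```python
import numpy as np

rng = np.random.default_rng(7)

true_means = np.array([0.2, 0.5, 0.35, 0.45])   # unknown to the policy
K, horizon = len(true_means), 5000

counts = np.zeros(K)
estimates = np.zeros(K)
total_reward = 0.0

# Pull each arm once to initialize the estimates.
for arm in range(K):
    reward = rng.normal(true_means[arm], 0.3)
    counts[arm], estimates[arm] = 1, reward
    total_reward += reward

for n in range(K + 1, horizon + 1):
    ucb = estimates + np.sqrt(2.0 * np.log(n) / counts)   # optimism bonus
    arm = int(np.argmax(ucb))
    reward = rng.normal(true_means[arm], 0.3)
    total_reward += reward
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(int(np.argmax(estimates)), total_reward / horizon)  # identified arm and average reward
```

The key contrast with ranking and selection is that rewards earned while learning count toward the objective, so the policy must balance exploration with earning.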
Source
- *Reinforcement Learning and Stochastic Optimization* by Warren B. Powell