This Control Talk column appeared in the August 2020 print edition of Control.
Greg: It is more important than ever, with changing marketplaces and increased competition, that processes achieve the greatest capacity, efficiency and flexibility. We are fortunate to have a leading global expert on optimization, Dr. Russell Rhinehart, emeritus professor at the Oklahoma State University School of Chemical Engineering, give us an extensive, fundamental understanding and perspective on the opportunities and technologies. I have primarily focused on selectively pushing constraints via PID Override and Valve Position Control strategies to maximize production rate and flexibility and minimize energy, reagent and reactant usage, as noted in the Control article “Don’t Overlook PID in APC.”
Greg: What is Optimization?
Russ: Optimization is the method of adjusting decisions to get a best outcome. The decision could be when, and what, to buy/sell to maximize portfolio value. It could be how to approach your boss about scheduling vacation time to fit your family’s plans. In those cases, you have an intuitive model of how things will go, and you take action based on that understanding. But, you might have a mathematical model, such as how a heat exchanger works, and choose tube diameter to minimize a combination of capital cost and annual expenses. Alternatively, you might be using empirical data to guide the action. For instance, increasing the complexity of a control system (gain scheduling, ratio, cascade, feedforward, override, etc.) improves control, which you might measure by a reduction in process variability after observing the process for a week.
There is always a balance of opposing ideals. And optimization seeks the best balance. For example, as you add features to a control system design, the desirable aspect is that process control improves, but the undesirable aspect is that the system complication becomes a maintenance burden. Similarly, changing setpoints to either minimize process energy consumption or maximize production throughput may be desirable ideals; but doing so may cause quality deviations, an undesirable outcome.
As terminology: Decision Variables (DV) are what you can adjust. These could be classes (such as the type of equipment, or treatment, or control system features), or numerical values (such as setpoints, duration, temperature, feed composition, etc.).
The Objective Function (OF) provides the value of what you are seeking to maximize or minimize. It must include all of the desirable and undesirable aspects and place all concerns on an equivalent basis.
Much within human decisions is related to improving our situation by making best choices. Within the process industry this includes equipment choice and design, procedures and scheduling. But also, much is within automation and control.
Greg: You’ve mentioned control system design and choosing setpoints. Where else is optimization relevant to process control?
Russ: Many places. We use dynamic models such as first order plus dead time (FOPDT) to represent process behavior for tuning controllers and for structuring decouplers and feedforward control. Classically, the reaction curve method was the standard way to manually fit models to data, but with computers, regression adjusts model gain, time constant and delay to best fit noisy data. Advanced process control models often are based on a dynamic matrix model or second order plus dead time (SOPDT) approximations, and optimization (regression) best fits the models to the data.
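As a rough illustration of that regression, here is a minimal Python sketch (an illustrative choice of tooling, not what any particular vendor product uses); the synthetic step-test data, noise level and starting guesses are all assumptions:

```python
# Sketch: fit an FOPDT model to noisy step-response data by regression.
import numpy as np
from scipy.optimize import least_squares

def fopdt_step(t, K, tau, theta):
    """FOPDT response to a unit step at t=0: no movement until the dead time
    theta elapses, then a first-order rise with gain K and time constant tau."""
    tau = max(tau, 1e-6)                 # guard against a nonphysical tau
    lag = np.maximum(t - theta, 0.0)     # the clock starts after the dead time
    return K * (1.0 - np.exp(-lag / tau))

# Synthetic "plant" data: K=2.0, tau=5.0, theta=1.5, plus measurement noise
rng = np.random.default_rng(0)
t = np.linspace(0, 30, 151)
y_meas = fopdt_step(t, K=2.0, tau=5.0, theta=1.5) + rng.normal(0, 0.05, t.size)

# Regression: adjust (K, tau, theta) to minimize the sum of squared residuals
residuals = lambda p: fopdt_step(t, *p) - y_meas
fit = least_squares(residuals, x0=[1.0, 1.0, 0.5])
print("K=%.2f  tau=%.2f  theta=%.2f" % tuple(fit.x))
```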
Another optimization application would be determining setpoints to minimize costs or maximize throughput. This would be supervisory Real Time Optimization (RTO) if we have process models, or Evolutionary Operation (EVOP) if we are using data from process performance to guide the changes.
We tune controllers to best balance disturbance rejection against making undesirably large changes in utilities. Also, we tune to have desirable control now, but also for good control later when the process gain and time-constants change; so, we may have to accept sub-optimal sluggish control now to prevent oscillating control later.
In Model Predictive Control (MPC), the optimizer calculates a future sequence of control actions that best make the process model follow the desired trajectory to the setpoint, while avoiding constraints and minimizing cost. The controller implements the first action in the sequence. The sequence repeats at each sampling.
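A toy sketch of that receding-horizon loop, assuming a simple first-order discrete model, made-up weights, and a general-purpose optimizer standing in for an industrial MPC engine:

```python
# Sketch: the receding-horizon idea in MPC on a toy first-order process.
import numpy as np
from scipy.optimize import minimize

a, b = 0.9, 0.1          # discrete model: x[k+1] = a*x[k] + b*u[k]
H = 10                   # prediction horizon (number of future moves)
setpoint = 1.0
move_weight = 0.1        # penalize large control moves (cost of action)

def predicted_cost(u_seq, x0):
    """Tracking error plus move suppression over the horizon.
    For simplicity the first move is penalized relative to u=0."""
    x, u_prev, cost = x0, 0.0, 0.0
    for u in u_seq:
        x = a * x + b * u
        cost += (setpoint - x) ** 2 + move_weight * (u - u_prev) ** 2
        u_prev = u
    return cost

x = 0.0
for k in range(25):                        # the receding-horizon loop
    res = minimize(predicted_cost, np.zeros(H), args=(x,),
                   bounds=[(-5, 5)] * H)   # input constraints
    u_now = res.x[0]                       # implement only the first action
    x = a * x + b * u_now                  # the process moves on; repeat
print("final PV: %.3f" % x)
```

Only res.x[0] is ever applied; the rest of the optimized sequence is discarded and recomputed at the next sample.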
In batch operations we might want to maximize annual production, and each batch might asymptotically approach complete yield. It may be better to stop a batch after 85% yield and have 100 batches per year (100*0.85=85 units), rather than to wait for 99% yield and only get 50 per year (50*0.99=49.5 units).
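That batch arithmetic generalizes to a one-variable optimization. A sketch, with an assumed asymptotic yield curve and made-up turnaround and operating times:

```python
# Sketch: choose when to stop each batch to maximize annual production.
import numpy as np

tau = 20.0            # hours; yield approaches 100% as 1 - exp(-t/tau)
turnaround = 8.0      # hours to empty, clean and recharge between batches
hours_per_year = 8000.0

def annual_units(t_batch):
    yield_frac = 1.0 - np.exp(-t_batch / tau)        # yield when stopped
    batches = hours_per_year / (t_batch + turnaround)  # batches per year
    return batches * yield_frac

# Coarse search over candidate batch durations
t = np.linspace(1, 200, 2000)
best = t[np.argmax(annual_units(t))]
print("stop each batch at %.1f h, yield %.0f%%"
      % (best, 100 * (1 - np.exp(-best / tau))))
```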
Other control applications include data reconciliation and scheduling.
Greg: We can make a batch process more like a continuous one by fed-batch control of the slope of the batch temperature, composition or pH profile. The batch profile slope gives a pseudo steady state. Would this enable us to use some of these techniques you describe?
Russ: Yes, certainly. Recipe variables in batch production, such as the temperature and pressure schedule, the timing of additives and the mixing rates, can all be adjusted to optimize batch performance metrics. And often, end-of-batch analysis feeds back adjustments to subsequent recipes. Here, statistical process control techniques can temper the impact of natural variation on the recipe changes, and reduce variation.
Greg: How is optimization performed?
Russ: Often it is done by heuristic human guidance, but also it can be automated by computer code. The computer could be following mathematical rules, or it could follow a set of human expert rules. Optimization is an iterative, trial-and-error, incremental DV adjustment which progressively improves the OF. It does not magically know what the best value is, but has to search around to find it.
Before computers and calculus, optimization was exclusively a heuristic, human directed, trial-and-error procedure. About 300 years ago, calculus generated Newton-type and successive quadratic procedures which use surrogate quadratic models to guide the search. In the past 100 years there has been an explosion of more complex mathematical techniques, which are excellent for applications with specific attributes, and which have become the fashion among the sophisticated. Computers have permitted greater complexity in the mathematically based methods, but they also permit a return to heuristic direct search algorithms, which served humans for hundreds of thousands of years prior. I believe the heuristic approaches have greater potential.
Greg: You give it away?
Russ: Often, yes. In my academic career, in teaching classes, I had to understand techniques, select what was most appropriate for the audience to understand, then I gave away the secrets in lectures and materials. Now that I’ve retired, I continue to do so, and have a website, www.r3eda.com. Readers can visit it to see details about optimization and access Excel-VBA software to explore many techniques.
Greg: Can you describe optimization, without using any equations?
Russ: Sure! But, don’t tell my family. I want to preserve my engineering persona with them. Although actually, if a CONTROL reader told my family that I use plain talk, that would enhance my identity.
A Trial Solution (TS) is a guess of the best DV values, the variables that are being adjusted. Some algorithms have a single TS and progressively move it to better spots. Other algorithms are multiplayer types with many simultaneous trial solutions.
To understand single trial solution algorithms, consider that you are blindfolded, standing on a hilly surface and want to walk to the point of lowest elevation. (If you could see where the minimum was, you would just jump to that spot, and would not need an optimizer. If you need an optimizer, you don’t know where the minimum is. Neither does the optimizer.) So, blindfolded, you feel the local slope with your feet and take steps in the steepest down-hill direction. When steps in any direction move you up-hill or when the slope of the ground is zero, you have found the optimum (converged).
There are many ways to analyze the local surface, and then many rules to use to define the sequence of steps. So, there are hundreds of single TS algorithms.
This approach often works. However, it can be blocked by surface features. Probably, there are many local hollows on the surface; and if you are in the bottom of one, you’ll think you’ve found the global minimum. There are also cliff-walls or fences on your down-hill path (constraints). If the true minimum is on the other side, you cannot cross through, and moving sideways along the constraint is not moving down-hill, which becomes another local trap.
It does not matter whether you are seeking the maximum or minimum, the issues are the same.
The solution, the point at which convergence is claimed, can be very dependent on the initial TS value.
Greg: And how do multiplayer optimizers work?
Russ: Consider that you, along with a bunch of friends are randomly placed on the hilly surface, all blindfolded, and the team wants to find the point of lowest elevation. Each person gets a location and altitude value, and they can communicate that information with everyone else. So, everyone knows who is in the worst and best positions. The worst player leaps over the best, lands in a random spot on the other side of the best, and gets a new location and altitude reading. This repeats until all players are within convergence distance. Again, there are many local hollows on the surface; but even if you are in the bottom of one hollow, if another player is better, you’ll leap out of your local trap. Again, there are also cliffs or constraints, and if a leap places you in an infeasible spot, you remain the worst and leap again. The diverse placement of many players, along with leaps into unexplored regions improves the probability of finding the global minimum.
Multiplayer algorithms are relatively recent, first published a few decades ago as “genetic algorithms” and “particle swarm optimization.” I described “leapfrogging” above. The movement of the players can use any number of rules. Many algorithms seek to mimic the “intelligence” in nature, such as how birds, ants or gnats find best places. Or even, how genes evolve.
Greg: Which are best?
Russ: To me, the newer, rule-driven multiplayer optimization algorithms are more robust and more generally applicable than the mathematically sophisticated programming techniques. But, for specific applications, the mathematical techniques are also excellent.
But what is best? If you understand one approach, and your stakeholders accept it, and it works on your application, then that is probably the best. Determine all the context desirables and undesirables, then choose the optimizer that best balances all of the concerns. The criteria that an expert uses to decide which is best may not match your stakeholders’ viewpoint.
Greg: What are attributes of applications that cause difficulty for optimizers? You mentioned multiple optima and constraints. Others?
Russ: There are many application difficulties, which have led to a large number of optimization techniques. These troubles include:
- Non-quadratic behavior,
- Multiple optima,
- Stochastic (noisy) responses,
- Asymptotic approach to optima at infinity,
- Hard inequality constraints, or infeasible regions,
- Slope discontinuities (sharp valleys),
- A gently sagging channel (effectively slope discontinuities),
- Level discontinuities (cliffs),
- Striations,
- Flat spots, or nearly flat spots,
- Planar or nearly planar regions,
- A very thin global optimum in a large surface (pin-hole optima, improbable to find),
- Discrete, integer, or class DVs mixed with continuous variables,
- Underspecified problems with an infinite number of equal solutions, and
- Discontinuous response to seemingly continuous DVs because of discretization in a numerical integration.
Greg: What are the major algorithms?
Russ: Everyone’s experience will be different. Some algorithms are well established in some communities, and I might not mention a technique that happens to be a reader’s favorite. But here is my view of important ones.
I think that a key optimization technique is Linear Programming (LP), from about 1948. It is common in business planning, resource allocation and scheduling. In LP, the objective function is a linear response to the DVs, and the feasible area is bounded by linear constraint relations on the DVs. With this situation, the solution is at an intersection of constraints on the boundary. LP starts at one intersection of constraints, then rapidly moves along constraints, intersection-to-intersection, to the optimum.
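A toy LP of that resource-allocation kind, using scipy's linprog; the profits and resource limits are made-up numbers for illustration:

```python
# Sketch: a toy production-planning LP.
from scipy.optimize import linprog

# Maximize profit 3*x1 + 5*x2, so minimize the negative
c = [-3.0, -5.0]
# Linear constraints A_ub @ x <= b_ub bound the feasible region
A_ub = [[1.0, 2.0],    # machine hours consumed per unit of each product
        [3.0, 1.0]]    # feedstock consumed per unit of each product
b_ub = [100.0, 90.0]   # available machine hours and feedstock
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)  # the optimum sits at an intersection of the two constraints
```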
Levenberg-Marquardt (LM) (1944, rediscovered in 1963) is a favorite of mine representing the single-TS mathematical procedures. It blends two techniques: incremental steepest descent and Newton-Raphson. Newton-type techniques are fast to converge in the vicinity of an optimum, but they will go to either a min or a max and can jump to extremes. Incremental steepest descent takes smallish down-hill steps and is guaranteed to find a minimum. LM is an excellent blend for nonlinear optimization. But, it will seek a local optimum, and cannot handle constraints.
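A bare-bones sketch of that blend, showing how a factor lam slides between a Newton-like step (small lam) and a small steepest-descent step (large lam); the toy fitting problem and settings are assumptions:

```python
# Sketch: a minimal Levenberg-Marquardt loop for least squares.
import numpy as np

def lm_fit(residuals, jacobian, p, lam=1e-3, iters=50):
    for _ in range(iters):
        r, J = residuals(p), jacobian(p)
        # (J'J + lam*I) step = -J'r : lam blends Newton and steepest descent
        step = np.linalg.solve(J.T @ J + lam * np.eye(p.size), -J.T @ r)
        if np.sum(residuals(p + step) ** 2) < np.sum(r ** 2):
            p, lam = p + step, lam * 0.5   # success: trust the Newton side more
        else:
            lam *= 10.0                    # failure: lean to steepest descent
    return p

# Toy problem: fit y = a*exp(b*t) to noiseless data
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(1.5 * t)
res = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.column_stack([np.exp(p[1] * t),
                                 p[0] * t * np.exp(p[1] * t)])
print(lm_fit(res, jac, np.array([1.0, 1.0])))  # converges toward (2.0, 1.5)
```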
Generalized Reduced Gradient (GRG) (1974, and now an Excel Solver option) is designed to handle inequality constraints. Basically, it linearizes the constraints, uses slack variables to convert inequality to equality constraints, and then uses a sequential line search approach. It is, however, a single TS approach and will find local optima, and can jump into infeasible places. The solution is often dependent on the initial TS value.
Both LM and GRG are gradient-based optimizers that use the local slope to know where to go next. By contrast, direct search optimizers only use the OF value. They can use human heuristic rules. One of my favorites is Hooke-Jeeves (HJ), first published in 1961. It does partial local exploration then moves the central point in the best direction. And repeats. The Nelder-Mead (1965) improvement of the Spendley-Hext-Himsworth (1962) simplex search uses an alternate direct search logic. In my opinion, these are more robust to many surface difficulties (like planar regions) than the mathematical approaches, and are often faster to converge. However, they are still single trial solution approaches that can get stuck in local optima and do not handle constraints very well.
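A minimal Hooke-Jeeves sketch, assuming an unconstrained objective; note that it uses only objective values, never slopes:

```python
# Sketch: a simplified Hooke-Jeeves pattern search.
import numpy as np

def explore(f, x, step):
    """Probe +/- step along each axis, keeping any improvement."""
    x = x.copy()
    for i in range(x.size):
        for delta in (step, -step):
            cand = x.copy(); cand[i] += delta
            if f(cand) < f(x):
                x = cand
                break
    return x

def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-6):
    base = np.asarray(x0, dtype=float)
    while step > tol:
        new = explore(f, base, step)
        if f(new) < f(base):
            # Pattern move: leap further along the improving direction,
            # then explore around that pattern point
            pattern = explore(f, new + (new - base), step)
            base = pattern if f(pattern) < f(new) else new
        else:
            step *= shrink          # no improvement anywhere: refine the mesh
    return base

f = lambda x: (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2
print(hooke_jeeves(f, [0.0, 0.0]))  # heads toward the minimum at (3, -1)
```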
I prefer the multiplayer approaches, and my favorite is leapfrogging (LF) (2012). Randomly scatter players (trial solutions) throughout the feasible DV space, and leap the worst over the best into a random spot. LF is fast, has a high probability of finding the global optimum, and can cope with constraints and most of the difficulties mentioned above.
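A bare-bones leapfrogging sketch that follows the description above; the bounds, player count and convergence test are illustrative choices, not Rhinehart's published settings:

```python
# Sketch: minimal leapfrogging on a bounded 2-D problem.
import numpy as np

def leapfrog(f, lo, hi, n_players=20, tol=1e-6, max_leaps=100_000, seed=1):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    players = rng.uniform(lo, hi, size=(n_players, lo.size))
    values = np.array([f(p) for p in players])
    for _ in range(max_leaps):
        if np.ptp(players, axis=0).max() < tol:   # all players have clustered
            break
        w, b = np.argmax(values), np.argmin(values)
        # The worst player leaps over the best into the reflected window
        reflect = 2 * players[b] - players[w]
        new = rng.uniform(np.minimum(players[b], reflect),
                          np.maximum(players[b], reflect))
        # (A fuller version would re-leap if 'new' violated a constraint)
        players[w], values[w] = new, f(new)
    return players[np.argmin(values)]

print(leapfrog(lambda x: (x[0] - 3) ** 2 + (x[1] + 1) ** 2,
               lo=[-10, -10], hi=[10, 10]))
```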
There are many more algorithms, and many variations on each one to improve speed or robustness. See a book for details.
Greg: You wrote a book?
Russ: Yes, it is based on my 13 years of industrial experience as to what engineers need to know. It is an outcome of the most popular elective engineering course at OSU. Rhinehart, R. R., Engineering Optimization: Applications, Methods, and Analysis, John Wiley & Sons, New York, NY, 2018, ISBN-13: 978-1118936337, ISBN-10: 1118936337, 731 pages, with companion website www.r3eda.com.
Greg: Where can a reader obtain these algorithms?
Russ: Many data-processing and simulation software vendors include several optimizer choices. This includes Excel. And some algorithms are embedded in advanced control and automation products.
Greg: If vendors provide the algorithm and code what do users have to do?
Russ: I think a critical item is to ensure that all context issues are included. The user needs to understand how the organization interprets desirables and undesirables. These include risk. Some undesirables are hard constraints that cannot be violated, but many constraints can be converted to a penalty for the time or extent of violation. The user needs to define how to evaluate all of these aspects in a single metric (perhaps $/year or an equivalent measure), often needing economic values and models representing the process. All of this is more important than the optimization algorithm.
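As a sketch of folding a soft constraint into one $ metric as a penalty, per that discussion; the toy process model, quality relation and weights are all assumptions:

```python
# Sketch: a single $ objective with a soft-constraint penalty term.
def annual_value(setpoint):
    profit = 120.0 * setpoint - 3.0 * setpoint ** 2   # $k/yr, toy economics
    quality = 1.0 - 0.004 * setpoint ** 2             # predicted quality index
    violation = max(0.0, 0.95 - quality)              # amount below spec
    return profit - 5000.0 * violation ** 2           # penalize violation

# Coarse search over candidate setpoints; profit pushes up, penalty pushes back
best = max((s / 100.0 for s in range(0, 1501)), key=annual_value)
print("best setpoint:", best)
```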
Greg: Do users need to do anything else?
Russ: Yes!
- Choose the initial Trial Solution.
- Choose convergence criteria and threshold.
- Temper the solution with respect to uncertainty in the models and coefficient values.
- Understand how the algorithm works, so that they can do the above.
Greg: What is new or under-utilized?
Russ: Hooke-Jeeves, just mentioned, is a robust, fast direct search. I think it was overshadowed by the sophistication of gradient-based methods, but in my trials, it outperforms them.
Memetic, multiplayer, “global” algorithms, such as particle swarm and leapfrogging, are also under-utilized. These use rules that move players, rules that mimic the “intelligence” of nature. In my experience, they are faster, more robust to difficulties, and have a high probability of finding the global optimum.
There is uncertainty associated with coefficient values in models, in environmental influences, and in economic parameter values. Traditional optimization pretends all is absolutely known, that the objective function is deterministic, and it returns one certain solution. But there is always uncertainty. Consider scheduling production over the next month to meet sales projections for the quarter. If the projection exceeds what actually happens in the market, then you make too much; if it underestimates demand, then you do not make enough. Uncertainty leads to a range of solution values that are all valid. I think that including uncertainty in the optimization, to determine the most probable solution or to avoid possible disasters, is important. This approach leads to a stochastic objective function, and cannot use deterministic optimization methods. Multiplayer direct search algorithms, such as leapfrogging, can cope with uncertainty.
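A sketch of that scheduling example via scenario sampling (the demand distribution and the surplus/shortfall costs are assumptions); note the best decision hedges away from the deterministic answer:

```python
# Sketch: optimizing a production decision against sampled demand scenarios.
import numpy as np

rng = np.random.default_rng(2)
demand = rng.normal(1000, 150, 5000)   # uncertain monthly demand scenarios
over_cost, under_cost = 1.0, 4.0       # $/unit of surplus vs. lost sale

def expected_cost(make):
    surplus = np.maximum(make - demand, 0)      # made too much
    shortfall = np.maximum(demand - make, 0)    # did not make enough
    return np.mean(over_cost * surplus + under_cost * shortfall)

make = np.linspace(500, 1500, 1001)
best = make[np.argmin([expected_cost(m) for m in make])]
print("make %.0f units" % best)   # hedges above the 1000-unit mean demand
```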
Greg: For my Master’s thesis on dynamic modeling of pH systems, I used a Fibonacci search to find the pH that would satisfy the charge balance for a complex mixture of acids, bases and conjugate salts that could not be directly solved. As I was finishing the thesis, I realized you could simply use interval halving: compute the excess charge from an execution of the equations, then halve the search interval in the direction that approaches zero excess charge, and repeat. For a 10 pH interval, you can find the pH solution with 0.01 pH accuracy in 10 executions, which takes negligible time in terms of controller execution rate. This enables much more accurate linearization of the process by converting the controlled variable from pH to the X axis of the titration curve.
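A minimal sketch of that interval-halving search; the charge-balance function here is a simplified strong-acid/strong-base stand-in for the complex mixture Greg describes:

```python
# Sketch: interval halving (bisection) to find the pH that zeroes the
# excess charge. A real mixture adds terms for each weak acid, weak base
# and conjugate salt to excess_charge().
def excess_charge(pH, strong_ion_difference=1e-4):
    h = 10.0 ** (-pH)     # [H+], mol/L
    oh = 1.0e-14 / h      # [OH-] from the water dissociation constant
    return h - oh + strong_ion_difference   # mol/L of unbalanced charge

lo, hi = 0.0, 14.0
f_lo = excess_charge(lo)
for _ in range(10):                   # 10 halvings: 14/2**10 ~ 0.014 pH
    mid = 0.5 * (lo + hi)
    f_mid = excess_charge(mid)        # one model execution per halving
    if f_lo * f_mid < 0.0:            # zero crossing lies in the lower half
        hi = mid
    else:
        lo, f_lo = mid, f_mid
print("pH that zeroes the charge balance: %.2f" % (0.5 * (lo + hi)))
```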
(10) Not stuck in the middle
(9) Avoid pitfalls and buzzing
(8) Learn to like each other’s likes
(7) Lots of time together is productive and not destructive
(6) Realize you can optimize everything
(5) Appreciate how good feedback and feedforward applies to relationships
(4) Your children want to become process control engineers
(3) Feel like you are not wasting time
(2) Get to buy Russ’s book
(1) Appreciate frogs