[HN Gopher] Decision Making Under Uncertainty with POMDPs.jl
___________________________________________________________________
Decision Making Under Uncertainty with POMDPs.jl
Author : amkkma
Score : 79 points
Date : 2021-09-20 12:39 UTC (10 hours ago)
(HTM) web link (juliaacademy.com)
(TXT) w3m dump (juliaacademy.com)
| graycat wrote:
| My Ph.D. dissertation was in "Decision Making Under Uncertainty".
| The topics in the OP are extensive, and most of them look good.
| E.g., I noticed interpolation -- an old idea but sometimes
| important. Apparently missing, however, was the old technique of
| _scenario aggregation_.
|
| Before going very far on this subject, we should note that it
| has received a lot of attention at the ORFE department at
| Princeton.
|
| Some of the related advanced math is pretty, but for practice
| here are some of the real issues:
|
| (1) Far too easily, the needed computer resources -- both CPU
| cycles and memory, especially main memory -- grow beyond reason,
| feasibility, or practicality. Without exaggeration, this subject
| is Godzilla for supercomputers: right away it gobbles up and
| spits out any machine with less than 100 million square feet of
| floor space.
|
| (2) Too often this methodology is what we would do -- provably
| the best possible for any means of processing the input data --
| if we had a lot of data, and assumptions about that data, that
| in practice we don't have.
|
| (3) So, in practice, the first obstacle is to find a problem
| that gets around (1) and (2) and is otherwise doable.
|
| (4) In practice, the next obstacle is convincing the _suits_,
| who are nearly always wildly afraid of anything so new to them,
| to devote any time, money, data, or attention to the work. The
| _suits_ too easily see any such project as a lose-lose for them:
| (A) If the project works, with quite valuable results, the
| project head might get promoted, maybe over the relevant _suits_.
| (B) If the project doesn't work really well, generally it will
| be seen as a waste, with the _suits_ getting a black mark on
| their records, a black mark that makes them easy targets in the
| C-suite competition.
|
| One possible path: if you have a project that passes (1) and
| (2), add one more attribute: make it a project that (A) you can
| fund and do yourself, (B) has a good list of good candidate
| customers, and (C) fits a corresponding startup. Due to (B), you
| are just a vendor to the customer organization and, thus, not a
| threat to the C-suite _suits_. And to the customer, the project
| has no "waste", since if the customer is not happy the startup
| doesn't get paid.
|
| Net, do a startup.
| ruste wrote:
| Any recommendations for introductory reading in this area?
| esfandia wrote:
| Reasoning under uncertainty isn't the sole focus of the book,
| but Russell & Norvig's AI textbook has a couple of chapters
| on the topic that are pretty good, with the right amount of
| detail for the uninitiated.
| graycat wrote:
| The basic idea is dirt simple. After that, apparently the OP
| has an overview. After that, the math gets quite _pure_.
| E.g., the Markov assumption is that in the underlying
| process, the past and future are conditionally independent
| given the present. Fairly quickly one starts discussing this
| assumption in terms of sigma algebras from measure theory
| instead of just random variables. The sigma algebras are
| pretty: (a) Given the sigma algebra generated by a random
| variable, the random variable is determined (from memory from
| 20 years ago; my source was
|
| I. I. Gihman and A. V. Skorohod, _The Theory of Stochastic
| Processes I_, ISBN 0-387-06573-3, Springer-Verlag, New York,
| 1974,
|
| and I lost my copy in a move and have yet to order another
| from Springer). (b) With infinitely many random variables,
| discussing conditional independence directly is at least
| clumsy, but the sigma algebra they generate makes talking
| about conditional independence easy. Where do we get
| infinitely many random variables? In the continuous-time
| case, the past one second will do. In the discrete-time case,
| the full past of the process, if we assume the process goes
| back all the way in time.
|
| But, sure, in practice we can't do anything very direct with
| sigma algebras; that is, they are pretty theory.
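The Markov assumption stated above -- past and future conditionally independent given the present -- can still be checked numerically on a toy chain, even if the sigma-algebra machinery can't be computed with directly. A minimal sketch, with a hypothetical two-state chain whose transition probabilities are made up for illustration: the empirical estimate of P(X2 = 1 | X1 = 1) should match the estimate that also conditions on the extra past X0.

```python
import random

# Hypothetical two-state Markov chain (transition probabilities made
# up for illustration). P[state] gives the probabilities of moving to
# state 0 and state 1 respectively.
P = {0: [0.7, 0.3], 1: [0.4, 0.6]}

def simulate(n, seed=1):
    """Sample n three-step paths (x0, x1, x2) of the chain."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n):
        x0 = rng.randrange(2)
        x1 = 0 if rng.random() < P[x0][0] else 1
        x2 = 0 if rng.random() < P[x1][0] else 1
        paths.append((x0, x1, x2))
    return paths

def prob_x2_given(paths, cond):
    """Empirical P(X2 = 1 | cond holds for the path)."""
    hits = [p for p in paths if cond(p)]
    return sum(p[2] for p in hits) / len(hits)

paths = simulate(200_000)
# Conditioning on the extra past x0 should not change the estimate:
p_given_x1 = prob_x2_given(paths, lambda p: p[1] == 1)
p_given_x0_x1 = prob_x2_given(paths, lambda p: p[0] == 0 and p[1] == 1)
```

With 200,000 sampled paths the two estimates agree to a couple of decimal places, which is the conditional-independence claim in empirical form.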
|
| Here is the whole idea in a reasonably general case in brief:
|
| Do some financial planning for the next five years. Make time
| discrete, say, one point in time for each month. Have a point
| at the beginning and also at the end for a total of 61
| points.
|
| Build a spreadsheet with one column for each of the 61 points
| in time and one row for each of the relevant variables. To
| make this _stochastic optimal control_ , Markov decision
| theory, or other equivalent names, in the spreadsheet have
| some cells that get their values from the built-in random
| number generator.
|
| Also at each of the first 60 points in time, have some
| variables that are the _decisions_ we get to make.
|
| In the last column, have the value of the business then.
|
| Our mission, and we have to accept it, is to determine the
| decisions (values in the decision cells) that maximize the
| value of the business at the end. Ah, since we have the
| random cells, typically we want to maximize the expected
| (average) value at the end.
|
| So, we start at column 60 and consider all the possible data
| we could have in column 60. Sure, that data fills maybe a few
| exabytes. Then we pick values for the decisions in column 60
| and reevaluate the spreadsheet many times letting the random
| cells generate random numbers, average, and see what average
| value we get in column 61. We do this for all possible
| decisions. Sure, now we are keeping busy a few million square
| feet of computing. But the work is highly parallel. That's
| about what we would have to do using a general-purpose
| program, but in practice we would look at the specific
| problem, note _special structure_ (properties, assumptions,
| simplifications), and get some speedups.
|
| To be clearer, at column 60 set aside the random cells and
| the decision cells -- the rest hold the _state_ at column 60.
| So, for each possible state, we try all the possible
| decisions, and for each decision we reevaluate many times and
| get the expected value of the business in column 61 and pick
| the best decision. So, there in column 60, for each of the
| possible states, we get the best decision and the
| corresponding expected business value in column 61. We store
| this -- that is, we store the best decisions and expected
| value for each state. Here we fill some exabytes of storage.
|
| Now we back up to column 59. For each state we try all the
| possible decisions. For each of those decisions we
| recalculate, that is, exercise the random number generators,
| see what state we get in column 60, and look up the expected
| column-61 business value we stored for that state. So, for
| each state in column 59 we find the best decision.
|
| Now we continue in this way back to columns 58, 57, ..., 1.
| In column 1 we know the state we are in because that is our
| present. So our calculation for column 1 told us what
| decision to make there in column 1.
|
| Then in reality, soon we get to column 2, see what state we
| are in, and use the decision we tabulated for that state. We
| continue with columns 3, 4, ..., 60, make our decisions in
| column 60, and see what business value we get in column 61.
|
| The whole thing is the same except much faster if we have no
| random cells.
|
| There are also continuous time treatments, essentially the
| same intuitively.
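The backward pass described above can be sketched in a few dozen lines. Everything here is hypothetical -- a tiny made-up problem with 5 states, 3 decisions, and a 6-point horizon standing in for the 61-column spreadsheet -- but the structure (a Monte Carlo expectation inside a backward sweep over time, storing the best decision per state) is exactly the recipe in the comment:

```python
import random

# Hypothetical toy problem standing in for the spreadsheet: states
# 0..4 are a crude "value level", decisions 0..2 are how aggressively
# to act each period, a random shock moves the state, and the value
# of the business at the end is just the final state.
STATES = range(5)
DECISIONS = range(3)
HORIZON = 6            # points in time 1..6; point 6 holds the end value

def step(state, decision, rng):
    """Random transition: acting more aggressively risks more."""
    shock = rng.choice([-1, 0, 1])
    return max(0, min(4, state + decision * shock))

def solve(n_samples=2000, seed=0):
    """Backward sweep: for each time and state, try every decision,
    estimate the expected stored value of the resulting next state by
    Monte Carlo, and keep the best decision and its value."""
    rng = random.Random(seed)
    value = {HORIZON: {s: float(s) for s in STATES}}   # terminal column
    policy = {}
    for t in range(HORIZON - 1, 0, -1):                # 5, 4, ..., 1
        value[t], policy[t] = {}, {}
        for s in STATES:
            best_d, best_v = None, float("-inf")
            for d in DECISIONS:
                est = sum(value[t + 1][step(s, d, rng)]
                          for _ in range(n_samples)) / n_samples
                if est > best_v:
                    best_d, best_v = d, est
            value[t][s], policy[t][s] = best_v, best_d
    return value, policy

value, policy = solve()
```

Forward execution is then the comment's last step: start in the known present state, apply `policy[1]`, observe the random transition, look up `policy[2]` for whatever state turns up, and so on. In this toy model the computed policy behaves sensibly: at the ceiling state 4 acting can only hurt (the clip caps the upside), while at the floor state 0 acting is free upside.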
|
| For some books,
|
| Dimitri P. Bertsekas, _Dynamic Programming: Deterministic and
| Stochastic Models._
|
| George L. Nemhauser, _Dynamic Programming._
|
| Stuart E. Dreyfus and Averill M. Law, _The Art and Theory of
| Dynamic Programming._
|
| E. B. Dynkin and A. A. Yushkevich, _Controlled Markov
| Processes._
|
| Wendell H. Fleming and Raymond W. Rishel, _Deterministic and
| Stochastic Optimal Control._
|
| There is more from R. T. Rockafellar and R. J.-B. Wets.
|
| And of course we should mention R. Bellman, the father of
| this subject.
| infogulch wrote:
| Care to share your dissertation?
|
| If I'm guessing correctly what _scenario aggregation_ is from
| just the name, I've found that technique is very powerful even
| in plain old human-based decision making. Basically, if you
| have a bunch of complex scenarios that you're trying to
| consider at once, it can be very difficult to make progress,
| because you have to compare whole trees of options against
| each other, along with their probabilities, every time you
| make a change anywhere. But if several scenarios have some
| _prefix_ where they are basically equivalent (or can be made
| to be), you can bundle them together and consider the group as
| a whole, putting off deciding between them until later.
| Sometimes this can collapse a huge, complex decision tree into
| the one obviously correct thing to do _right now_, until the
| point where the scenarios diverge again.
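The prefix-bundling idea described in this comment (as guessed from the name, not Rockafellar's formal method) can be sketched concretely. All scenario data below is made up for illustration: each scenario is a (probability, action sequence, payoff) triple, and scenarios sharing a first action are considered as one group, so only the first decision needs to be made right now.

```python
from collections import defaultdict

# Made-up scenarios: (probability, action sequence, payoff).
# The probabilities sum to 1; the action names are hypothetical.
scenarios = [
    (0.3, ("hedge", "expand", "sell"), 10.0),
    (0.2, ("hedge", "expand", "hold"), 14.0),
    (0.5, ("liquidate",), 6.0),
]

def aggregate_first_step(scens):
    """Group scenarios by their shared first action; return each
    group's total probability and probability-weighted average payoff."""
    groups = defaultdict(lambda: [0.0, 0.0])
    for prob, actions, payoff in scens:
        groups[actions[0]][0] += prob
        groups[actions[0]][1] += prob * payoff
    return {a: (p, wp / p) for a, (p, wp) in groups.items()}

def best_first_action(scens):
    """The one decision needed right now: the first action whose
    bundled group has the highest expected payoff."""
    agg = aggregate_first_step(scens)
    return max(agg, key=lambda a: agg[a][1])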
| graycat wrote:
| Good, quick view of scenario aggregation. The idea is
| powerful, not just a heuristic, and is from R. T. Rockafellar.
|
| I didn't want to publish my dissertation because that would
| be like giving it away for free. Instead I wanted to sell it!
| I made some efforts to sell it but ran into the problem I
| mentioned for the suits. So, it didn't sell. But I still
| don't want to give it away for free.
___________________________________________________________________
(page generated 2021-09-20 23:02 UTC)