[HN Gopher] Decision Making Under Uncertainty with POMDPs.jl
       ___________________________________________________________________
        
       Decision Making Under Uncertainty with POMDPs.jl
        
       Author : amkkma
       Score  : 79 points
       Date   : 2021-09-20 12:39 UTC (10 hours ago)
        
 (HTM) web link (juliaacademy.com)
 (TXT) w3m dump (juliaacademy.com)
        
       | graycat wrote:
       | My Ph.D. dissertation was in "Decision Making Under Uncertainty".
       | The topics in the OP are extensive with most of them looking
       | good. E.g., I noticed interpolation -- an old idea but sometimes
       | important. Apparently missing, however, was the old technique of
       | _scenario aggregation_.
       | 
        | Before going very far, we should note that this subject got a
        | lot of attention in the ORFE department at Princeton.
       | 
       | Some of the related advanced math is pretty, but for practice
       | here are some of the real issues:
       | 
        | (1) Far too easily, the needed computer resources, both CPU
        | cycles and memory, especially main memory, are high beyond
        | reason, feasibility, or practicality. Without exaggeration, this
        | subject is Godzilla for any supercomputer of less than 100
        | million square feet of floor space -- it right away gobbles such
        | a machine up and spits it out.
       | 
        | (2) Too often this methodology is what one would do -- provably
        | the best possible for any means of processing the input data --
        | if one had a lot of data, and assumptions about that data, that
        | in practice one doesn't have.
       | 
       | (3) So, in practice, the first obstacle is to find a problem that
       | gets around (1) and (2) and is doable otherwise.
       | 
        | (4) In practice, the next obstacle is convincing the _suits_,
        | who are nearly always wildly afraid of anything so new to them,
        | to devote any time, money, data, or attention to the work. The
        | _suits_ too easily see any such project as a lose-lose for them:
        | (A) If the project works and produces quite valuable results,
        | the project head might get promoted, maybe over the relevant
        | _suits_. (B) If the project doesn't work really well, generally
        | it will be seen as a waste, with the _suits_ having a black mark
        | on their records, a black mark that makes them easy targets in
        | the C-suite competition.
       | 
        | One possible path: If you have a project that passes (1) and
        | (2), give it one more attribute: make it a project that (A) you
        | can fund and do yourself, (B) has a good list of good candidate
        | customers, and (C) you do as a corresponding startup. Due to
        | (B), you are just a vendor to the customer organization and,
        | thus, not a threat to the C-suite _suits_. And to the customer,
        | the project has no "waste", since if the customer is not happy
        | the startup doesn't get paid.
       | 
       | Net, do a startup.
        
         | ruste wrote:
         | Any recommendations for introductory reading in this area?
        
           | esfandia wrote:
           | Reasoning under uncertainty isn't the sole focus of the book,
           | but Russell & Norvig's AI textbook has a couple of chapters
           | on the topic that are pretty good, with the right amount of
           | detail for the uninitiated.
        
           | graycat wrote:
           | The basic idea is dirt simple. After that, apparently the OP
           | has an overview. After that, the math gets quite _pure_.
           | E.g., the Markov assumption is that in the underlying
           | process, the past and future are conditionally independent
            | given the present. Fairly quickly one starts discussing this
            | assumption in terms of sigma algebras from measure theory
            | instead of just random variables. The sigma algebras are
            | pretty: (a) If one has the sigma algebra generated by a
            | random variable, the random variable is determined (from
            | memory from 20 years ago; my source was
           | 
            | I. I. Gihman and A. V. Skorohod, _The Theory of Stochastic
            | Processes I_, ISBN 0-387-06573-3, Springer-Verlag, New York,
            | 1974.
           | 
            | and I lost my copy in a move and have yet to order another
            | from Springer). (b) If one has infinitely many random
            | variables, discussing conditional independence directly is at
            | least clumsy, but the sigma algebra they generate makes
            | talking about conditional independence easy. Where does one
            | get infinitely many random variables? In the continuous time
            | case, the past one second will do. In the discrete time case,
            | the full past of the process, if one assumes the process goes
            | back all the way in time.
           | 
            | But, sure, in practice one can't do anything very direct with
            | sigma algebras; that is, they are pretty theory.
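            | 
            | In symbols (just restating that assumption), for a discrete
            | time process X_1, X_2, ..., the Markov property reads
            | 
            |     P(X_{n+1} \in A \mid X_n, X_{n-1}, \ldots, X_1)
            |         = P(X_{n+1} \in A \mid X_n),
            | 
            | i.e., given the present X_n, the earlier past adds nothing
            | about the future.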
           | 
           | Here is the whole idea in a reasonably general case in brief:
           | 
           | Do some financial planning for the next five years. Make time
           | discrete, say, one point in time for each month. Have a point
           | at the beginning and also at the end for a total of 61
           | points.
           | 
           | Build a spreadsheet with one column for each of the 61 points
            | in time and one row for each of the relevant variables. To
            | make this _stochastic optimal control_, Markov decision
            | theory, or other equivalent names, have some cells in the
            | spreadsheet get their values from the built-in random number
            | generator.
           | 
            | Also, at each of the first 60 points in time, have some
            | variables that are the _decisions_ we get to make.
           | 
           | In the last column, have the value of the business then.
           | 
           | Our mission, and we have to accept it, is to determine the
           | decisions (values in the decision cells) that maximize the
           | value of the business at the end. Ah, since we have the
           | random cells, typically we want to maximize the expected
           | (average) value at the end.
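            | 
            | In symbols, with V_{61} the (random) business value in the
            | last column and decision rules d_1, ..., d_{60}, each allowed
            | to depend on the state seen at its column, the goal is
            | 
            |     \max_{d_1, \ldots, d_{60}} E[\, V_{61} \,].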
           | 
           | So, we start at column 60 and consider all the possible data
           | we could have in column 60. Sure, that data fills maybe a few
           | exabytes. Then we pick values for the decisions in column 60
           | and reevaluate the spreadsheet many times letting the random
           | cells generate random numbers, average, and see what average
           | value we get in column 61. We do this for all possible
           | decisions. Sure, now we are keeping busy a few million square
            | feet of computing. But the work is highly parallel. That's
            | about what we would have to do using a general purpose
            | program, but in practice we would look at the specific
            | problem, note _special structure_ (properties, assumptions,
            | simplifications), and get some speedups.
           | 
            | To be more clear, at column 60 set aside the random cells and
            | the decision cells -- the rest hold the _state_ at column 60.
           | So, for each possible state, we try all the possible
           | decisions, and for each decision we reevaluate many times and
           | get the expected value of the business in column 61 and pick
           | the best decision. So, there in column 60, for each of the
           | possible states, we get the best decision and the
           | corresponding expected business value in column 61. We store
           | this -- that is, we store the best decisions and expected
           | value for each state. Here we fill some exabytes of storage.
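            | 
            | Written out, the stored table is the usual dynamic
            | programming (Bellman) recursion
            | 
            |     V_t(s) = \max_d E[\, V_{t+1}(S_{t+1}) \mid S_t = s,
            |                       d_t = d \,],   t = 60, 59, \ldots, 1,
            | 
            | with V_{61} the business value in the last column; the
            | maximizing d is the stored best decision for state s at
            | column t.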
           | 
            | Now we back up to column 59. For each state we try all the
            | possible decisions. For each of those decisions we
            | recalculate, that is, exercise the random number generators,
            | see what state we land in at column 60, and look up the
            | expected column 61 business value we already stored for that
            | column 60 state. So, for each state in column 59 we find and
            | store the best decision and its expected value.
           | 
           | Now we continue in this way back to columns 58, 57, ..., 1.
           | In column 1 we know the state we are in because that is our
           | present. So our calculation for column 1 told us what
           | decision to make there in column 1.
           | 
           | Then in reality, soon we get to column 2, see what state we
           | are in, and use the decision we tabulated for that state. We
           | continue with column 3, 4, ..., 60, make our decisions in
           | column 60, and see what business value we get in column 61.
           | 
           | The whole thing is the same except much faster if we have no
           | random cells.
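            | 
            | For anyone who wants to see the backward pass as code, here
            | is a toy sketch in Julia. The state space, decisions, and
            | transition rule are made up purely for illustration; a real
            | model would replace them and exploit its special structure.
            | 
            |     # Toy backward induction (dynamic programming) sketch.
            |     const NSTATES    = 5     # states per column
            |     const NDECISIONS = 3     # decisions per column
            |     const T          = 60    # decision columns; T + 1 is last
            |     const NSIM       = 1000  # simulations per (state, decision)
            | 
            |     # The random cells: next state from state s, decision d.
            |     next_state(s, d) = clamp(s + d - 2 + rand(-1:1), 1, NSTATES)
            | 
            |     # Business value in the last column, given the state.
            |     terminal_value(s) = float(s)
            | 
            |     # value[t][s]  = best expected terminal value from s at t
            |     # policy[t][s] = a decision that achieves it
            |     value  = [zeros(NSTATES) for _ in 1:T + 1]
            |     policy = [zeros(Int, NSTATES) for _ in 1:T]
            |     value[T + 1] .= terminal_value.(1:NSTATES)
            | 
            |     for t in T:-1:1          # back up one column at a time
            |         for s in 1:NSTATES   # every possible state at column t
            |             best_v, best_d = -Inf, 1
            |             for d in 1:NDECISIONS
            |                 # expected column t + 1 value, by simulation
            |                 total = 0.0
            |                 for _ in 1:NSIM
            |                     total += value[t + 1][next_state(s, d)]
            |                 end
            |                 v = total / NSIM
            |                 if v > best_v
            |                     best_v, best_d = v, d
            |                 end
            |             end
            |             value[t][s]  = best_v
            |             policy[t][s] = best_d
            |         end
            |     end
            | 
            |     # In use: observe the state at column 1, apply
            |     # policy[1][state], and repeat column by column.
            |     println("best decisions at column 1, by state: ", policy[1])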
           | 
           | There are also continuous time treatments, essentially the
           | same intuitively.
           | 
           | For some books,
           | 
           | Dimitri P. Bertsekas, _Dynamic Programming: Deterministic and
           | Stochastic Models._
           | 
           | George L. Nemhauser, _Dynamic Programming._
           | 
           | Stuart E. Dreyfus and Averill M. Law, _The Art and Theory of
           | Dynamic Programming._
           | 
           | E. B. Dynkin and A. A. Yushkevich, _Controlled Markov
           | Processes._
           | 
           | Wendell H. Fleming and Raymond W. Rishel, _Deterministic and
           | Stochastic Optimal Control._
           | 
           | There is more from R. T. Rockafellar and R. J.-B. Wets.
           | 
           | And of course we should mention R. Bellman, the father of
           | this subject.
        
         | infogulch wrote:
         | Care to share your dissertation?
         | 
          | If I'm guessing correctly what _scenario aggregation_ is from
          | just the name, I've found that technique very powerful in
          | plain old human-based decision making. Basically, if you have
          | a bunch of complex scenarios that you're trying to consider at
          | once, it can be very difficult to make progress, because you
          | have to compare whole trees of options against each other,
          | along with their probabilities, every time you make a change
          | anywhere. But if several scenarios have some _prefix_ where
          | they are basically equivalent (or can be made to be), you can
          | bundle them together and consider the group as a whole,
          | putting off deciding between them until later. Sometimes this
          | can collapse a huge, complex decision tree into the one
          | obviously correct thing to do _right now_, until the point
          | where the scenarios diverge again.
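          | 
          | As a rough sketch of that bundling idea in Julia (the scenario
          | data here is made up, and this is just my reading of the name,
          | not necessarily Rockafellar's formal method):
          | 
          |     struct Scenario
          |         steps::Vector{String}  # sequence of events/decisions
          |         prob::Float64          # how likely we think it is
          |     end
          | 
          |     scenarios = [
          |         Scenario(["raise", "demand holds", "expand"],   0.3),
          |         Scenario(["raise", "demand holds", "coast"],    0.2),
          |         Scenario(["raise", "demand drops", "discount"], 0.1),
          |         Scenario(["hold",  "demand holds", "expand"],   0.4),
          |     ]
          | 
          |     # Group scenarios by their first k steps; each group is
          |     # treated as one aggregate until the point of divergence.
          |     function aggregate(scenarios, k)
          |         groups = Dict{Vector{String}, Vector{Scenario}}()
          |         for sc in scenarios
          |             key = sc.steps[1:min(k, length(sc.steps))]
          |             push!(get!(groups, key, Scenario[]), sc)
          |         end
          |         return groups
          |     end
          | 
          |     for (prefix, group) in aggregate(scenarios, 1)
          |         p = sum(sc.prob for sc in group)
          |         println(prefix, " => ", length(group),
          |                 " scenarios, total prob ", p)
          |     end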
        
           | graycat wrote:
           | Good, fast view of scenario aggregation. The idea is
           | powerful, not just a heuristic, and from R. T. Rockafellar.
           | 
           | I didn't want to publish my dissertation because that would
           | be like giving it away for free. Instead I wanted to sell it!
           | I made some efforts to sell it but ran into the problem I
           | mentioned for the suits. So, it didn't sell. But I still
           | don't want to give it away for free.
        
       ___________________________________________________________________
       (page generated 2021-09-20 23:02 UTC)