[HN Gopher] Markov Chain Monte Carlo Without All the Bullshit (2...
       ___________________________________________________________________
        
       Markov Chain Monte Carlo Without All the Bullshit (2015)
        
       Author : ibobev
       Score  : 196 points
       Date   : 2025-04-16 02:01 UTC (14 hours ago)
        
 (HTM) web link (www.jeremykun.com)
 (TXT) w3m dump (www.jeremykun.com)
        
       | gnabgib wrote:
       | (2015) Popular in:
       | 
       | 2022 (233 points, 53 comments)
       | https://news.ycombinator.com/item?id=33332437
       | 
       | 2016 (125 points, 20 comments)
       | https://news.ycombinator.com/item?id=12537043
       | 
       | 2015 (114 points, 17 comments)
       | https://news.ycombinator.com/item?id=9331808
        
       | emmelaich wrote:
       | Unfortunate that the equation rendering doesn't work in the body
       | text.
        
         | johnthesecure wrote:
         | I experienced this issue, but it works on another device, and
         | after a refresh it worked on the original device. Good luck...
        
         | pixelpoet wrote:
         | Works fine here, Firefox on Android
        
       | jokoon wrote:
       | Feels too long
       | 
       | I only want the python code
        
         | fedeb95 wrote:
         | it's better to understand the theory before putting up some
         | faulty production code.
        
       | seanhunter wrote:
       | Hmm. I'm not an expert, but some of this seems definitely not to
       | be accurate. Some of the "Bullshit" turns out perhaps to be quite
       | important.
       | 
       | Take the statement:
       | 
       | > Markov Chain is essentially a fancy name for a random walk on a
       | graph
       | 
       | Is that really true? I definitely don't think so. To my
       | understanding, a Markov process is a stochastic process that has
       | the additional (aka "Markov") property that it is "memoryless".
       | That is, to estimate the next state you only need to know the
        | state now, not any of the history. It becomes a Markov chain if
        | it is a discrete-time, rather than continuous-time, process.
       | 
       | There are lots of random walks on graphs that satisfy this
       | definition. Like say you have a graph and you just specify the
       | end points and say "walk 5 nodes at random from this starting
       | node, what is the expectation that you end up at a specific end
       | node". This could be a Markov process. At any point to estimate
       | the state you only need to know the state now.
       | 
       | But lots of random walks on graphs do not have the Markov
       | property. For example, say I did the exact same thing as the
       | previous example, so I have a random graph and a start and target
       | node and I say "Walk n nodes at random from the starting node.
       | What's the expectation that at some point you visit the target
       | node". Now I have introduced a dependency on the history and my
       | process is no longer memoryless. It is a discrete stochastic
       | process and it is a random walk on a graph but is not a Markov
       | chain.
       | 
       | An example of a Markov and non-Markov processes in real life is
       | if I have a European option on a stock I only care about what the
       | stock price is at the expiry date. But if I have a barrier option
       | or my option has some knock-in/knock-out/autocallable features
       | then it has a path dependence because I care about whether at any
       | point in its trajectory the price hit the barrier level, not just
       | the price at the end. So the price process for the barrier option
       | is non-Markov.
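        | 
        | A tiny sketch of the distinction (my own illustration, not from
        | the article; graph is an adjacency dict):
        | 
        |     import random
        | 
        |     def walk(graph, start, steps):
        |         # Plain random walk: the next node depends only on
        |         # the current node, so the node sequence is a Markov
        |         # chain.
        |         node, path = start, [start]
        |         for _ in range(steps):
        |             node = random.choice(graph[node])
        |             path.append(node)
        |         return path
        | 
        |     # "Did we end at the target?" needs only the final state:
        |     def ends_at(path, target):
        |         return path[-1] == target
        | 
        |     # ... but "did we ever visit the target?" is a function of
        |     # the whole history -- the path dependence described above.
        |     def ever_visited(path, target):
        |         return target in path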
        
         | NotAnOtter wrote:
         | Your overall point might be correct but your example does not
         | prove your point.
         | 
          | A Markov chain is just the path taken through the course of a
          | Markov process. The terms 'chain' and 'process' are sometimes
          | conflated in this context, but this is the most common
          | distinction I've seen. As such you can run a Markov process for
          | some number of steps N times, and then ask how many generated
          | chains contain the property you are interested in. The process
          | is memoryless but the chain is the result of the process and
          | therefore contains memory.
          | 
          | I agree 'random walk' is a superset of 'Markov process', but
          | IMO when someone says random walk they normally make
          | assumptions that qualify it as a Markov chain. Therefore it's
          | useful as a teaching device to just call it a random walk.
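          | 
          | A rough sketch of that "run it N times and count" idea (my
          | own; graph is an adjacency dict):
          | 
          |     import random
          | 
          |     def frac_chains_hitting(graph, start, target, steps,
          |                             runs=10000):
          |         # Run the memoryless process many times and ask
          |         # what fraction of the generated chains ever visit
          |         # the target node.
          |         hits = 0
          |         for _ in range(runs):
          |             node = start
          |             for _ in range(steps):
          |                 node = random.choice(graph[node])
          |                 if node == target:
          |                     hits += 1
          |                     break
          |         return hits / runs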
        
           | seanhunter wrote:
           | Interesting. I'd not heard the term Markov Chain used to
           | describe a path so I checked my copy of "All of Statistics"
           | by Larry Wasserman and he (slightly) disagrees with both of
            | us and says:
            | 
            |     "A Markov chain is a stochastic process for which the
            |     distribution of X_t depends only on X_{t-1}."
           | 
           | So he doesn't need it to be a discrete process but he also
           | doesn't think it's a path. I guess the terminology is not
           | 100% standardized. But anyhow thanks for making me think
           | about this. Always interesting.
        
             | jmaker wrote:
             | A Markov chain is commonly understood to be a time-discrete
              | Markov process. Intuitively, it's a "chain" because you can
              | "single out" its states at discrete points in time rather
              | than over intervals. That's also Markov's original
              | definition. Instead of Wasserman one can look up the notion
              | in Wikipedia. A path
             | is a notion relevant to the state space of a process - it's
             | a realization of states at every single point in time for
             | the time span of interest.
        
             | JohnKemeny wrote:
             | But a random walk is precisely a stochastic process for
             | which the _next state_ depends only on the _current state_.
             | In terms of graphs (where _random walk_ comes from), the
             | next node is decided by randomly selecting a neighbor of
             | the current node.
        
               | seanhunter wrote:
               | That's not true for random walks in general I don't
               | think. A random walk is a process derived from taking
               | random steps in some mathematical space. It can include
               | jumps and it can include memory.
               | 
               | Take a "path-avoiding" random walk. At time t the
               | distribution of the next step depends on whether or not I
               | have at some point hit any of the adjacent nodes in the
               | current path. That's not the current state, that's
               | memory.
        
               | JohnKemeny wrote:
               | A random walk on a graph is a stochastic process that
               | starts at a vertex and, at each step, moves to a randomly
               | chosen neighboring vertex. Formally:
               | 
               | Given a graph, a random walk is a sequence of vertices
                | [v_1, v_2, ..., v_k] such that each v_{i+1} is selected
                | uniformly at random from the neighbors of v_i.
               | 
               | In weighted graphs, the next vertex is chosen with
               | probability proportional to edge weights.
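                | 
                | A minimal sketch of one step (my own;
                | wgraph is a dict of dicts of edge
                | weights):
                | 
                |     import random
                | 
                |     def step(wgraph, v):
                |         # Pick a neighbor of v with
                |         # probability proportional to
                |         # the edge weight (uniform if
                |         # all weights are equal).
                |         nbrs = list(wgraph[v])
                |         wts = [wgraph[v][u] for u in nbrs]
                |         return random.choices(
                |             nbrs, weights=wts)[0]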
        
               | seanhunter wrote:
                | I'm pretty sure it's not a requirement that the
                | distribution is uniform, and it doesn't have to be
                | path-independent either, as per the example I gave -
                | a random walk where you're not allowed to visit a
                | node more than once.
        
               | JohnKemeny wrote:
               | Here's a reference for the definition I'm using:
               | 
               | https://www.cs.yale.edu/homes/spielman/561/lect10-18.pdf
               | 
               | It's from lecture notes (pdf) from a course in Spectral
               | Graph Theory by Professor Daniel Spielman at Yale.
        
               | seanhunter wrote:
                | Those notes look interesting, thanks. I've really never
                | heard of someone saying you have to have uniform
                | probability for a random walk on a graph. In fact, in
                | the context where I'm most familiar with them
                | (lattice/grid pricers), it's always specified something
                | like "the probability of branch a is p and of b is 1-p"
                | (i.e. explicitly not uniform), and they aren't a
                | weighted graph in the normal sense.
        
               | srean wrote:
                | It doesn't have to be uniform or memoryless.
               | 
               | However, in general when people mention random walks
               | _without further qualifications_ they are usually talking
               | about the uniform and memoryless case.
               | 
               | That vanilla case can be generalized extensively, on
               | kinds of state spaces, kinds of index sets, kinds of
               | dependence and so on.
        
               | vladimirralev wrote:
               | But also "the memory" of the random walk can be encoded
               | in a state itself. In your example you can just keep
               | accumulating the visited nodes, so your state space will
               | be now the space of tuples of nodes from your initial
               | space.
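                | 
                | A sketch of that state expansion for the
                | path-avoiding walk mentioned above (my
                | own illustration):
                | 
                |     import random
                | 
                |     def augmented_step(graph, state):
                |         # state is the tuple of nodes
                |         # visited so far; the walk is
                |         # Markov on this bigger space.
                |         cur = state[-1]
                |         nxt = [v for v in graph[cur]
                |                if v not in state]
                |         if not nxt:
                |             return state
                |         return state + (random.choice(nxt),)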
        
         | bjornsing wrote:
          | Not all Markov processes have stationary distributions, and of
          | those that do, not all correspond to a given non-normalized
          | probability function.
          | 
          | It therefore has some merit to think about MCMC as a random
          | walk on a graph rather than as a general Markov process,
          | because the "graph" needs to have some properties in order for
          | the Markov process to be useful for MCMC. For example every
          | "node" in the "graph" needs to be reachable from every other
          | "node" (irreducibility, part of what is needed for
          | ergodicity).
        
           | seanhunter wrote:
            | Could you explain further please? I agree with what you're
            | saying but I don't understand how it applies to what I said,
            | so there's definitely something I could learn here.
           | 
           | Edit: Thanks.
        
             | bjornsing wrote:
              | Sure. As I see it, MCMC is basically a mathematical trick
             | that lets you sample from probability distributions even if
             | you only know the relative probability of different
             | samples. It's based on the observation that _some_ Markov
             | processes have a stationary distribution that is identical
             | to a distribution you may want to sample from. But you need
             | to carefully construct the Markov process for that
             | observation to hold, and the properties you need to ensure
              | are most easily understood as properties of a random walk
              | on a graph. So the interesting subset of Markov processes
              | is most easily understood as such random walks on a graph.
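              | 
              | A minimal sketch of that trick (mine, not from the
              | article), with a Gaussian random-walk proposal on
              | the real line and an unnormalized log-density log_p:
              | 
              |     import math, random
              | 
              |     def metropolis(log_p, x0, steps, scale=1.0):
              |         # Only differences log_p(y) - log_p(x),
              |         # i.e. ratios p(y)/p(x), are used, so the
              |         # unknown normalizing constant cancels.
              |         x, out = x0, []
              |         for _ in range(steps):
              |             y = x + random.gauss(0.0, scale)
              |             a = log_p(y) - log_p(x)
              |             if math.log(random.random()) < a:
              |                 x = y
              |             out.append(x)
              |         return out
              | 
              |     # e.g. samples from N(0, 1), given only the
              |     # unnormalized density exp(-x*x/2):
              |     # xs = metropolis(lambda x: -x * x / 2, 0.0, 50000)
              | 
              | The stationary-distribution argument above is what makes
              | this sample from the target p once the chain has mixed.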
        
               | hazymemory wrote:
                | I think the core of the subject is that you only need to
                | know relative probabilities p_j/p_k to be able to draw a
                | sample from a distribution.
        
         | low_tech_love wrote:
         | I won't pretend to know the technical details (as the other
         | replies do) but I want to make a point for the "pedagogical"
          | effect here, on which I agree with the author. The way I interpret
         | the article, it's not supposed to be a deep, theoretical
         | treatise on the subject; more of an introductory, "intuitive"
         | take on it. This works for those who need to either learn the
         | concept to begin with, or refresh their memories if they don't
         | work with it every day. I think it's a given that any intuitive
         | take on a mathematical concept will always oversimplify things,
         | with the underlying assumption that, if you actually need to
         | know more, you're going to have to dive deeper somewhere else.
         | The most important thing I think is to help the reader build a
         | rough conceptual understanding of the concept such that they
         | can reason about it, instead of simply memorizing the terms.
        
           | majoe wrote:
            | I see where you're coming from, but in that particular case,
           | the "intuitive" explanation (walk on a graph) is far less
           | intuitive for me than the proper explanation, that a Markov
           | process is memoryless. That said, I used MCMC in the past to
           | do physics simulation, where the Markov property also applies
           | to the underlying physical process. So maybe it's just me.
        
             | spookie wrote:
              | No single explanation works for most people. That's why
              | you need multiple ways to explain things, and it's where
              | the standard education system fails at disseminating
              | information.
        
           | sylware wrote:
            | Yep, and that undercuts the "people" (I'll stay polite)
            | trying to hide this intuitive "pedagogical" perspective for
            | some agenda of their own...
        
         | graycat wrote:
         | Stochastic process with the Markov property: Past and future
         | are conditionally independent given the present. The general
         | version of _conditionally independent_ is from probability
         | theory based on measure theory and the Radon-Nikodym theorem
          | (with von Neumann's novel proof in Rudin, _Real and Complex
          | Analysis_), but an easier introduction is in Erhan Cinlar,
         | _Introduction to Stochastic Processes_.
         | 
          | In a _Poisson process_ the time until the next event has the
          | exponential distribution and, thus, from a simple calculus
          | manipulation, the process is Markov.
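          | 
          | (Concretely, for an exponential waiting time T with rate
          | lambda: P(T > s + t | T > s) = exp(-lambda(s + t)) /
          | exp(-lambda s) = exp(-lambda t) = P(T > t), which is exactly
          | the memoryless property.)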
         | 
         | E.g., time from now until a molecule of hydrogen peroxide H2O2
         | decomposes to water and oxygen is independent of when it was
          | made from water and oxygen. That is the basis of _half life_,
         | the same distribution until decomposition starting now no
         | matter when the chemical or particle was created.
         | 
         | In WWII, searching at sea, e.g., for enemy submarines, was
          | important, and there was Bernard O. Koopman, _Search and
          | Screening_, 1946, with an argument that the time to an
          | encounter between two ships was exponentially distributed,
          | i.e., encounters form a Poisson process and the search is a
          | Markov process.
         | 
         | In grad school, there was a question about how long US
         | submarines would last in war at sea. Well, take n Red ships and
         | m Blue ships with, for each ship, position, speed, and
         | detection radius and, for each Red-Blue pair, given a
          | detection, probabilities that Red dies, Blue dies, both die,
          | or neither dies (right, these four have to be non-negative
          | and add to 1). Now we have specified a Markov process that we
          | can evaluate with a relatively simple Monte-Carlo simulation.
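          | 
          | A heavily simplified sketch of that kind of simulation (my own
          | toy version, not the original model; geometry, speed, and
          | detection radius are collapsed into a single per-pair,
          | per-step detection probability):
          | 
          |     import random
          | 
          |     def engagement(n_red, n_blue, p_detect, p_out, steps):
          |         # One run.  Given a detection, p_out =
          |         # [P(Red dies), P(Blue dies), P(both), P(neither)].
          |         # Who is afloat next step depends only on who is
          |         # afloat now, so the process is Markov.
          |         red, blue = n_red, n_blue
          |         for _ in range(steps):
          |             for _ in range(red * blue):
          |                 if red == 0 or blue == 0:
          |                     break
          |                 if random.random() >= p_detect:
          |                     continue
          |                 k = random.choices(range(4), p_out)[0]
          |                 red -= k in (0, 2)    # Red dies or both
          |                 blue -= k in (1, 2)   # Blue dies or both
          |         return red, blue
          | 
          |     # Monte-Carlo estimate of mean Blue survivors:
          |     # sum(engagement(5, 7, 0.01, [.2, .3, .1, .4], 100)[1]
          |     #     for _ in range(10000)) / 10000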
         | 
         | Had written a random number generator in assembler using an Oak
         | Ridge formula, typed quickly, and did the simulation. Had a
          | review by a famous probability prof and passed when I explained
          | how the _law of large numbers_ applied. So, some pure and
          | applied math and computing worked, but some politics didn't!
        
           | seanhunter wrote:
           | I once failed an interview with a hedge fund because they
           | asked me a variant on that red ships/blue ships problem and
           | at the time I knew nothing about probability. They also
           | hadn't given me lunch which didn't help.
        
           | nivertech wrote:
            | This Red/Blue submarine problem seems to be a better fit for
            | agent-based modeling (ABM) simulation, rather than Monte
            | Carlo based on Markov processes.
            | 
            | IRL this will be path dependent, since both sides will learn
            | from past actions and the probabilities will change, i.e.
            | the memoryless Markov property will not hold.
            | 
            | In ABM the ships (agents) can move in 2D space, which makes
            | detection easier.
            | 
            | Also, obviously there are lots of externalities, like the
            | supply of weapons, food, and sailors, ceasefires,
            | surrenders, politics, etc.
           | 
           | All of the above is easier to simulate using ABM, rather than
           | Monte Carlo.
        
         | jvanderbot wrote:
          | Markov property is about state, yes, but can't you expand the
          | state to accommodate your non-Markov example?
         | 
         | As in, the state you track is no longer the probability that
         | you have ended at a given node at time T, but instead includes
         | a new vector of the probability you have visited the node at
         | any time in the past (which can be obtained from PDF of
         | location from previous time step + stochastic "diffusion").
         | 
         | So, we're randomly walking over a graph, but the graph is _not_
         | the same as the graph used in MCMC. The MCMC graph is the state
         | with random transitions that must model what you want to
          | observe. That separation does not violate the statement that
          | "it's just a random walk"; it just severely complicates it, I
          | suppose.
        
           | whatever1 wrote:
           | This is very typical in reinforcement learning. You just
           | expand the state to include some more time periods. It
            | definitely raises some academic eyebrows (since it's not
            | technically memoryless), but hey, if it works, it works.
        
             | jvanderbot wrote:
              | It _is_ memoryless, just in a different state space.
        
               | whatever1 wrote:
               | In the state space that includes all of the time periods,
               | with infinitesimal granularity since the birth of the
               | universe.
        
               | jvanderbot wrote:
               | Welcome to a well formulated POMDP.
        
       | low_tech_love wrote:
        | Interesting! I also make extensive notes on mathematical and
        | computational concepts (and "buzzwords") with a "no bullshit"
        | title, and it works well. They're great for quick refreshers
        | once in a while.
        
       | zero_k wrote:
        | Science communication is so important. I write scientific papers
        | and I always write a blog post about each paper later, because
        | nobody understands the scientific paper -- not even the
        | scientists. The scientists regularly read my blog instead. The
        | "scientific style" has become so obtuse and useless that even
        | the professionals prefer the blog. True insanity.
        
         | MichaelDickens wrote:
         | What would happen if you wrote the paper like a blog post and
         | submitted it to journals? (With some structural changes to meet
         | journal requirements, like having an abstract)
        
       | nivertech wrote:
        | If I ever write about "Markov chains without the Math/Jargon/BS"
        | I'll use the clip of the "Ten Second Tom" scene from 50 First
        | Dates[1] and a host of other sci-fi movies about time
        | loops[2,3] to illustrate the memoryless[4] Markov property[5].
       | 
       | ---
       | 
       | 1. https://www.youtube.com/watch?v=iN_BDcKhtWk
       | 
       | 2. https://en.wikipedia.org/wiki/Time_loop
       | 
       | 3.
       | https://en.wikipedia.org/wiki/List_of_films_featuring_time_l...
       | 
       | 4. https://en.wikipedia.org/wiki/Memorylessness
       | 
       | 5. https://en.wikipedia.org/wiki/Markov_property
        
       | or_am_i wrote:
       | > The "bullshit" here is the implicit claim of an author that
       | such jargon is needed. Maybe it is to explain advanced
       | applications (like attempts to do "inference in Bayesian
       | networks"), but it is certainly not needed to define or analyze
       | the basic ideas.
       | 
       | "The bullshit here is the implicit claim of an author that German
       | language is needed. Maybe it is for advanced applications (like
       | Goethe's poetry), but it is certainly not needed to describe
       | basic ideas."
       | 
        | (proceeds to explain the same basic concepts 10x more verbosely
        | than any math textbook on the subject)
       | 
       | Math/statistics nomenclature is certainly not perfect (think of
       | it as a general utilities library that has been in active
       | development for 200+ years), but it is widely used for a reason:
       | once you learn the language it becomes second nature (very much
       | the same as knowing all the details of the standard library API
        | in your language of choice), allowing you to communicate
        | arbitrarily complex abstract ideas with speed and precision.
        
         | Scea91 wrote:
          | Yes, it's pretty simple. Different audiences prefer different
          | styles. Experts prefer brevity and jargon; they don't need
          | explanations of concepts they already understand. Beginners
          | prefer less jargon and inline explanations, as otherwise
          | they'd need to hunt for references to understand the text.
        
       | artofpongfu wrote:
       | So he sets up a toy problem (drawing from a baby names
       | distribution), then never explains how to solve this problem.
       | 
       | The intuition is, you set up a graph where the vertices are
       | names, and the edges are based on name similarity. Two names are
       | neighbors if e.g. their edit distance is within some limit. You
        | start at a random name, then pick a neighbor and flip a biased
        | coin whose bias is the ratio of P(neighbor) to P(current name);
        | if heads, you move to the neighbor.
       | 
       | I'm sure this is wrong in many and subtle ways, but when I read
       | an article like this I expect some intuition like this to be
       | imparted.
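        | 
        | For concreteness, a rough sketch of that walk (my own; p(name)
        | and neighbors(name) are hypothetical helpers):
        | 
        |     import random
        | 
        |     def sample_names(p, neighbors, start, steps):
        |         # p(name) is the unnormalized probability and
        |         # neighbors(name) returns names within some edit
        |         # distance.  The len() factors are the usual
        |         # Metropolis-Hastings correction for names having
        |         # different numbers of neighbors.
        |         x, out = start, []
        |         for _ in range(steps):
        |             nx = neighbors(x)
        |             y = random.choice(nx)
        |             a = p(y) * len(nx) / (p(x) * len(neighbors(y)))
        |             if random.random() < a:      # the biased coin
        |                 x = y
        |             out.append(x)
        |         return out
        | 
        | For this to sample the intended distribution, the name graph
        | also has to be connected (every name reachable from every
        | other), which is the reachability point made elsewhere in the
        | thread.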
        
       | fidotron wrote:
        | There is definite truth to the idea that stats people have a
        | habit of writing deliberately impenetrable prose (probably due
        | to proximity to economists and political scientists).
       | 
        | I happened to need to implement a Markov chain for playing rock,
        | paper, scissors in https://luduxia.com/showdown - once you
        | actually understand it, the code is short and took no more than
        | 20 minutes to write, and was by far the easiest part of the
        | entire effort, which was surprising. Vertically centering a div
        | is harder, and involves dealing with even more jargon.
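        | 
        | For a rough idea of what such a bot can look like (my guess at
        | the general shape, not the actual luduxia code):
        | 
        |     import random
        |     from collections import defaultdict
        | 
        |     BEATS = {"rock": "paper", "paper": "scissors",
        |              "scissors": "rock"}
        | 
        |     class RPSBot:
        |         # Treat the opponent's moves as a Markov chain:
        |         # count observed transitions and play the counter
        |         # to their most likely next move.
        |         def __init__(self):
        |             self.counts = defaultdict(
        |                 lambda: defaultdict(int))
        |             self.prev = None
        | 
        |         def observe(self, opponent_move):
        |             if self.prev is not None:
        |                 self.counts[self.prev][opponent_move] += 1
        |             self.prev = opponent_move
        | 
        |         def play(self):
        |             row = self.counts[self.prev]
        |             if not row:                  # nothing seen yet
        |                 return random.choice(list(BEATS))
        |             likely = max(row, key=row.get)
        |             return BEATS[likely]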
        
       | conjectures wrote:
       | Common pattern where a bright spark asks, 'why you all so
       | complicated?' Proceeds to assume we're dealing with a finite
       | graph / set.
       | 
       | All the complication is needed to handle the fact that the state
       | can be a random vector of real numbers in a possibly varying
       | dimensional space. It's not jerking off on jargon for its own
       | sake.
       | 
       | Sure, there are simple cases - doesn't make the general case
       | 'bullshit'.
        
       ___________________________________________________________________
       (page generated 2025-04-16 17:01 UTC)