[HN Gopher] Markov Chain Monte Carlo Without All the Bullshit (2015)
___________________________________________________________________
Markov Chain Monte Carlo Without All the Bullshit (2015)
Author : ibobev
Score : 196 points
Date : 2025-04-16 02:01 UTC (14 hours ago)
(HTM) web link (www.jeremykun.com)
(TXT) w3m dump (www.jeremykun.com)
| gnabgib wrote:
| (2015) Popular in:
|
| 2022 (233 points, 53 comments)
| https://news.ycombinator.com/item?id=33332437
|
| 2016 (125 points, 20 comments)
| https://news.ycombinator.com/item?id=12537043
|
| 2015 (114 points, 17 comments)
| https://news.ycombinator.com/item?id=9331808
| emmelaich wrote:
| Unfortunate that the equation rendering doesn't work in the body
| text.
| johnthesecure wrote:
| I experienced this issue, but it works on another device, and
| after a refresh it worked on the original device. Good luck...
| pixelpoet wrote:
| Works fine here, Firefox on Android
| jokoon wrote:
| Feels too long
|
| I only want the python code
| fedeb95 wrote:
| it's better to understand the theory before putting up some
| faulty production code.
| seanhunter wrote:
| Hmm. I'm not an expert, but some of this seems definitely not to
| be accurate. Some of the "Bullshit" turns out perhaps to be quite
| important.
|
| Take the statement:
|
| > Markov Chain is essentially a fancy name for a random walk on a
| graph
|
| Is that really true? I definitely don't think so. To my
| understanding, a Markov process is a stochastic process that has
| the additional (aka "Markov") property that it is "memoryless".
| That is, to estimate the next state you only need to know the
| state now, not any of the history. It becomes a Markov chain if
| it is a discrete, rather than continuous process.
|
| There are lots of random walks on graphs that satisfy this
| definition. Like say you have a graph and you just specify the
| end points and say "walk 5 nodes at random from this starting
| node, what is the expectation that you end up at a specific end
| node". This could be a Markov process. At any point to estimate
| the state you only need to know the state now.
|
| But lots of random walks on graphs do not have the Markov
| property. For example, say I did the exact same thing as the
| previous example, so I have a random graph and a start and target
| node and I say "Walk n nodes at random from the starting node.
| What's the expectation that at some point you visit the target
| node". Now I have introduced a dependency on the history and my
| process is no longer memoryless. It is a discrete stochastic
| process and it is a random walk on a graph but is not a Markov
| chain.
|
| An example of a Markov and non-Markov processes in real life is
| if I have a European option on a stock I only care about what the
| stock price is at the expiry date. But if I have a barrier option
| or my option has some knock-in/knock-out/autocallable features
| then it has a path dependence because I care about whether at any
| point in its trajectory the price hit the barrier level, not just
| the price at the end. So the price process for the barrier option
| is non-Markov.
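|
| To make the graph example above concrete, here is a tiny
| sketch (my own toy code, not from the article): the same
| uniform walk answers a memoryless question (where am I after
| n steps?) and a path-dependent one (did I ever hit the
| target?); the second needs the walk's history unless you fold
| it into the state.
|
|     import random
|
|     # toy graph as an adjacency list (made-up example)
|     graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
|
|     def walk(start, steps):
|         """Uniform random walk; returns the whole path."""
|         path = [start]
|         for _ in range(steps):
|             path.append(random.choice(graph[path[-1]]))
|         return path
|
|     trials = 10000
|     paths = [walk(0, 5) for _ in range(trials)]
|
|     # Markov question: depends only on the final state
|     p_end = sum(p[-1] == 3 for p in paths) / trials
|
|     # path-dependent question: needs the whole history
|     p_hit = sum(3 in p for p in paths) / trials
|
|     print(p_end, p_hit)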
| NotAnOtter wrote:
| Your overall point might be correct but your example does not
| prove your point.
|
| A Markov chain is just the path taken through the course of a
| Markov process. The terms 'chain' and 'process' are sometimes
| conflated in this context, but this is the most common
| distinction I've seen. As such you can run a Markov process for
| some number of steps N times, and then ask how many generated
| chains contain the property you are interested in. The process
| is memoryless but the chain is the result of the process and
| therefore contains memory.
|
| I agree 'random walk' is a superset of 'Markov process', but
| IMO when someone says random walk they normally make
| assumptions that qualify it as a Markov chain. Therefore it's
| useful as a teaching device to just call it a random walk.
| seanhunter wrote:
| Interesting. I'd not heard the term Markov Chain used to
| describe a path so I checked my copy of "All of Statistics"
| by Larry Wasserman and he (slightly) disagrees with both of
| us and says "A Markov chain is a stochastic
| process for which the distribution of X_t depends only on
| X_{t-1}."
|
| So he doesn't need it to be a discrete process but he also
| doesn't think it's a path. I guess the terminology is not
| 100% standardized. But anyhow thanks for making me think
| about this. Always interesting.
| jmaker wrote:
| A Markov chain is commonly understood to be a time-discrete
| Markov process. Intuitively, it's a "chain" because you can
| "single out" its states at individual points in time rather
| than over intervals. That's also Markov's original
| definition. Instead of Wasserman one can look up the notion
| in Wikipedia. A path is a notion relevant to the state space
| of a process - it's a realization of the state at every
| single point in time for the time span of interest.
| JohnKemeny wrote:
| But a random walk is precisely a stochastic process for
| which the _next state_ depends only on the _current state_.
| In terms of graphs (where _random walk_ comes from), the
| next node is decided by randomly selecting a neighbor of
| the current node.
| seanhunter wrote:
| That's not true for random walks in general I don't
| think. A random walk is a process derived from taking
| random steps in some mathematical space. It can include
| jumps and it can include memory.
|
| Take a "path-avoiding" random walk. At time t the
| distribution of the next step depends on whether or not I
| have at some point hit any of the adjacent nodes in the
| current path. That's not the current state, that's
| memory.
| JohnKemeny wrote:
| A random walk on a graph is a stochastic process that
| starts at a vertex and, at each step, moves to a randomly
| chosen neighboring vertex. Formally:
|
| Given a graph, a random walk is a sequence of vertices
| [v_1, v_2, ..., v_k] such that each v_{i+1} is selected
| uniformly at random from the neighbors of v_i.
|
| In weighted graphs, the next vertex is chosen with
| probability proportional to edge weights.
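|
| A minimal sketch of that definition (my own
| illustrative code, in both the uniform and the
| edge-weighted variant):
|
|     import random
|
|     def random_walk(graph, start, steps):
|         """Uniform walk: each step moves to a
|         neighbor of the current vertex chosen
|         uniformly at random."""
|         v, path = start, [start]
|         for _ in range(steps):
|             v = random.choice(graph[v])
|             path.append(v)
|         return path
|
|     def weighted_step(weights):
|         """Weighted variant: `weights` maps
|         neighbor -> edge weight; the next vertex
|         is chosen with probability proportional
|         to its weight."""
|         nbrs = list(weights)
|         return random.choices(
|             nbrs, [weights[n] for n in nbrs])[0]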
| seanhunter wrote:
| I'm pretty sure it's not a requirement that the
| distribution be uniform, nor that it be path-independent,
| as per the example I gave - a random walk where you're
| not allowed to visit a node more than once.
| JohnKemeny wrote:
| Here's a reference for the definition I'm using:
|
| https://www.cs.yale.edu/homes/spielman/561/lect10-18.pdf
|
| It's from lecture notes (pdf) from a course in Spectral
| Graph Theory by Professor Daniel Spielman at Yale.
| seanhunter wrote:
| Those notes look interesting thanks. I've really never
| heard of someone saying you have to have uniform
| probability for a random walk on a graph. In fact the
| context where I'm most familiar with them (lattice/grid
| pricers) it's always specified something like "the
| probability of branch a is p and b is 1-p" (ie explicitly
| not uniform) and they aren't a weighted graph in the
| normal sense.
| srean wrote:
| It doesn't have to be uniform or memoryless.
|
| However, in general when people mention random walks
| _without further qualifications_ they are usually talking
| about the uniform and memoryless case.
|
| That vanilla case can be generalized extensively, on
| kinds of state spaces, kinds of index sets, kinds of
| dependence and so on.
| vladimirralev wrote:
| But also "the memory" of the random walk can be encoded
| in a state itself. In your example you can just keep
| accumulating the visited nodes, so your state space will
| now be the space of tuples of nodes from your initial
| space.
| bjornsing wrote:
| Not all Markov processes have stationary distributions, and of
| those that do, not all correspond to a given non-normalized
| probability function.
|
| It therefore has some merit to think about MCMC as a random
| walk on a graph rather than Markov processes, because the
| "graph" needs to have some properties in order for the Markov
| process to be useful for MCMC. For example every "node" in the
| "graph" needs to be reachable from every other "node"
| (ergodicity).
| seanhunter wrote:
| Could you explain further please? I agree with what you're
| saying but I don't understand how it applies to what I said,
| so there's definitely something I could learn here.
|
| Edit: Thanks.
| bjornsing wrote:
| Sure. As I see it, MCMC is basically a mathematical trick
| that lets you sample from probability distributions even if
| you only know the relative probability of different
| samples. It's based on the observation that _some_ Markov
| processes have a stationary distribution that is identical
| to a distribution you may want to sample from. But you need
| to carefully construct the Markov process for that
| observation to hold, and the properties you need to ensure
| are most easily understood as properties of a random walk
| graph. So the interesting subset of Markov processes are
| most easily understood as such random walks on a graph.
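|
| To make that concrete, here is a minimal
| Metropolis-Hastings sketch (my own toy code, not the
| article's): rel_prob only ever enters through ratios,
| so the normalizing constant is never needed.
|
|     import random
|
|     def mh_sample(neighbors, rel_prob, start, steps):
|         """Random walk whose stationary distribution
|         is proportional to rel_prob (an unnormalized
|         density)."""
|         x, samples = start, []
|         for _ in range(steps):
|             y = random.choice(neighbors(x))  # propose
|             # acceptance ratio; the len() terms
|             # correct for nodes having different
|             # numbers of neighbors
|             r = (rel_prob(y) * len(neighbors(x))
|                  / (rel_prob(x) * len(neighbors(y))))
|             if random.random() < min(1.0, r):
|                 x = y                # move, else stay
|             samples.append(x)
|         return samples
|
|     # toy use: integers 0..9, P(k) proportional to k+1
|     nbrs = lambda k: [(k - 1) % 10, (k + 1) % 10]
|     draws = mh_sample(nbrs, lambda k: k + 1, 0, 50000)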
| hazymemory wrote:
| I think the core of the subject is that you only need to
| know relative probabilities, p_j/p_k, to be able to draw a
| sample from a distribution.
| low_tech_love wrote:
| I won't pretend to know the technical details (as the other
| replies do) but I want to make a point for the "pedagogical"
| effect here, where I agree with the author. The way I interpret
| the article, it's not supposed to be a deep, theoretical
| treatise on the subject; more of an introductory, "intuitive"
| take on it. This works for those who need to either learn the
| concept to begin with, or refresh their memories if they don't
| work with it every day. I think it's a given that any intuitive
| take on a mathematical concept will always oversimplify things,
| with the underlying assumption that, if you actually need to
| know more, you're going to have to dive deeper somewhere else.
| The most important thing I think is to help the reader build a
| rough conceptual understanding of the concept such that they
| can reason about it, instead of simply memorizing the terms.
| majoe wrote:
| I see where you're coming from, but in that particular case,
| the "intuitive" explanation (walk on a graph) is far less
| intuitive for me than the proper explanation, that a Markov
| process is memoryless. That said, I used MCMC in the past to
| do physics simulation, where the Markov property also applies
| to the underlying physical process. So maybe it's just me.
| spookie wrote:
| No single explanation works for everyone. That's why you need
| multiple ways to explain things, and it's where the standard
| education system fails at disseminating information.
| sylware wrote:
| Yep, and that's a shot in the foot for the "people" (I'll stay
| polite) trying to hide this intuitive "pedagogical"
| perspective for some agenda of their own...
| graycat wrote:
| Stochastic process with the Markov property: Past and future
| are conditionally independent given the present. The general
| version of _conditionally independent_ is from probability
| theory based on measure theory and the Radon-Nikodym theorem
| (with von Neumann's novel proof in Rudin, _Real and Complex
| Analysis_), but an easier introduction is in Erhan Cinlar,
| _Introduction to Stochastic Processes_.
|
| In a _Poisson process_ the time until the next event has the
| exponential distribution and, thus, from a simple calculus
| manipulation, is memoryless, so the process is Markov.
|
| E.g., time from now until a molecule of hydrogen peroxide H2O2
| decomposes to water and oxygen is independent of when it was
| made from water and oxygen. That is the basis of _half life_ ,
| the same distribution until decomposition starting now no
| matter when the chemical or particle was created.
|
| In WWII, searching at sea, e.g., for enemy submarines, was
| important, and then there was Bernard O. Koopman, _Search and
| Screening_, 1946, with an argument that encounters between two
| ships follow a Poisson process, i.e., the time to the next
| encounter is exponentially distributed and thus Markov.
|
| In grad school, there was a question about how long US
| submarines would last in war at sea. Well, take n Red ships and
| m Blue ships with, for each ship, position, speed, and
| detection radius and, for each Red-Blue pair, given a
| detection, probabilities of Red dies, Blue dies, both die,
| neither die (right, these four have to be non-negative and add
| to 1). Now we have specified a Markov process that we can
| evaluate with a relatively simple Monte-Carlo simulation.
|
| Had written a random number generator in assembler using an Oak
| Ridge formula, typed quickly, and did the simulation. Had a
| review by a famous probability prof and passed when I explained
| how the _law of large numbers_ applied. So, some pure and
| applied math and computing worked, but some politics didn't!
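|
| A stripped-down sketch of that kind of Monte-Carlo run (my
| own toy version, certainly not the original model): ships
| drift on a unit square, and each Red-Blue detection draws
| one of the four outcomes.
|
|     import random
|
|     OUT = ["red dies", "blue dies", "both die", "neither"]
|     P_OUT = [0.3, 0.3, 0.1, 0.3]  # non-negative, sum to 1
|
|     def drift(ship, spd):
|         ship[0] = (ship[0] + random.uniform(-spd, spd)) % 1
|         ship[1] = (ship[1] + random.uniform(-spd, spd)) % 1
|
|     def simulate(n_red, n_blue, steps, rad=0.05, spd=0.02):
|         red = [[random.random(), random.random()]
|                for _ in range(n_red)]
|         blue = [[random.random(), random.random()]
|                 for _ in range(n_blue)]
|         for _ in range(steps):
|             for s in red + blue:
|                 drift(s, spd)
|             for r in list(red):
|                 for b in list(blue):
|                     d2 = (r[0]-b[0])**2 + (r[1]-b[1])**2
|                     if d2 < rad**2:   # detection
|                         o = random.choices(OUT, P_OUT)[0]
|                         if o in ("blue dies", "both die"):
|                             blue.remove(b)
|                         if o in ("red dies", "both die"):
|                             red.remove(r)
|                             break  # this Red ship is gone
|         return len(red), len(blue)
|
|     # law of large numbers: average survivors over many runs
|     runs = [simulate(5, 5, 200) for _ in range(1000)]
|     print(sum(r for r, _ in runs) / len(runs),
|           sum(b for _, b in runs) / len(runs))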
| seanhunter wrote:
| I once failed an interview with a hedge fund because they
| asked me a variant on that red ships/blue ships problem and
| at the time I knew nothing about probability. They also
| hadn't given me lunch which didn't help.
| nivertech wrote:
| This Red/Blue submarine problem seems to be a better fit for
| ABM simulation, rather than Monte Carlo based on Markov
| processes.
|
| IRL this will be path dependent, since both sides will learn
| from past actions and the probabilities will be changing,
| i.e. the memoryless Markov property will not hold.
|
| In ABM the ships (agents) can move on 2D space, which makes
| detection easier.
|
| Also, obviously there are lots of externalities, like the
| supply of weapons, food, and sailors, ceasefires, surrenders,
| politics, etc.
|
| All of the above is easier to simulate using ABM, rather than
| Monte Carlo.
| jvanderbot wrote:
| The Markov property is about state, yes, but can't you expand
| the state to accommodate your non-Markov example?
|
| As in, the state you track is no longer the probability that
| you have ended at a given node at time T, but instead includes
| a new vector of the probability you have visited the node at
| any time in the past (which can be obtained from PDF of
| location from previous time step + stochastic "diffusion").
|
| So, we're randomly walking over a graph, but the graph is _not_
| the same as the graph used in MCMC. The MCMC graph is the state
| with random transitions that must model what you want to
| observe. That separation does not violate the statement that
| "it's just a random walk"; it just severely complicates it, I
| suppose.
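|
| One simple version of that state expansion as a sketch (my
| own toy code): make the state (current node, hit target
| yet?), and the path-dependent question becomes a function
| of the current augmented state alone.
|
|     import random
|
|     graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
|     TARGET = 3
|
|     def step(state):
|         """One Markov step on the augmented state space:
|         state = (current node, ever visited the target)."""
|         node, hit = state
|         nxt = random.choice(graph[node])
|         return (nxt, hit or nxt == TARGET)
|
|     # P(visit target within 5 steps), read off the end state
|     trials = 10000
|     hits = 0
|     for _ in range(trials):
|         s = (0, False)   # start at node 0, not the target
|         for _ in range(5):
|             s = step(s)
|         hits += s[1]
|     print(hits / trials)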
| whatever1 wrote:
| This is very typical in reinforcement learning. You just
| expand the state to include some more time periods. It
| definitely raises some academic eyebrows (since it's not
| technically memoryless), but hey, if it works, it works.
| jvanderbot wrote:
| It _is_ memoryless just in a different state space.
| whatever1 wrote:
| In the state space that includes all of the time periods,
| with infinitesimal granularity since the birth of the
| universe.
| jvanderbot wrote:
| Welcome to a well formulated POMDP.
| low_tech_love wrote:
| Interesting! I also make extensive notes of mathematical and
| computational concepts (and "buzzwords") with a "no bullshit"
| title, and it works great. It's great for quick refreshers once
| in a while.
| zero_k wrote:
| Science communication is so important. I write scientific papers
| and I always write a blog post about the paper later, because
| nobody understands the scientific paper -- not even the
| scientists. The scientists regularly read my blog instead. The
| "scientific style" has become so obtuse and useless that even the
| professionals read the blog instead. True insanity.
| MichaelDickens wrote:
| What would happen if you wrote the paper like a blog post and
| submitted it to journals? (With some structural changes to meet
| journal requirements, like having an abstract)
| nivertech wrote:
| If I ever write about "Markov chains without the
| Math/Jargon/BS" I'll use the clip of the "Ten Second Tom"
| scene from 50 First Dates[1] and a host of other sci-fi
| movies about time loops[2,3] to illustrate
| memorylessness[4] and the Markov property[5].
|
| ---
|
| 1. https://www.youtube.com/watch?v=iN_BDcKhtWk
|
| 2. https://en.wikipedia.org/wiki/Time_loop
|
| 3.
| https://en.wikipedia.org/wiki/List_of_films_featuring_time_l...
|
| 4. https://en.wikipedia.org/wiki/Memorylessness
|
| 5. https://en.wikipedia.org/wiki/Markov_property
| or_am_i wrote:
| > The "bullshit" here is the implicit claim of an author that
| such jargon is needed. Maybe it is to explain advanced
| applications (like attempts to do "inference in Bayesian
| networks"), but it is certainly not needed to define or analyze
| the basic ideas.
|
| "The bullshit here is the implicit claim of an author that German
| language is needed. Maybe it is for advanced applications (like
| Goethe's poetry), but it is certainly not needed to describe
| basic ideas."
|
| (proceeds to explain the same basic concepts 10x more verbosely
| than in any math textbook on the subject)
|
| Math/statistics nomenclature is certainly not perfect (think of
| it as a general utilities library that has been in active
| development for 200+ years), but it is widely used for a reason:
| once you learn the language it becomes second nature (very much
| the same as knowing all the details of the standard library API
| in your language of choice), allowing you to communicate
| arbitrarily complex abstract ideas with speed and precision.
| Scea91 wrote:
| Yes, it's pretty simple. Different audiences prefer different
| styles. Experts prefer brevity and jargon; they don't need
| explanations of concepts they already understand. Beginners
| prefer less jargon and inline explanations as they'd need to
| hunt for references otherwise to understand the text.
| artofpongfu wrote:
| So he sets up a toy problem (drawing from a baby names
| distribution), then never explains how to solve this problem.
|
| The intuition is, you set up a graph where the vertices are
| names, and the edges are based on name similarity. Two names are
| neighbors if e.g. their edit distance is within some limit. You
| start at a random name, then pick a neighbor and flip a biased
| coin whose bias is the ratio of P(neighbor) to P(current name);
| if heads, you move to the neighbor.
|
| I'm sure this is wrong in many and subtle ways, but when I read
| an article like this I expect some intuition like this to be
| imparted.
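|
| That is essentially the Metropolis rule. A rough sketch (my own
| toy code with made-up scores, plus the neighbor-count correction
| that makes the walk target P exactly):
|
|     import random
|
|     # made-up unnormalized popularity scores
|     P = {"anna": 90, "anne": 70, "hanna": 40,
|          "hannah": 85, "ann": 30}
|
|     def edit_distance(a, b):
|         prev = list(range(len(b) + 1))
|         for i, ca in enumerate(a, 1):
|             cur = [i]
|             for j, cb in enumerate(b, 1):
|                 cur.append(min(prev[j] + 1, cur[-1] + 1,
|                                prev[j - 1] + (ca != cb)))
|             prev = cur
|         return prev[-1]
|
|     # two names are neighbors if their edit distance is small
|     names = list(P)
|     nbrs = {x: [y for y in names
|                 if y != x and edit_distance(x, y) <= 2]
|             for x in names}
|
|     def sample(steps, start="anna"):
|         x = start
|         counts = {n: 0 for n in names}
|         for _ in range(steps):
|             y = random.choice(nbrs[x])
|             # accept with prob min(1, P(y)/P(x)), corrected
|             # for differing neighbor counts
|             r = P[y] * len(nbrs[x]) / (P[x] * len(nbrs[y]))
|             if random.random() < min(1.0, r):
|                 x = y
|             counts[x] += 1
|         return counts
|
|     print(sample(100000))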
| fidotron wrote:
| There is definite truth to the idea stats people have a habit of
| writing deliberately impenetrable prose. (Probably due to
| proximity to economists and political scientists).
|
| I happened to need to implement a Markov chain for playing rock,
| paper, scissors in https://luduxia.com/showdown - once you
| actually understand it the code is short and took no more than 20
| minutes to write, and was by far the easiest part of the entire
| effort, which was surprising. Vertically centering a div is
| harder, and involves dealing with even more jargon.
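|
| Not their code, but a minimal sketch of that kind of
| Markov-chain opponent model (my own toy version): count
| transitions between the opponent's consecutive moves, predict
| the most likely next move, and play its counter.
|
|     import random
|     from collections import defaultdict
|
|     MOVES = ["rock", "paper", "scissors"]
|     BEATS = {"rock": "paper", "paper": "scissors",
|              "scissors": "rock"}
|
|     # transition counts: opponent's last move -> next move
|     counts = defaultdict(lambda: defaultdict(int))
|     last = None
|
|     def respond():
|         """Our move, from observed opponent transitions."""
|         if last is None or not counts[last]:
|             return random.choice(MOVES)
|         predicted = max(counts[last], key=counts[last].get)
|         return BEATS[predicted]   # counter the prediction
|
|     def observe(opponent_move):
|         """Update the chain after the opponent's move."""
|         global last
|         if last is not None:
|             counts[last][opponent_move] += 1
|         last = opponent_move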
| conjectures wrote:
| Common pattern where a bright spark asks, 'why you all so
| complicated?' Proceeds to assume we're dealing with a finite
| graph / set.
|
| All the complication is needed to handle the fact that the state
| can be a random vector of real numbers in a possibly varying
| dimensional space. It's not jerking off on jargon for its own
| sake.
|
| Sure, there are simple cases - doesn't make the general case
| 'bullshit'.
___________________________________________________________________
(page generated 2025-04-16 17:01 UTC)