Manuel Brenner
Jan 13, 2022

Thanks for your feedback, Travis!

1) Sounds like a reasonable assumption to me. I'm not sure how model-free RL agents are typically initialized, but if the policy starts out as a uniform distribution over actions, that would correspond to an exploratory phase with maximum entropy over possible policies, so essentially a random walk (see the first sketch below).

2) Interesting point that ties nicely to the historical dimension of Markov's discovery. He derived his theory as a means to counter arguments by Nekrasov, a Russian Orthodox statistician who tried to prove free will (more details e.g. here: https://www.americanscientist.org/article/first-links-in-the-markov-chain; I originally heard about it in Jordan Ellenberg's episode of the Mindscape podcast: https://www.youtube.com/watch?v=QN7LyhPZrLY).

3) I would need to look into this; frankly, I don't know anything about Bellman's optimality principle. However, in physics we constantly assume first-order Markov processes. Dynamical systems (e.g. pendulum swings, the motion of planets, fluid dynamics) are completely determined if you know the initial state and the dynamical equations, so you don't need any information about the past to determine the future (see the second sketch below). In my field, dynamical systems reconstruction, we also frequently make this model assumption. It breaks down, however, when you only have incomplete information about the system.
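
On point 1), here is a minimal sketch of the maximum-entropy intuition, assuming a small discrete action space (the action count and the helper function are purely illustrative, not tied to any particular RL library): a uniform policy has the highest possible entropy, while a fully committed policy has none.

```python
import numpy as np

def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]          # ignore zero-probability actions (0 * log 0 := 0)
    return -np.sum(nonzero * np.log(nonzero))

n_actions = 4
uniform = np.full(n_actions, 1.0 / n_actions)  # exploratory start: every action equally likely
greedy = np.array([1.0, 0.0, 0.0, 0.0])        # fully committed policy, no exploration

print(policy_entropy(uniform))  # log(4) ≈ 1.386, the maximum for four actions
print(policy_entropy(greedy))   # 0.0
```

Sampling actions from the uniform policy at every step is then exactly a random walk over the action space.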
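
On point 3), a minimal sketch of the first-order Markov property of a deterministic dynamical system, using a frictionless pendulum with a simple semi-implicit Euler integrator (step size and parameters are just illustrative): the next state is a function of the current state alone, with no memory of the trajectory so far.

```python
import numpy as np

def pendulum_step(state, dt=0.01, g=9.81, length=1.0):
    """Advance the pendulum by one step: the update uses only the current state."""
    theta, omega = state                                   # angle and angular velocity
    omega_new = omega - (g / length) * np.sin(theta) * dt
    theta_new = theta + omega_new * dt
    return np.array([theta_new, omega_new])

# Given the initial state and the dynamical equations, the whole future is determined:
state = np.array([np.pi / 4, 0.0])
trajectory = [state]
for _ in range(1000):
    state = pendulum_step(state)   # no information about past states is needed
    trajectory.append(state)
```

If you could only observe the angle and not the angular velocity (incomplete information about the system), the observed sequence would no longer be first-order Markov, which is where the assumption breaks down.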
