The Mathematics of Finding Your Soulmate

Manuel Brenner
6 min read · Dec 2, 2021

Keep on with the force don’t stop
Don’t stop ’til you get enough.
- Michael Jackson

The world is a rich place of opportunities waiting to be discovered, of actions to be taken and choices to be made.

We mere mortals have to navigate this world with partial and biased information and make decisions under uncertainty. On top of that, many decisions have to be made quick and dirty, without much time to consider them in depth and without the luxury of looking at them from all sides. This is especially true when we act in an environment full of competitors, all pursuing their own self-interest and all struggling with similar uncertainties: be it when hunting for an apartment in a popular city, looking for a dream job, or waiting to finally meet your soulmate and live happily ever after.

In some sense, many of our strategies in the world can be summed up as a trade-off between exploration (trying out the unknown) and exploitation (sticking with what we already know works). The unknown can offer both threats and rewards. When we settle on a path through reality, we are always at the same time discarding many unexplored possibilities.

This is the tragedy of having all of our dreams and imaginations collapse into just one lived reality: when we are children, we are roaming around, free to discover and free to imagine many different versions of ourselves. After we grow up, we have to deal with the life we chose, for better or worse.

Mathematicians and computer scientists have been dealing with very similar questions, and have been working on formalizing them for decades. As an example, in the branch of artificial intelligence called reinforcement learning, the exploration-exploitation trade-off is a well-known problem.

Photo by Carl Raw on Unsplash

A simple example to illustrate this is the multi-armed bandit: a slot machine endowed with K levers, each offering different rewards with different probabilities.
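To make the bandit concrete, here is a minimal Python sketch. It rests on simplifying assumptions that are not from the article: each lever pays either 1 or 0, with a fixed probability per lever, and the player follows an epsilon-greedy rule, which is just one common way to mix exploration and exploitation. The names `BernoulliBandit` and `epsilon_greedy` are made up for this illustration.

```python
import random


class BernoulliBandit:
    """A machine with K levers; lever i pays 1 with probability probs[i], else 0.

    Assumption for this sketch: rewards are 0/1 and the probabilities never change.
    """

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    @property
    def k(self):
        return len(self.probs)

    def pull(self, lever):
        # random() lies in [0, 1), so probability 0 never pays and 1 always pays.
        return 1 if self.rng.random() < self.probs[lever] else 0


def epsilon_greedy(bandit, n_pulls, epsilon=0.1, seed=None):
    """Play n_pulls rounds; return (total reward, pulls per lever, estimated means)."""
    rng = random.Random(seed)
    counts = [0] * bandit.k    # how often each lever has been pulled
    means = [0.0] * bandit.k   # running average reward observed per lever
    total = 0
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # Explore: pick a lever uniformly at random.
            lever = rng.randrange(bandit.k)
        else:
            # Exploit: pick the lever with the best estimate so far
            # (ties go to the lowest index).
            lever = max(range(bandit.k), key=lambda i: means[i])
        reward = bandit.pull(lever)
        counts[lever] += 1
        # Incremental update of the running average.
        means[lever] += (reward - means[lever]) / counts[lever]
        total += reward
    return total, counts, means


if __name__ == "__main__":
    machine = BernoulliBandit([0.2, 0.5, 0.8], seed=0)
    total, counts, means = epsilon_greedy(machine, n_pulls=1000, epsilon=0.1, seed=1)
    print("total reward:", total)
    print("pulls per lever:", counts)
    print("estimated means:", [round(m, 2) for m in means])
```

With `epsilon=0` the player never explores, so it can stay stuck on the first lever it tried even when that lever never pays. With `epsilon=1` it only explores and never cashes in on what it has learned. Values in between trade the two off.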

Say you are faced with this machine and want to make as much money as possible. At every point in time, you can pull one of the K levers, and after each pull you receive a reward drawn from a probability distribution that is unknown to you (we assume that the reward probabilities of the individual arms stay constant over time). Given you have a…