Thoughts on present bias
The Death Algorithm, Dark Enlightenment, and Capitalist Realism.
The first two mention present bias.
Present bias can be considered equivalent to the γ parameter of reinforcement learning, a decay applied to future reward. It is a number in the range \([0, 1]\): a value of \(0\) means future reward carries no value at all, while \(1\) means future reward is exactly as valuable as present reward.
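To make the two endpoints concrete, here is a small Python sketch of my own (not from the texts): under discount factor γ, a reward arriving \(n\) steps from now contributes \(\gamma^{n}\) times its face value.

```python
# A small illustration (my own, not from the books discussed): a reward r
# arriving n steps in the future is worth gamma**n * r today. gamma = 0
# ignores everything beyond the present step; gamma = 1 weighs the future
# exactly as heavily as the present.
def discounted_value(reward: float, steps_away: int, gamma: float) -> float:
    return (gamma ** steps_away) * reward

for gamma in (0.0, 0.5, 0.9, 1.0):
    print(gamma, [round(discounted_value(100.0, n, gamma), 1) for n in range(5)])
```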
The expected reward of action \(a\) in state \(s\) is the immediate reward of taking that action plus γ times the maximum expected reward available in the state \(s(a)\) that results from taking action \(a\) in state \(s\). Formally:
\(E(s, a) = r(s, a) + \gamma \max_{x \in \mathrm{Actions}} E(s(a), x)\)
Where \(E(s, a)\) is the expected reward of taking action \(a\) in state \(s\), \(r(s, a)\) is the immediate reward for doing so, and \(s(a)\) is the state resulting from taking action \(a\) in state \(s\). \(\max_{x} f(x)\) means to range \(x\) across its domain and return the largest value \(f\) takes. The optimal policy can then be defined as the one that, in every state \(s\), selects the action whose expected reward is the highest. Formally, this reads:
\(\pi^{\star}(s) = \operatorname{arg\,max}_{a \in \mathrm{Actions}} E(s, a)\)
Where \(\pi^{\star}(s)\) is the action taken by the optimal policy in state \(s\) and \(\operatorname{arg\,max}_{a} f(a)\) ranges \(a\) across its domain and returns the \(a\) for which \(f(a)\) is largest. The optimal policy thus takes, at each turn, the action whose expected reward is highest. Since the policy is assumed to range over all possible states and to predict the outcome of every action perfectly, will it necessarily accrue the highest possible reward? The answer depends on the value of the parameter γ, which governs how quickly a state's reward loses influence over the decision as that state recedes into the future: reward gained from states that take more actions to reach is discounted more heavily than reward from nearer states. A higher γ slows this decay and causes the policy to weigh value from more remote states; a lower γ causes the policy to seek reward only in nearby states.
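To see this dependence on γ in runnable form, here is a minimal Python sketch over a toy environment I am assuming for illustration (a short corridor with a small reward available immediately and a larger reward several steps away; the names GOAL, SMALL, LARGE, "cash", and "walk" are hypothetical). The recursion mirrors the definitions of \(E(s, a)\) and \(\pi^{\star}\) above.

```python
# Hypothetical toy environment (my own, for illustration): from each state you
# can "cash" a small reward and stop, or "walk" one step toward a large reward
# at the far end of a corridor.
GOAL = 5                                   # steps to the large reward
SMALL, LARGE = 1.0, 10.0

def expected_reward(state: int, action: str, gamma: float) -> float:
    if action == "cash":
        return SMALL                       # r(s, cash) = SMALL and the episode ends
    next_state = state + 1                 # s(a): the deterministic "walk" transition
    if next_state == GOAL:
        return LARGE                       # r(s, walk) = LARGE at the goal, which is terminal
    # r(s, walk) = 0 before the goal, so only the discounted future term remains
    return gamma * max(expected_reward(next_state, a, gamma) for a in ("cash", "walk"))

def optimal_action(state: int, gamma: float) -> str:
    # pi*(s): the action with the highest expected reward in state s
    return max(("cash", "walk"), key=lambda a: expected_reward(state, a, gamma))

for gamma in (0.3, 0.95):
    print(gamma, optimal_action(0, gamma))
```

With γ = 0.3 the policy cashes out at once; with γ = 0.95 it walks the full corridor to the distant reward, which is exactly the shift in horizon described above.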
It is perfectly reasonable to understand this γ parameter as describing how present-biased the policy is, in the sense in which Land and Simanowski use the term. Their diagnosis is that our collective γ is too low; their prescription is to raise it. I respond that the value of a high γ depends on the accuracy of \(s(a)\): if we cannot accurately predict the outcomes of events, it is pointless to forecast far ahead, because the compounded error becomes too great.
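This response can be made concrete under a simple assumption of my own (not drawn from Land or Simanowski): suppose each one-step prediction \(s(a)\) is correct independently with probability \(1 - \epsilon\). A plan \(n\) steps deep then holds with probability \((1 - \epsilon)^{n}\), so model error behaves like a second discount factor multiplied into γ.

```python
# A minimal sketch under the assumption stated above: each one-step prediction
# is correct independently with probability (1 - epsilon), so the weight a
# planner can reliably place on a reward n steps away is
# gamma**n * (1 - epsilon)**n. An inaccurate model acts like an extra discount
# stacked on top of gamma.
def reliable_weight(gamma: float, epsilon: float, steps_away: int) -> float:
    return (gamma * (1.0 - epsilon)) ** steps_away

gamma = 0.99                              # a patient, far-sighted planner
for epsilon in (0.0, 0.05, 0.2):
    weights = [round(reliable_weight(gamma, epsilon, n), 3) for n in (1, 5, 20, 50)]
    print(f"epsilon={epsilon}: {weights}")
```

Under this reading, raising γ only pays off to the extent that \(\epsilon\) is driven down; a far-sighted policy built on an inaccurate model is discounting toward forecasts it cannot actually keep.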
The first and last mention smartphone zombism.