Wednesday, January 2, 2019

Let's break down will like George Ainslie


Breakdown of Will (2001) is a book about breaking down will, that is, dissecting the idea of a "will" and showing how will is made from little pieces. The book is too long for me to read because I'm a very busy pony, so I'll just summarize from Précis of Breakdown of Will (2005).

BTW, Ainslie's breakdown of will has no relation to Jaynes' breakdown of the bicameral mind, but the name really reminds me of that breakdown, which is how I came to notice Ainslie's breakdown of will theory. I'm such a nihilist, I just can't stop myself from finding another thing to break down.

The author, George Ainslie, is most famed for his discovery of hyperbolic discounting in Specious reward: A behavioral theory of impulsiveness and impulse control (1975).
People devalue a given future event at different rates, depending on how far away it is. This phenomenon means that our preferences are inherently unstable and entails our present selves being pitted against what we can expect our future selves to want.
"Discounting" means that humans prefer the same reward to come sooner. The \$100 that comes in a month seems to be less attractive than the \$80 that comes today. This is a very reasonable assumption, for two reasons:

  1. Eat, drink and be merry, for tomorrow we may die.
  2. \$80 today can be invested and turned into \$100 by the end of the month. Though to be realistic, the average return on capital is just 5%/year.

Exponential discounting

Economists used to assume humans discount the future with "exponential discounting", because it's time-consistent: you won't get into a fight with your future self! We will quickly derive this discounting, then break it down.

Basically, suppose you consider \$100 today and \$101 tomorrow about equally valuable, and you've made this choice enough times that you know this is what you'd think, every time. Then you should confidently say that you also consider \$100 in 30 days and \$101 in 31 days equally valuable. After all, if you put this choice to yourself 30 days from now, she would say, yes, they are equally valuable.

This is "temporal consistency".

We also assume that utility can be added. That is, if you think
\$100 today = \$101 tomorrow
you should also think
\$200 today = \$202 tomorrow
and in general,
$u$ today = $1.01u$ tomorrow
This is "additive utility".

Now, it's easy to derive the exponential discounting formula:
$u$ in $n$ days = $1.01^{-n}u$ today
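The derivation above can be checked numerically. This is a minimal sketch, not anything from Ainslie: the function name `exponential_value` and the `daily_rate` parameter are made up for illustration, and the 1% daily rate is just the number from the \$100 vs \$101 example.

```python
def exponential_value(amount, days, daily_rate=0.01):
    """Present value of `amount` received `days` from now: (1 + rate)^(-days) * amount."""
    return amount * (1 + daily_rate) ** (-days)

# $100 today vs $101 tomorrow: equal by construction.
today = exponential_value(100, 0)
tomorrow = exponential_value(101, 1)

# The same pair pushed 30 days into the future.
later_small = exponential_value(100, 30)
later_large = exponential_value(101, 31)

# The ratio between the two options does not depend on when you ask.
ratio_now = tomorrow / today
ratio_later = later_large / later_small

print(today, tomorrow)
print(later_small, later_large)
print(ratio_now, ratio_later)
```

Both ratios come out as 1 (up to floating-point rounding), which is the temporal consistency: shifting both rewards by the same delay never changes which one you prefer.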

Hyperbolic discounting


Exponential discounting is wrong for many humans. For many humans, \$100 today sounds better than \$110 in one month, but \$100 in 10 months is worse than \$110 in 11 months. More tests show that humans discount "hyperbolically". Basically, they discount heavily in the near future, but don't discount much in the far future.

It's like they are impatient and want instant gratification today, but expect themselves to be patient and long-sighted in a year. Humans using hyperbolic discounting make choices that are inconsistent over time – they make choices today that their future self would prefer not to have made, despite knowing the same information.

The formula is instead
$u$ in $n$ days = $\frac{u}{1 + k n}$ today
where $k$ is a positive number.
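Here is a small sketch of the preference reversal using that formula. Assumptions that are mine, not the source's: the delay is measured in months instead of days, $k = 0.2$ per month is an arbitrary value picked so the reversal shows up, and the helper names are hypothetical.

```python
def hyperbolic_value(amount, delay, k=0.2):
    """Present value u / (1 + k * n). Here n is in months and k = 0.2 is an assumed value."""
    return amount / (1 + k * delay)

def preferred(small, small_delay, large, large_delay, k=0.2):
    """Return which option has the higher present value right now."""
    v_small = hyperbolic_value(small, small_delay, k)
    v_large = hyperbolic_value(large, large_delay, k)
    return "smaller-sooner" if v_small > v_large else "larger-later"

# $100 today vs $110 in one month.
near = preferred(100, 0, 110, 1)
# The same pair, both pushed 10 months out.
far = preferred(100, 10, 110, 11)

print(near)  # smaller-sooner
print(far)   # larger-later
```

With these numbers the choice flips just because the same pair of rewards moved further away, which exponential discounting can never produce.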

Funny personal stories

I once went to a university psychology experiment, where they pretty much did the same kind of experiment on me: I did many 2-choice games on a computer, where at each choice, I had to choose between two gift cards of different values. One would be cheaper but given to me sooner, and the other the opposite. At the end of the experiment, I got one irl payoff from the result of one of the games, chosen randomly.

The twist of the experiment is that they put electrodes on my head and gave me transcranial direct-current stimulation, probably to see if certain kinds of TDCS would influence my time-discount rate. (Also during the preparation of installing the TDCS on my head, they found my self-harm scars and notified my therapist... fuck.)

The twist is I always chose the one that came later. I guess I was just a mistake for them. Just a mistake, an outlier.

This reminds me of another time I went to another experiment, where I had to play a series of dictator games with anonymous opponents. They are one-off, so there's no prospect of long term cooperation or punishment. At the end of the experiment, I got one irl payoff from the result of one of the games, chosen randomly.

The twist is that before each game, I heard a recording from the opponent, with all kinds of accents. The experiment was meant to see if a familiar accent makes people more generous.

The twist is I always chose to take all, whatever accent they used. I guess I was just a mistake for them. Just a mistake, an outlier.

Intertemporal conflict and akrasia

People are inconsistent over time due to hyperbolic discounting. This means they fight with their selves at different times. This is the intertemporal conflict. It explains many things, such as why we do things we don't want to do [akrasia], why we feel the pull of inconsistent desires, the nature of will, etc.
[hyperbolic discounting] means that our preferences are inherently unstable and entails our present selves being pitted against what we can expect our future selves to want... it offers radical solutions to problems that have defeated utility theory: Why do people knowingly participate in addictions, compulsions, and bad habits? What is the nature of will? What makes a will weak or strong? Do we in fact need a concept of will at all?
Turns out, the human will can be broken into little pieces that have different discount rates, different desires, different powers, all bargaining with each other. The bargaining is the will, and the deals that come out of the bargaining are the voice of the will.
... individuals are more like populations of bargaining agents than like the hierarchical command structures envisaged by cognitive psychologists.
This is similar to the "society of mind" from Marvin Minsky: the human mind is made of many small agents living together and dealing with each other. Each small agent has even smaller agents, and so on, all the way down to tiny agents that merely implement simple mathematical functions.

How hyperbolic discounting looks:
The temptation visualized as hyperbolic discount.

In this picture, the time axis means "time after day 0". The "value" axis means "value of a particular reward at a particular time". The blue curve indicates a sooner but smaller reward. As the agent approaches the time of the blue reward, the perceived value of the blue reward increases until, at the moment the reward arrives, it equals the "true" value of the blue reward. After that it drops to 0, because the reward is in the past.

The blue curve thus traces a hyperbola curving upwards. Similar for the orange curve.

Notice the crossing of the two curves. Before the crossing, the orange curve is higher, meaning the orange reward seems better. After the crossing, the blue curve is higher. This is the temporal inconsistency!

This is a model of temptation. Blue is \$100 now and orange is \$200 in a month.
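The crossing point can be computed directly. The sketch below is only an illustration: I put the blue reward (\$100) at day 30 and the orange reward (\$200) one month after it at day 60, and I assume $k = 0.1$ per day. None of these numbers come from Ainslie, and the function names are made up.

```python
def value_at(t, amount, due, k):
    """Perceived value at time t of a reward delivered at time `due` (0 once it is in the past)."""
    if t > due:
        return 0.0
    return amount / (1 + k * (due - t))

def crossing_time(a1, t1, a2, t2, k):
    """Time t solving a1 / (1 + k*(t1 - t)) == a2 / (1 + k*(t2 - t))."""
    return ((a2 - a1) + k * (a2 * t1 - a1 * t2)) / (k * (a2 - a1))

K = 0.1             # assumed discount parameter, per day
BLUE = (100, 30)    # (amount, day delivered): smaller, sooner
ORANGE = (200, 60)  # (amount, day delivered): larger, later

t_cross = crossing_time(*BLUE, *ORANGE, K)
print("curves cross at day", t_cross)

for t in (0, 29):
    blue = value_at(t, *BLUE, K)
    orange = value_at(t, *ORANGE, K)
    winner = "blue" if blue > orange else "orange"
    print(f"day {t}: blue={blue:.2f} orange={orange:.2f} -> {winner} looks better")
```

Under these assumptions the curves cross at day 10: before that the orange reward looks better, and from then until the blue reward arrives the temptation wins.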

Herrnstein's matching law

Herrnstein's matching law was first reported in Relative and absolute strength of response as a function of frequency of reinforcement (1961), by Richard J. Herrnstein (famed for coauthoring The Bell Curve, a famous book on human IQ distribution).
When working with pigeons in a skinner box, Herrnstein realized that the number of times they would peck one button over another was directly correlated with the rate of rewards they had received for pecking each one of them. In other words, behavior rate matches reinforcement rate.
This is quite weird if you actually think about it. Suppose you have two buttons, pressing button 1 gives food 80% of the time, the other 40% of the time. Then after pressing them a bunch of times, you'd be pretty confident that button 1 is definitely better, so you should stop pressing button 2.

But instead, most humans (and pigeons, dogs, etc...) do frequency matching, rather than frequency maximizing. We would press button 1 $\frac 2 3$ of the time and button 2 $\frac 1 3$ of the time.
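The arithmetic behind those fractions, as a quick sketch. The function name `matching_shares` is hypothetical, and treating the two buttons as simple fixed payoff probabilities is a simplification of the actual pigeon experiments.

```python
def matching_shares(reinforcement_rates):
    """Share of responses on each option if behavior rate matches reinforcement rate."""
    total = sum(reinforcement_rates)
    return [rate / total for rate in reinforcement_rates]

rates = [0.8, 0.4]  # chance of food per press: button 1, button 2
shares = matching_shares(rates)

# Expected food per press under matching vs. always pressing the better button.
matching_payoff = sum(share * rate for share, rate in zip(shares, rates))
maximizing_payoff = max(rates)

print(shares)
print(matching_payoff, maximizing_payoff)
```

Matching gives button 1 a share of 0.8 / (0.8 + 0.4) = 2/3, and it earns about 0.67 food per press, compared with 0.8 for just hammering button 1. That gap is why matching looks weird.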

Picoeconomics

Pico- means $10^{-12}$. It's derived from Spanish pico, meaning "peak, beak, bit". 1 picolightyear is about 9 kilometers. It's really small.

Picoeconomics is George Ainslie's idea about what's happening as the desires in humans bargain with each other. We have macroeconomics, which is about how bargaining works across a whole economy of many agents; microeconomics, which is about how individual agents bargain and choose; and now picoeconomics, which is doing microeconomics inside a single human's society of mind.

Consider the mind of a single pony. It is not a single agent, but made of many different ones.

Urge

Sometimes there is an irresistible desire to do something despite not wanting it. For example, eating potato chips one by one and overeating as a result. In general, when I start a meal I have an urge to finish whatever's in my dish. It's an urge I know well and hate.

When a reward precedes a bigger but further punishment, it is often preferred when up close but avoided at a distance. This is addiction. And an itch is a small addiction.

Or popping bubble wrap. Or playing idle games. Or eating junk food. Or using social networks.
Minor itches will abate if never scratched, and the motive to scratch them gets described as an urge rather than a desire, as does the motive to bite your nails, use speech mannerisms, and emit tics. These are voluntary behaviors and may be subject to strong momentary motivation, but people avoid them at a distance and often seek preventive treatments.
Sometimes this difference can be dramatic. Mice with brain electric stimulation could be struck with an urge to eat, so they want to eat, but what they eat is completely bland and tasteless, and they don't like it. Wanting and liking are thus not the same. See What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? (1998), Kent C. Berridge, Terry E. Robinson.

And even pigeons can notice this difference and avoid it:
Pigeons were shown to actively avoid being offered the option of doing poorly rewarded work for food, instead of simply not doing the work when offered.
 

Emotions (warning: seems dubious to me)

Pain is not all repulsive:
Pain and painful emotions attract attention but deter approach. Pain can’t be the simple opposite of reward that is often assumed, because it could not then oblige people to attend to it... there are many indications that emotions and even the emotional part of pain are not automatic, but have to compete with rewarded activities for a person’s participation.
Indeed, sometimes pain can be completely ignored if it cannot get itself attention, as in one report where 37 patients underwent brain surgery under hypnosis without anesthesia.

That pain can be manipulated like that means it's less of a pure reflex like "touch fire, pull arm back", and more of a willful attention like "touch fire, electric signal goes to brain, the signal gets noticed by some higher processes, pain".
In sum, emotions... are at least partially in the realm of motivated behaviors, not conditioned responses; they are pulled by incentives rather than pushed by stimuli.  Even pain itself and “negative” emotions like fear and grief seem to be urges that lure you into participating in them, rather than being automatically imposed states.
If an itch is a fast addiction, maybe a pain is a fast itch.  That is, perhaps the vividness but aversiveness of pain and negative emotions is a pattern of repeating, brief, intense reward, the occurrence of which causes an otherwise continuous nonreward.
I don't think pain can be analyzed this way. After all, most pain cannot be predicted. There is no expectation. Pain is felt and evaluated after it has already happened. This is directly against what temptation and itch are like: the expectation determines how desirable the action is, and the most desirable is chosen.

Impulse control

Wills with long-term interests that won out would like to keep their interests working as time goes on. For example, if a will wins out in making a person add "diet" to a list of goals, that will would also like the goal to actually be reached, without the will for eating ruining it. This is impulse control, and it is done by "intertemporal bargaining": basically, treating yourself at different times as different persons, and trying to manipulate future yous to do your bidding.
There are 4 ways to do that:
  1. finding constraints or influences outside of your psyche, like asking someone else to force you later, tying yourself to a mast, etc. 
  2. keeping your attention off temptations, like looking away, or in the Freudian defense mechanisms of suppression, repression, or denial;
  3. cultivating or inhibiting emotions, either consciously or in the defense mechanisms of isolation or reversal of affect.
  4. willpower, which seems to be at once the strongest and most versatile, but which has hitherto been mysterious
Animals can do impulse control. 
Rats will press a bar committing them to get .5 sec of shock 40 seconds later instead of 5 seconds of shock 45 seconds later, rather than leave the choice open and subsequently fail (almost always) to choose .5 seconds of imminent shock over 5 seconds of shock 5 seconds later.
Willpower is mysterious. It needs to be broken down and explained. Only then will we succeed in the Breakdown of will.
Words like volition, personal rules, character, intention, and resolve are often applied, but don't suggest how people have learned to resist temporary preferences for shortsighted options.
One property that willpower seems to have is "reference to a bigger rule/higher authority":
Aristotle said that akrasia is the result of choosing according to "particulars" instead of "universals" (Nichomachean Ethics); Kant said that the highest kind of decision-making involves making all choices as if they defined universal rules (categorical imperative)...
This might be why the superego is obsessed with rules. Why religious, military, and bureaucratic people tend to have more self control (at least when they are in their role), etc...

There are experimental demonstrations of this fact:
students who faced five weekly choices of a SS [smaller, sooner] amount of money immediately or a LL [larger, later] amount one week later picked the LL amounts substantially more if they had to choose for all five weeks at once than if they chose individually each week.  The authors reported an even greater effect for SS vs. LL amounts of pizza.
This kind of self-control can be explained by considering what happens if we sum all the future rewards. After summation, a series of LL suddenly can overpower a series of SS at any moment in time, not just when viewed from afar:
This illustration is kinda terrible though. You should do the summation in your brain instead.
Basically, the idea is this: if at the current moment, the next LL is overpowered by the next SS, then you can recruit the future LLs which overpower the future SSs. Add them all together, the net effect is that the LLs overpower the SSs.

Note that this summation doesn't do anything for exponential discounting. Exponential discounting is time-consistent, so the concept of "temptation" doesn't exist, and summing changes nothing.
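To actually do the summation, here is a sketch of the five-weekly-choices setup. Everything numeric is an assumption of mine for illustration (SS = 100 now, LL = 150 a week later, hyperbolic $k = 0.1$ per day, exponential factor 0.94 per day), and the function names are hypothetical; the real experiment used different amounts.

```python
def hyperbolic(amount, delay, k=0.1):
    """Hyperbolic present value; delay in days, k assumed."""
    return amount / (1 + k * delay)

def exponential(amount, delay, daily_factor=0.94):
    """Exponential present value; delay in days, daily_factor assumed."""
    return amount * daily_factor ** delay

def bundle_value(discount, amount, first_delay, weeks=5):
    """Summed present value of `weeks` rewards, one every 7 days, starting at `first_delay`."""
    return sum(discount(amount, first_delay + 7 * i) for i in range(weeks))

SS, LL = 100, 150  # assumed amounts: SS arrives now, LL arrives one week later

choices = {}
for name, discount in (("hyperbolic", hyperbolic), ("exponential", exponential)):
    # Choosing one week at a time: only this week's pair is compared.
    single = "SS" if discount(SS, 0) > discount(LL, 7) else "LL"
    # Choosing for all five weeks at once: compare the two summed series.
    bundled = "SS" if bundle_value(discount, SS, 0) > bundle_value(discount, LL, 7) else "LL"
    choices[name] = (single, bundled)
    print(name, "single:", single, "bundled:", bundled)
```

With these numbers the hyperbolic discounter picks SS when choosing week by week but LL when choosing the whole series at once, while the exponential discounter picks the same thing either way, because every SS/LL pair has the same value ratio.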

Summation

That's nice, but who's summing? The "department of economy", the agent in the brain that looks at the "happiness expectations" calculated by various agents, and approves the one that is the highest.

Why would it sum like that? Ainslie does not say, but in my view, this willpower is an extra agent (imma call it "superego") who wants nothing more than to be able to predict what the "department of economy" would choose in the future. As such, the superego will force the "department of economy" to evaluate the "happiness expectations" given by the other wills using the summation method rather than the one-off method.

This superego is self-sustaining. The more it succeeds in imposing its predictions of the future, the more its predictions get turned into reality (in a sort of self-fulfilling prophecy), the more it's certain of its own ability to predict the future, and the louder it gets. This explains how using willpower makes more willpower later.

Ainslie calls this the "intertemporal bargaining model of will".

The Ultimate Breakdown of Will: Nothing Fails Like Success

Willpower has surprisingly bad side effects.
Willpower... doesn't let us choose our best prospects from moment to moment as true exponential discounting would. Rather it formalizes internal conflict, making some self-control problems better, but some worse... we tend to see willpower as an unmixed blessing that bears no relation to such abnormal symptoms...

Rules overshadow goods-in-themselves

Related: keeping the letter rather than the spirit of the law, staying true to the past, sunk cost, traditionalism, legalism, bureaucracy, bad faith.
You will have an impaired ability to live in the here-and-now, the loss of authenticity that existential philosophers complain of in modern society generally. 

Rules magnify lapses

Related: jumping to conclusions, overgeneralization, once a X always a X, no second prances.

If somehow the superego failed to make the person resist temptation, the superego may attempt to save its "reputation" by making up an exception and elevating it to a rule. For example, if it predicts that the person would not eat desserts, but fails to make the person stop eating a donut, it might make up a new rule: 
We would eat no desserts, except donuts.

This saves some reputation of superego, but also makes future encounters with donuts less resistible.

(Why do agents want "reputation"? My guess is that agents require attention to survive. An agent that gets things wrong too many times would become ignored and die.)

That temporary lapses in willpower tend to create long-term, specific exceptions is evidence for the intertemporal bargaining theory. Compare with the model of self-control failure based on exhaustion of "willpower strength", which doesn't predict this. According to that model, if you failed to resist the donut this time, it's just because you used up willpower strength earlier today. You wouldn't become more likely to eat the donut next time.

Rules motivate misperception

The other way the superego can preserve its reputation is by causing confusion or deliberate ignorance. If somehow it makes the memory of eating a donut become forgotten, or makes the donut-eating unconscious, that'd work too.
As a result, money disappears despite a strict budget, and people who "eat like a bird" mysteriously gain weight.
Unconscious eating is a serious cause of weight gain, btw. People eat a lot more when they aren't conscious of the eating.

Rules may serve compulsions

For people with a particularly strong superego, a decision may come to be worth more as a precedent than it is in its own right. This can be particularly damaging when the precedents are just wrong, and the superego just presses on anyway, because the RULES.

The mental image for this situation is some kind of magical superstition that's been passed down for generations: nobody believes in its powers anymore, but they still keep doing it because they don't want to be seen breaking the tradition, afraid it's a slippery slope to revolution.
In this way people who depend on willpower for impulse control are in danger of being coerced by logic that doesn't serve what they themselves regard as their best interests.

An Efficient Will Undermines Appetite

I can't quite understand this chapter. Be warned.

Ainslie says that "pleasure" comes from satisfying "appetite". An appetite is like pressure in an engine, and pleasure comes from releasing the pressure. The rate of pleasure is proportional to the speed of pressure release.

Some appetites are limited physically: you can't get food whenever you want. But emotional rewards are actually controllable. Actors can learn to have emotions without any external stimuli. So how come emotions are so hard to come by, and people spend so much effort getting external stimuli to provoke good emotions inside themselves?

For some reason I didn't understand, Ainslie claims
To get the most out of any kind of reward, we have to have-- or develop-- limited access to it... with emotional rewards, the only way to stop your mind from rushing ahead is to avoid approaches that can be too well learned... To get the most out of emotional reward, you have to either gamble on uncertainty or find routes that are certain but that won't become too efficient.  In short, your occasions have to stay surprising...
Whatever the reasons for this, it does conform with our experiences, and has been confirmed by brain studies, such as Predictability Modulates Human Brain Response to Reward (2001), by Gregory S. Berns et al:
Using functional magnetic resonance imaging, the activity for rewarding stimuli in both the nucleus accumbens and medial orbitofrontal cortex was greatest when the stimuli were unpredictable.
Examples of this preference for surprise are everywhere. Supermarkets give unpredictable bargains, lottery and gambling in general give unpredictable payouts, novels and movies give unpredictable plots, etc. I'm kind of immune to most forms of surprise. I chew up novels and movies by reading their synopses only.

Preference for surprises leads to devaluing unsurprising emotions, and calling them "fake"/"inauthentic". This explains why people have the illusion that emotions are reflexive and uncontrollable, "passions".
Expressions that are known to be intentionally controllable are disregarded, as with the false smile of the hypocrite. By this process of selection positive emotion is left with its familiar guise as passion, something that has to come over you.
Also, when learning something new, practice makes perfect, but it also makes chores.
The paradox is that it is just those achievements which are most solid, which work best, and which continue to work that excite and reward us least.  The price of skill is the loss of the experience of value-- and of the zest for living.
This once again shows the futility of wishing for long-lasting pleasure. As Epicurus noted, the best one can aim for is not pleasure, but freedom from pain.

The Need To Maintain Appetite Eclipses The Will

Premature satiation is the big problem with appetite. Eating feels better if you are hungry. Therefore you should starve yourself a bit before eating. But it's easier to satisfy hunger as soon as it becomes just enough to prompt you to eat. This is called premature satiation: the impulse to harvest emotional reward before it’s ripe.
Will not only cannot control this impulse, it may make you more vulnerable to it because of its demand for regular, distinct criteria for choice.
The greatest limitation of the will comes from the same process as its greatest strength: its relentless systemization of experience through attention to precedent, which braces it against temporary preferences but also makes it unable to follow subtle strategies to overcome the premature satiation of emotional appetite.
The superego wants to make all meals exactly the same, because that'd be most predictable and rule-based. That's not going to make you feel very happy.

The construction of fact puzzle

Contemplate these two facts:
  1. What you believe is highly influenced by what you choose, your preferences, your desires. For example, the same information can lead to opposite beliefs (hostile media effect, confirmation bias), etc.
  2. You can't just decide to believe something. Say, I would give you \$100 if you believe that I have a cat, and I have a brain scanner to see if you really believe that. You can't just do that even if enticed by money. You'd have to go to my home and look, to properly believe I have a cat. Why? Similar reasons are why purely practical arguments for the existence of God are so ineffective.
Ainslie considers two kinds of beliefs.

Instrumental beliefs are like "if I press this button, this light turns on". They evolved to be constrained by external stimuli, so that they would be useful beliefs, rather than sheer fabrications.

Noninstrumental beliefs are about emotions, such as "I love Twilight Sparkle". Ainslie thinks they are purely made for making satisfying experience of emotions, by making emotions happen only under specific instances. I don't understand it at all though.

The vicarious reward puzzle

Other people are especially valuable as sources of emotional experience... at the extreme, for sadist and victim.  How do other people move us, and what are the constraints on that process?
Ainslie thinks it's just another way to get some nice emotions by unpredictability. Gambling for emotions, basically. It also explains why an emotionally satisfying friendship tends to be one that's equal in power:
But gambles that are rigged--interactions that are predictable, people you can boss around, relationships you're poised to leave if they turn disappointing-- push your emotional experiences in the direction of daydreams... such an impulse is punished by a loss of suspense, and hence of all but fairly short range reward.
Ainslie also hypothesizes that empathy is a strategy for getting even more emotional feelings from a social interaction.
To model the other people is to have their expected feelings; and nothing makes these "vicarious" feelings differ in kind from "real" ones.

The indirection puzzle

Some goal-directed activities can't effectively approach their goals by direct routes.  Trying to have fun usually spoils the fun, and trying to laugh inhibits laughter.
This is easily explained by, again, that people don't feel strong emotions from predictable stimuli. The will makes things predictable, and thus willing emotions doesn't work well.
It's a great way to achieve a goal as efficiently as possible, so that you can go on and do something else. It's a terrible way to enjoy an activity for its own sake, because it kills appetite. You inevitably learn to anticipate every step of the activity, so that it eventually becomes "second nature," making it so uninteresting that people used to think that ingrained habits were run by the spinal cord. 
So a too-powerful will tends to undermine its own motivational basis, creating a growing incentive to find evasions.
So you go looking for troubles.
In general you will need to believe in some larger quest that requires you to put your satisfaction at risk.  To climb mountains or jump out of airplanes as a test of fortitude, to stay with an abusive lover to prove your loyalty, to join a religion that demands self-abasement, to play the stock market or the horses as a way to get rich, even to bet your dignity on staying in the forefront of fashion, leads to repeated losses or at least the credible threat of losses.
Pointing this out upsets people because it breaks the illusion, and thus robs people of the emotional reward they seek.
For instance, romance undertaken for sex or even "to be loved" is thought of as crass, as are some of the most lucrative professions if undertaken for money, or performance art if done for effect. Too great an awareness of the motivational contingencies for sex, affection, money, or applause spoils the effort.
Or, as Rainbow Dash puts it elegantly:
The Need To Maintain Appetite Eclipses The Will

In conclusion

nothing lasts

Peace, and hope you die peacefully. I know I won't.

Heart heart heart.
