Utilitarianism for engineers
I've said before that it's impossible to compare interpersonal utilities in theory but pretty easy in practice. Every time you give up your seat on the subway to an old woman with a cane, you're doing a quick little interpersonal utility calculation, and as far as I can tell you're getting it right.
The lack of a theory still grates, though, and I appreciate it whenever people come up with something halfway between theory and practice: some hack that lets people measure utilities rigorously enough to calculate surprising results, but not so rigorously that you run up against the limits of the math. The best example of this is the health care concept of QALYs, or Quality-Adjusted Life Years.
The Life Year part is pretty simple. If you only have $20,000 to spend on health care, and you can buy malaria drugs for $1,000 a dose or cancer drugs for $10,000 a dose, what do you do? Suppose on average one out of every ten doses of malaria drug saves the life of a child who goes on to live another sixty years. And suppose on average every dose of cancer drug saves the life of one adult who goes on to live another twenty years.
In that case, each dose of malaria drug saves on average six life years, and each dose of cancer drug saves on average twenty life years. Given the cost of both drugs, your $20,000 invested in malaria could save 120 life years, and your $20,000 invested in cancer could save 40 life years. So spend the money on malaria (all numbers are made up, but spending health resources on malaria is usually a good decision).
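(If you like seeing the arithmetic as code, here's a minimal sketch in Python using the same made-up per-dose costs and survival numbers from above; nothing in it is real data.)

```python
# Minimal sketch of the life-year arithmetic above, using the same made-up numbers.

BUDGET = 20_000  # dollars available

# (cost per dose, expected life years saved per dose)
malaria = (1_000, 60 / 10)   # 1 in 10 doses saves a child with ~60 years left
cancer  = (10_000, 20 / 1)   # every dose saves an adult with ~20 years left

def life_years(budget, cost_per_dose, ly_per_dose):
    """Expected life years bought by spending the whole budget on one drug."""
    return (budget / cost_per_dose) * ly_per_dose

print(life_years(BUDGET, *malaria))  # 120.0 life years
print(life_years(BUDGET, *cancer))   #  40.0 life years
```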
The Quality Adjusted part is a little tougher. Suppose that the malaria drug also made everyone who used it break out in hideous blue boils, but the cancer drug made them perfectly healthy in every way. We would want to penalize the malaria drug for this. How much do we penalize it? Some amount based on how much people disvalue hideous blue boils versus being perfectly healthy versus dying of malaria. A classic question is "If you were covered in hideous blue boils, and there were a drug that had an X% chance of making you perfectly healthy but a (100 - X)% chance of killing you, would you take it?" And if people on average say yes when X = 50, then we might value a life-year spent with hideous blue boils at only 50% of a life-year spent perfectly healthy.
So now instead of being 120 LY from malaria versus 40 LY from cancer, it's 60 LY from malaria versus 40 from cancer; we should still spend the money on the malaria drug, but it's not quite as big a win any more.
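And here's a sketch of that quality adjustment step, assuming, as above, that an indifference point of X = 50 on the standard-gamble question translates directly into a 0.5 quality weight; that translation is itself just a modeling convention:

```python
# Extending the sketch above: apply a quality weight derived from the
# standard-gamble question ("X% chance of full health vs. (100 - X)% chance
# of death"). Indifference at X = 50 is read as a 0.5 quality weight.

def quality_weight(indifference_x_percent):
    """Convert the standard-gamble indifference point into a 0-1 weight."""
    return indifference_x_percent / 100

def qalys(life_years, weight):
    return life_years * weight

boils_weight = quality_weight(50)   # life with blue boils ~ 0.5 of full health
print(qalys(120, boils_weight))     # 60.0 QALYs from malaria
print(qalys(40, 1.0))               # 40.0 QALYs from cancer
```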
[I have gone back and edited parts of this post three times, and each time I read that last sentence, I think of a spaceship a hundred twenty light years away from the nearest malaria parasite.]
Some public policy experts actually use utilitarian calculations over QALYs to make policy. I read an excellent analysis once by some surgeons arguing which of two treatment regimens for colon cancer was better. One treatment regimen included much stronger medicine that had much worse side effects. The surgeon supporting it laboriously went through the studies showing increased survival rates, subtracted out QALYs for years spent without a functional colon, found the percent occurrence of each side effect and subtracted out QALYs based on its severity, and found that on average the stronger medicine gained patients more utility than the weaker medicine - let's say 0.5 extra QALYs.
Then he compared the cost of the medicine to the cost of other interventions that on average produced 0.5 extra QALYs. He found that his medicine was more cost-effective than many other health care interventions that returned the same benefits, and therefore recommended it both to patients and insurance bureaucrats.
As far as I can tell, prescribing that one colon cancer medicine is now on sounder epistemological footing than any other decision any human being has ever made.
Towards A More General Hand-Wavy Pseudotheory
So if we can create a serviceable hack that lets us sort of calculate utility in medicine, why can't we do it for everything else?
I'm not saying QALYs are great. In fact, when other people tried the colon cancer calculation they got different results by about an order of magnitude.
But a lot of our social problems seem to be things where the two sides differ by at least an order of magnitude - I don't think even the most conservative mathematician could figure out a plausible way to make the utilitarian costs of gay marriage appear to exceed the benefits. Even a biased calculation would improve political debate: people would be forced to say which term in the equation was wrong, instead of talking about how the senator proposing it had an affair or something. And it could in theory provide the same kind of imperfect-but-useful-for-coordination focal point as a prediction market.
Okay, sorry. I'm done trying to claim this is a useful endeavor. I just think it would be really fun to try. If I need to use the excuse that I'm doing it for a constructed culture in a fictional setting I'm designing, I can pull that one out too (it is in fact true). So how would one create a general measure as useful as the QALY?
Start with a bag of five items, all intended to be good in some way very different from that in which the others are good:
1. $10,000 right now.
2. +5 IQ points
3. Sex with Scarlett Johansson
4. Saving the Amazon rainforest
5. Landing a man on Mars
A good hand-wavy pseudotheory of utility would have to be able to value all five of these goods in a common currency, and by extension relative to one another. We imagine asking several hundred people a certain question and averaging their results. In some cases the results would be wildly divergent (for example, valuations of item 3 would differ based on sex and sexual orientation) but they might still work as a guide, in the same way that believing each person to have one breast and one testicle would still allow correct calculation of the total number of breasts and testicles in society.
Let's start with the most impossible problem first: what question would we be asking people and then averaging the results of?
The VNM axioms come with a built in procedure for part of this - a tradeoff of probabilities. Would you rather save the Amazon rainforest, or have humankind pull off a successful Mars mission? If you prefer saving the rainforest, your next question is: would you rather have a 50% chance of saving the rainforest, or a 100% chance of a successful Mars mission? If you're indifferent between the second two, we can say that saving the Amazon is worth twice as many utils as a Mars mission for you. If you'd also be indifferent between a 50% chance at a Mars mission and a 100% chance of $10,000, then we can say that - at least within those three things - the money is worth 1 util, the Mars landing is worth 2 utils, and the rainforest is worth 4 utils.
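Here's a rough sketch of how those indifference probabilities chain together, assuming we arbitrarily anchor the least-preferred good at 1 util; the helper function and the anchor value are my own illustrative choices, not anything prescribed by the VNM theorem:

```python
# Sketch of chaining VNM indifference probabilities into util values.
# If you are indifferent between a p chance of the better good and the worse
# good for certain, then U(worse) = p * U(better), so U(better) = U(worse) / p.

def chain_utils(anchor_value, indifference_probs):
    """
    anchor_value: utils assigned to the least-preferred good (e.g. 1.0).
    indifference_probs: p's ordered from the bottom of the ranking upward,
        where each p makes you indifferent between a p chance of the next
        good and the previous good for certain.
    Returns the implied util values from worst to best.
    """
    utils = [anchor_value]
    for p in indifference_probs:
        utils.append(utils[-1] / p)
    return utils

# $10,000 anchored at 1 util; indifferent between a 50% shot at Mars and the
# money, and between a 50% shot at the rainforest and Mars:
print(chain_utils(1.0, [0.5, 0.5]))  # [1.0, 2.0, 4.0] -> money, Mars, rainforest
```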
The biggest problem here is that - as has been remarked ad nauseam - each person's utility function is only pinned down up to a positive affine transformation, with an arbitrary zero point and scale, and so this makes interpersonal utility comparisons impossible. It may be that I have stronger desires than you about everything, and this method wouldn't detect that. What can we turn into a utility currency that can be compared across different people?
The economy uses money here, and it seems to be doing pretty well for itself. But the whole point of this exercise is to see if we can do better, and money leaves much to be desired. Most importantly, it weights people's utility in proportion to how much money they have. A poor person who really desperately wants a certain item will be outbid by a rich person who merely has a slight preference for it. This produces various inefficiencies (if you can call, for example, a global famine killing millions an "inefficiency") and is exactly the sort of thing we want a hand-wavy pseudotheory of utility to be able to outdo.
We could give everyone 100 Utility Points, no more, no less, and allow these to be used as currency in exactly the same way the modern economy uses money as currency. But is utility a zero sum game within agents? Suppose I want a plasma TV. Then I get cancer. Now I really really want medical treatment. Is there some finite amount of wanting that my desire for cancer treatment takes up, such that I want a plasma TV less than I did before? I'm not sure.
Just as you can apply logarithmic scoring rules to beliefs to force people's stated numbers to correspond to probabilities, maybe you can apply them to wants as well? So we could ask people to divide 100% among the five goods in our basket, with the share assigned to each good treated as the probability that that event will happen, and use some scoring rule to prevent people from assigning all the probability to the event they want the most? Mathematicians, back me up on this?
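For concreteness, here's what the ordinary logarithmic scoring rule looks like when pointed at a 100-point allocation over the basket. This is just the standard rule for beliefs; whether anything like it is meaningful for wants is exactly the open question, and the allocations below are invented for illustration:

```python
import math

# The logarithmic scoring rule for beliefs: you report a probability for each
# outcome, one outcome is revealed to have happened, and your score is
# log(probability you gave it). Reporting your true beliefs maximizes your
# expected score, and piling ~100% on one outcome risks a huge penalty.

GOODS = ["$10,000", "+5 IQ", "Johansson", "rainforest", "Mars landing"]

def log_score(allocation, realized_index):
    """allocation: dict mapping each good to its share of the 100 points."""
    p = allocation[GOODS[realized_index]] / 100
    return math.log(p) if p > 0 else float("-inf")

honest = {"$10,000": 30, "+5 IQ": 10, "Johansson": 5,
          "rainforest": 35, "Mars landing": 20}
all_in = {"$10,000": 0, "+5 IQ": 0, "Johansson": 0,
          "rainforest": 100, "Mars landing": 0}

print(log_score(honest, 4))   # mild penalty: log(0.20)
print(log_score(all_in, 4))   # -inf: everything was bet on the rainforest
```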
The problem here is that there's no intuitive feel for it; we'd just be assigning numbers. Just as probability calibration is bad, I bet utility calibration is also bad. Also, comparing things specific to me (like me getting $10,000) with things general to the world (like saving the rainforest) is hard.
What about just copying the QALY metric completely? How many years (days?) of life would you give up for a free $10,000? How about to save the rainforest? This one has the advantage of being easy to understand and being a real choice that someone could ponder. And since most people have similar expected lifespans, it's more directly comparable than money.
But this too has its problems. I visualize the last few years of my life being spent in a nursing home - I would give those up pretty easily. The next few decades are iffy. And it would take a lot to make me take forty years off my life, since that would bring my death very close to the present. On the other hand, some things I want more than this scale could represent: if I would gladly give my own life to solve poverty in Africa, how many QALYs is that? The infinity I would be willing to give, or the fifty or so I've actually got? If we limit me to fifty, that suggests I place the same value on solving poverty in Africa as on solving poverty all over the world, which is just dumb.
Someone in the Boston Less Wrong meetup group yesterday suggested pain. How many seconds of some specific torture would you be willing to undergo in order to gain each good? This has the advantage of being testable: we could, for example, offer a randomly selected sample of people the opportunity to actually undergo torture for $10,000 or whatever, in order to calibrate their assessments ("Excuse me, Ms. Johansson, would you like to help us determine people's utility functions?")
But pain probably scales nonlinearly, different tortures are probably more or less painful to different people, and, as I mentioned the last time this was brought up, society would get taken over by a few people with Congenital Insensitivity To Pain Disorder.
Maybe the best option would be simple VNM comparisons with a few fixed interpersonal comparison points that we expect to be broadly the same among people. A QALY would be one. A certain amount of pain might be another. If we were really clever, we could come up with a curve representing the utility of money at different wealth levels, and use the utility of money transformed via that curve as a third.
Then we just scale everyone's curve so that their comparison points land as close to other people's comparison points as possible, stick it on the interval between zero and one, and call that a cardinal utility function.
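A toy version of that scaling step might look something like this; the anchor goods, the example numbers, and the squash-onto-[0, 1] step are all illustrative assumptions rather than anything with theoretical backing:

```python
# Sketch of the rescaling step: each person's VNM utilities are only defined
# up to a positive affine transformation, so we pin them down by forcing two
# "shared" anchor goods (say, one unit of pain and one QALY) to the same
# values for everyone, then squash the whole curve onto [0, 1].

def rescale(utils, anchor_low, anchor_high):
    """Affinely map utils so that anchor_low -> 0 and anchor_high -> 1."""
    lo, hi = utils[anchor_low], utils[anchor_high]
    return {good: (u - lo) / (hi - lo) for good, u in utils.items()}

def squash(utils):
    """Map the whole (already anchored) curve onto the interval [0, 1]."""
    lo, hi = min(utils.values()), max(utils.values())
    return {good: (u - lo) / (hi - lo) for good, u in utils.items()}

# Two made-up respondents whose raw numbers differ by an arbitrary scale factor:
alice = {"pain_unit": -3.0, "one_qaly": 1.0, "$10,000": 0.5, "rainforest": 4.0}
bob   = {"pain_unit": -30.0, "one_qaly": 10.0, "$10,000": 2.0, "rainforest": 15.0}

# After anchoring, the two are (purportedly) on a shared scale...
print(rescale(alice, "pain_unit", "one_qaly"))  # rainforest: 1.75
print(rescale(bob, "pain_unit", "one_qaly"))    # rainforest: 1.125
# ...and can then be squashed onto [0, 1] if you want a bounded curve.
print(squash(rescale(alice, "pain_unit", "one_qaly")))
```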
Among the horrible problems that would immediately ruin everything are:
- massive irresolvable individual differences (like the sexual orientation thing, or value of money at different wealth levels)
- people exaggerating in order to inflate the value of their preferred policies
- difficulty specifying the situation (what exactly needs to occur for the Amazon to be considered "saved"?)
- separating base-level preferences from higher-level preferences (do you have a base level preference against racism, or is your base level preference for people living satisfactory lives and you think racism makes people's lives worse; if the latter we risk double-counting against racism)
- people who just have stupid preferences not based on smart higher-level preferences (THE ONLY THING I CARE ABOUT IS GAY PEOPLE NOT MARRYING!!!)
- scaling the ends of the function (if I have a perfectly normal function but then put "making me supreme ruler of Earth" as 10000000000000000000x more important than everything else, how do we prevent that from making it into the results without denying that some people may really have things they value very very highly?)
- a sneaking suspicion that the scaling process might not be as mathematically easy as I, knowing nothing about mathematics, assume it ought to be.
I'd be very interested if anyone has better ideas along these lines, or stabs at solutions to any of the above problems. I'm not going to commit to actually designing a system like this, but it's been on my list of things to do if I ever get a full month semi-free, and if I can finish Dungeons and Discourse in time I might find myself in that position.