This post on Best Of A Great Lot is part of a series on designing a new form of governance. Each piece aims to stand alone, but they fit together; see the Table of Contents.
Prediction markets are a promising technology that is just starting to mature, and they seem like an obvious tool for any self-respecting governance system designer of the 21st century (all 3 of us?) to be excited about. For an overview of how they work, I refer you to Scott Alexander's FAQ on the subject. Kelsey Piper offers this nice short explanation:
In a prediction market, you buy bets on whether an outcome will occur. For example, you could buy a “ticket” that pays you $1 if special counsel Robert Mueller testifies publicly to a congressional committee. How much would you pay for such a ticket? 5 cents? 30 cents? Your answer will depend on how likely you think it is that Mueller will testify.
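To make the pricing logic concrete, here's a minimal sketch of the arithmetic (the numbers are made up): a ticket's price is the market's implied probability, and buying has positive expected value whenever your own probability estimate exceeds the price.

```python
# A ticket pays $1 if the event happens, $0 otherwise, so the
# market price of the ticket is the crowd's implied probability.

def implied_probability(ticket_price: float) -> float:
    """A $0.30 ticket implies the market thinks P(event) = 0.30."""
    return ticket_price

def expected_profit(ticket_price: float, your_probability: float) -> float:
    """If you think the event is more likely than the price implies,
    buying the ticket has positive expected value."""
    return your_probability * 1.00 - ticket_price

# You believe there's a 50% chance Mueller testifies; the market says 30%.
print(expected_profit(ticket_price=0.30, your_probability=0.50))  # ~$0.20 per ticket
```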
For all their interestingness and excitement, my day-to-day experience with them is minimal. I find most attempts to use them far from compelling, and though I interact with them far more than most people I know, it's still only a few times a month at most.
Clay Graubard and Andrew Eaddy of the Substack titled, simply, Predictions, agree:
Numerous companies have arisen trying to propagate the technology and none have achieved great success—there has been no forecasting nor prediction market unicorn.
Robin Hanson, the originator of the idea, gave a speech in which he seems to agree that we should still be searching for better implementations.
In the past, Hanson has proposed several markets that he thinks would be useful, of which two seem particularly worth discussing: a market on who will be successful in life for use in prioritizing admissions to elite colleges and a market on whether companies would do better if they fired their CEO. Despite being similar in many ways, the second seems obviously better. Why?
If we want to use prediction markets as a tool within a system of governance, we need to understand what makes a prediction market more or less compelling. In other words, if we're searching for better implementations, what's likely to make a better implementation?
Let's add to our consideration two markets that exist and are successful: election prediction markets like PredictIt and the granddaddy of prediction markets, the stock market.1 And to round out the other side, we have Metaculus and Manifold Markets, two markets which have only taken off in the sense of being well accepted by prediction market aficionados. The Segway of prediction markets, perhaps.
The most obvious way in which Metaculus and Manifold Markets are different from the other four examples is that they are general purpose. Anyone can ask any question in any form. In software design, there are cases where "general purpose" is a valuable asset (for example, Excel) and numerous much less visible cases where being general purpose means that it doesn't solve anyone's problems. A rule from user experience design: constraints that match users' expectations allow users to achieve their goals. Too much flexibility often just frustrates people each time it allows them to shoot themselves in the foot.
In the case of Metaculus and MM, the lack of constraint on what questions can be asked, how they are phrased, and how they are judged combines to send every user on a new voyage of discovery and uncertainty with every question they face.
Consider a question like "When will the war in Ukraine stop?" To make this a workable prediction market, the author had to add a bunch of caveats to the description of the market to cover some of the many directions people could take the interpretation:
This market resolves after it is generally agreed upon that the Ukraine war has reached a permanent ceasefire, or if there has been no ongoing military action for at least 6 months, even if it is not clear that the war is over.
There does not need to be a peace agreement.
Most questions don't have this level of clarification. But knowing that there exist sufficiently motivated pedants, we could imagine more clarifications that might be good to have. For example, what exactly qualifies as "ongoing military action"? The US has bombed Afghanistan a few times since we withdrew, but officially we are no longer at war. Does that count?
How about something more extreme, like if Russia wins and conquers Ukraine? There might be a guerrilla insurgency that has no formal standing or centralized location but continues shooting Russian soldiers. What if, for reasons that are hard to imagine right now, the EU and US decide to acknowledge Russia as the only country there? Does that still count as two countries at war?
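One way to appreciate how much is left undefined is to try encoding the stated rule as a function. In this sketch (my own paraphrase, not the market's actual logic), every input hides a judgement call the rule never defines:

```python
from datetime import date, timedelta

def resolves_yes(permanent_ceasefire_agreed: bool,
                 last_military_action: date,
                 today: date) -> bool:
    """Paraphrase of the market's stated rule: resolve YES on a generally
    agreed permanent ceasefire, or after 6 months with no ongoing action."""
    six_months_quiet = today - last_military_action >= timedelta(days=182)
    return permanent_ceasefire_agreed or six_months_quiet

# Every input above is itself an unanswered question:
# - Who decides a ceasefire is "generally agreed upon"?
# - Does a one-off cross-border strike reset last_military_action?
# - Does a decentralized guerrilla insurgency count as "ongoing military action"?
```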
I like to think of each of these loose ends as an obstacle that must be surmounted by a user who’s on the fence about whether to participate in prediction markets at all; i.e. the general public. Some users will notice the potential flaws in the question and doubt whether they should participate, or spend more time on one question than they want to and decide not to look at other questions in the future. Others will be turned off when they have bet on a question and the outcome surprises them in ways they weren't considering, and won't come back. Those who overcome all of this might become hobbyists on the market, but they may have enough doubt that they stay hobbyists, rather than becoming daily users. Sure, some users will ignore the details because they trust the person who created the market, but so far, that hasn't been enough. For mainstream acceptance, the cognitive load to participate freely has to be low.2
Contrast this with the markets that aren't general purpose. There, the market repeats a single question (with a consistent form of answer) across many different but equivalent situations. Thousands of stocks might go up or down tomorrow or next month. Hundreds of elections will be won or lost on election day. And this is true of both of the markets Hanson proposes: thousands of companies could see their stock rise or fall depending on whether they fire their CEO, and thousands of students go to college every year.
The benefit of having a consistent question within a market is that the effort you spend understanding what matters in answering the question can be re-applied to the next entry in the market. That's much harder when there are two skills to build: interpreting the question and answering it.
But even with a clear and consistent question, the college admissions market feels less compelling than the others. Part of this is that the definitions of success and failure at life aren't nailed down yet. We should expect that anyone making a real go at that market would have to nail them down to make the question interpretable enough for users to be comfortable. But that's not the only problem.
The second key to the successful markets (elections, stock) is a short and understandable time horizon. Elections happen every couple years on a specific date. Many people buy and hold in the stock market, but for the people who are treating it like a prediction market (active investors), the time horizon is generally single-digit years or less. We should also expect the Fire the CEO market to have a short time horizon.
Manifold Markets and Metaculus have a different time horizon for every question, and often they have to nail down the time horizon to make the question answerable at all. Looking back at the question about when the war in Ukraine will end, the asker decided that if the war is ongoing in 2028, the answer will resolve to infinity.
The appetite in our society for bets that go past a few years is fairly limited. Longbets.org exists, of course, but most of the people betting there have only a small number of bets, mostly on things close to their areas of deep expertise. Longbets doesn't have a hundred thousand bets or anything close to it.
The time horizon of the college admissions market is what makes it so much less compelling. It simply takes too long for students to grow up to be successful adults. Some people become obviously successful fairly quickly. Others take decades. And how do you resolve a market for someone who makes and then loses a fortune? How and when do you resolve it for someone like Elizabeth Holmes, feted and celebrated and then discovered to be a fraud?
The last critical difference between the successful markets and the general-purpose ones is important and difficult to resolve: who determines the official answer, and how.
With the stock market, the question of who judges is obvious: buyers and sellers of the stock tomorrow decide what price they'll accept. PredictIt relies upon a vast ecosystem of journalists and officials working together to declare the winners of elections. If a Fire the CEO market existed, I assume it would be judged automatically by the stock price at some point after the firing happened.3
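Such automated judging might look something like this sketch (the 90-day window and the market-index adjustment are my own placeholder choices, not part of Hanson's proposal):

```python
def resolve_fire_the_ceo(price_at_departure: float,
                         price_90_days_later: float,
                         market_index_change: float) -> str:
    """Hypothetical resolution rule: did the stock beat the broader
    market in the 90 days after the CEO left?"""
    stock_change = (price_90_days_later - price_at_departure) / price_at_departure
    return "YES" if stock_change > market_index_change else "NO"

# Example: stock up 8% vs. an index up 3% resolves YES; the firing looked good.
print(resolve_fire_the_ceo(100.0, 108.0, market_index_change=0.03))
```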
Metaculus and Manifold Markets have no such obvious judges. Instead, the asker of each question judges the outcome. Unfortunately, not all people who ask questions on random websites are fair and neutral arbiters of the outcomes of the questions they ask. I don't mean to be glib; this is a real problem, and it only gets worse when real amounts of money become involved. People who ask questions on random websites may also be affiliated with the people betting and earning money on those questions, a bias that's hard to detect from usernames. If these markets became serious and continued to let the question's asker judge, some people would be willing to influence the asker, if they couldn't simply be the asker themselves. In general, the problem of finding neutral arbiters is an unsolved one.
This is one of those cases where scale matters. PredictIt's financial limits keep anyone from having a strong enough incentive to corrupt our electoral process solely to earn money, though there have been some attempts to drive news stories with large bets. If the amounts being bet were in the billions, you can bet people would try harder. Metaculus and Manifold Markets are tiny, effectively toys from a scale perspective. If, somehow, general-purpose prediction markets became popular enough to invite millions of dollars into important questions, the question of who judges and how they judge would become extremely important. There's vastly less subjectivity around an election outcome than around determining whether Russia and Ukraine are at war, and that's one of the more objective questions I've run into.
Prediction market advocates suggest that market manipulation isn't a problem because manipulation just invites active traders in to earn money correcting it. But that argument only covers one kind of manipulation: betting large amounts of money. Exploiting grammar bugs in the question, bribing judges, paying for news articles that assert the outcome you want, even paying to make that outcome happen: all are possible and problematic. All of these are serious problems for open-ended prediction markets whose judges are whoever asks the question.
These three tests are a good start for any market, but they're not always enough. Let's naively imagine a prediction market that we could use to replace elections, one that passes all three tests, and consider its remaining problems. An obvious proposal: a prediction market on what percentage of the population would be satisfied with a candidate after 6 months in office. The candidate with the highest predicted satisfaction rating would be chosen, and bettors would receive payouts based on how close their bets were to the actual satisfaction.
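To make the mechanics concrete, here's a minimal sketch of one plausible payout rule (the linear formula is purely illustrative, not a considered design): payouts shrink with the distance between a bettor's prediction and the measured satisfaction.

```python
def payout(predicted_satisfaction: float,
           actual_satisfaction: float,
           stake: float) -> float:
    """Hypothetical payout rule: a perfect prediction doubles your stake;
    the payout falls linearly to zero as the prediction error grows."""
    error = abs(predicted_satisfaction - actual_satisfaction)
    return max(0.0, stake * 2 * (1 - error))

# You bet $10 that 55% of citizens will be satisfied; the polls measure 60%.
print(payout(0.55, 0.60, stake=10.0))  # ~$19 back on a $10 stake
```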
So, how does this do by our criteria above?
A single question, repeated across events? Check. Time-boundedness? Check. Judgement's always the hard part, but this judgement seems straightforward: we're not relying on any one person to make a judgement, but on a standard process we already use constantly. We do polling on citizen satisfaction all the time. We know about averaging multiple polls to reduce polling bias. We know about sample size, and we could require that any poll going into the metric be run by an independent nonprofit with a long history, meet a minimum sample size, use acceptable methodologies, and so on. From the perspective of what makes a prediction market more likely to be successful, this seems solid.
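A sketch of how that standard process might be encoded (the pollster whitelist and sample-size threshold are placeholders, not a real methodology):

```python
from statistics import mean

APPROVED_POLLSTERS = {"Pew", "Gallup", "AP-NORC"}  # placeholder whitelist
MIN_SAMPLE_SIZE = 1000                             # placeholder threshold

def official_satisfaction(polls: list[dict]) -> float:
    """Average satisfaction across qualifying polls: run by an approved
    independent pollster and meeting the minimum sample size."""
    qualifying = [p["satisfaction"] for p in polls
                  if p["pollster"] in APPROVED_POLLSTERS
                  and p["sample_size"] >= MIN_SAMPLE_SIZE]
    if not qualifying:
        raise ValueError("no qualifying polls; market cannot resolve")
    return mean(qualifying)

polls = [
    {"pollster": "Pew", "sample_size": 1500, "satisfaction": 0.58},
    {"pollster": "Gallup", "sample_size": 1200, "satisfaction": 0.62},
    {"pollster": "SomeBlog", "sample_size": 300, "satisfaction": 0.90},  # excluded
]
print(official_satisfaction(polls))  # ≈ 0.60
```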
If we used this method for school board, it might well not have significant problems. But if we used it for Congress and the Presidency, we should expect the scale of what we're doing to generate a lot of people looking at how to break this. So, how do you break this?
The market manipulation method most people expect is to dump a large bet on the candidate you want to win, perhaps timing it at the right moment to either generate a story about their momentum or have it happen right as the market is closing, and hope that you've put enough money down to get the outcome you want. You do this if you care more about your candidate being in office than about getting your money back, and you're more likely to do this to win a close election where there’s ambiguity on whether you’re doing manipulation or just betting. If you don't have market funding limits, it's easier to try to manipulate the market this way, but it's also easier for hedge funds and other wealthy investors to pile in to take your money in response to any blatant market manipulation. So there are some natural limits on this kind of manipulation.
The judgement manipulation is more interesting. If you are an unscrupulous bettor, your fundamental goal is to manipulate either the polling methods or public perception, pushing citizen satisfaction towards the number you've bet: improving it for someone you've bet in favor of, or worsening it for someone you've bet against. The basic techniques are likely to be bolstering or destroying the elected official's image, bribing or hacking pollsters, bribing or threatening poll respondents, and similar behaviors. One likely result is a new category of post-election ad campaigns, especially in the run-up to judgement.
You can argue that this judgement manipulation is likely to be contested, so it doesn’t obviously favor one side or another. It does seem to favor the rich, who have the money to try more things, but not obviously much more than our current system, and wealth doesn’t automatically bring polling success in our current system. But I can’t argue that it’s much of an improvement on our current system: the satisfaction numbers aren’t likely to be more closely connected to reality than our current voting is.
One of the core challenges here is that citizens are not particularly well-informed, and have little incentive to become well-informed, so people's satisfaction is less a reflection of representative behavior and more a reflection of their own daily life and stories that they’ve heard. An optimist might argue that this setup would encourage bettors to better inform the citizenry (or improve their daily life), but I think a more likely outcome is something akin to current political campaigns: as much storm and fury and lying as genuine education.
So while this is an interesting idea for a prediction market, I doubt it would make a better system than our current one. Instead, it serves as a useful tool to think about the challenges of designing prediction markets into a system of governance. I'll return to this later, as I lay out my proposed system.
Matt Levine wrote an interesting article about how prediction markets work, saying that they incentivize people to bring forth weird events because they can bet on them. I haven’t fully incorporated that thinking yet, but it’s worth considering for anyone designing a prediction market.
If you find my work interesting, please share and subscribe. It helps tremendously.
1. Is the stock market really a prediction market? There are two ways to view the stock market that align with the core idea of a prediction market. The first is that buying a stock is predicting the future value of the company. The question you're predicting in this instance is whether the company will do well. People frequently describe the stock market this way. The second view is that buying a stock is predicting that stock purchasers will in the future like this company. Sometimes more cynical people describe the stock market this way, and use this explanation when faced with stocks like GameStop and Tesla that seem to be valued very differently from other companies. This definition still meets the same key aspect of a prediction market: predicting the outcome of a question about the future. In this case the question you are predicting is whether future stock purchasers will view this stock favorably.
2. Or the market has to be much more potentially profitable: it's true that if a ton of money is in the market, institutional players will overcome these obstacles no matter what, just as they do with more complex financial instruments such as derivatives. But if you want broad adoption, as PredictIt and the stock market have achieved, you need cognitive ease of use.
3. The Fire the CEO market passes all three of my tests. Metaculus and Manifold do not. But, interestingly, Metaculus and Manifold might be really effective if people started treating them as infrastructure for building markets like the Fire the CEO market. Both offer APIs that allow creating questions. With their permission, it should be fairly straightforward to build a Fire the CEO market on top of one of them as a platform: automatically create a market for each sitting CEO, then present those markets in your own UI. Judgement could presumably also be automated, based on submitted articles from acceptable news sources and an LLM evaluating whether the article says the CEO resigned or was fired. Robin Hanson proposed that funding such markets to real effectiveness might require on the order of millions of dollars, but anyone building this might be able to cheat by taking a page out of Kickstarter's book: invite people to indicate a willingness to bet on any particular CEO, and only create the market once a sufficient mass exists.
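As a rough sketch of the idea (Manifold does document a market-creation endpoint, but the exact field names below should be checked against their current API docs, and the question template and close date are placeholders of mine):

```python
import requests

API_KEY = "YOUR_MANIFOLD_API_KEY"  # from your Manifold account settings

def create_fire_the_ceo_market(company: str, ceo: str) -> dict:
    """Create one binary market per sitting CEO via Manifold's API.
    Field names follow their documented /v0/market endpoint."""
    resp = requests.post(
        "https://api.manifold.markets/v0/market",
        headers={"Authorization": f"Key {API_KEY}"},
        json={
            "outcomeType": "BINARY",
            "question": f"Will {ceo} resign or be fired as CEO of {company} "
                        f"before the end of 2025?",
            "closeTime": 1767225600000,  # epoch millis for 2026-01-01 UTC
            "initialProb": 50,
        },
    )
    resp.raise_for_status()
    return resp.json()

# for company, ceo in current_ceos: create_fire_the_ceo_market(company, ceo)
```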