Objectivity and its Discontents
This post on Best Of A Great Lot is part of a series on the subject of designing a new form of governance. Each piece aims to stand alone, but they fit together in the Table of Contents.
1.
In A Knotwork of Bureaucracies, we went over some of the details that drive bureaucracies toward and away from producing good outcomes (but more away than toward). One of the most powerful tools which can do both is data-driven decisionmaking and its ilk.
New phrases and framings of this concept seem to sweep through the business world every decade or so. Data-driven decisionmaking is the latest name for it, but there's also Management By Objectives (MBO), Key Performance Indicators (KPIs), SMART goals (the M is for measurable), and other similar frameworks.
The central thesis behind data-driven decisionmaking in all of its forms is that we want to be objective — driven by numeric metrics — rather than subjective. Subjective judgements allow for all manner of corruption: relationships, feelings, whims. If something can be made more objective, the theory goes, we can root out these human biases.
When companies set up incentive structures, they often reach for this theory. HR professionals demand that bonus payments be tied to measurable, objective goals to prevent them from being given willy-nilly to the charismatic and the bold.
When leaders oppose this view, they do so by making two key arguments. First, many of the most important aspects of culture and success are incredibly difficult to measure. Second, the easy things to measure often have tremendously bad side effects when the organization incentivizes them.
The book How To Measure Anything attempts to bridge these two positions by arguing that we can and must measure even the difficult-to-measure cultural aspects of an organization, and that doing so is possible if you're sufficiently clever.
2.
The ancient and mystical scientific technique of specifying things in numbers is an amazingly powerful tool. There are lots of cases where objectivity and data-driven decisionmaking are obviously better than what they seek to replace.
What gets measured, gets done. —Peter Drucker
Imagine the CEO who discovers that the company won't make payroll, shortly after being told by his sales and finance leaders that things are going great. This is a business that desperately needs a few more metrics, and maybe someone paying attention to some basic revenue projections. Imagine a public health department saying “we’re sure there’s a virus going around” while having zero data on how many cases there are. We could come up with hundreds more examples of this sort, all pointing to the vast overconfidence of many incompetent people and how easy it is for them to fool themselves and others in a world of subjective self-evaluations.
I could spend a book defending the use and value of objectivity and data-driven decisionmaking. Fortunately, How To Measure Anything has already done that. But with any tool of great power comes great responsibility, and the responsibility here is to understand some of the pitfalls.
3.
Measurability and objectivity bring with them several key problems in the context of decisionmaking. A sophisticated leader works to minimize these, and a well-designed governance system should also.
Subjectivity can creep in anywhere.
Objectivity creates an illusion of being all-encompassing.
Goodhart's Law (compounded by the fact that some things are just easier to measure than others).
When creating an objective measure, we frequently lose the most important aspects of the goal.
Relationships warp objectivity.
1. Subjectivity creeps in everywhere.
The goal this quarter is 400 fully qualified leads, says the sales leader. We'll deliver these 10 key features, says the engineering leader. Our factories will produce 300 units, says the operations leader. What makes a lead fully qualified? What counts as delivered? What is the scope of a feature? Is a unit produced this quarter if half of it was made last quarter? How about if it's done but not packaged?
These are the easy examples, things where the subjectivity is straightforward and can be largely ignored. We said before that if you're sufficiently clever, you can measure most anything. The flip side of being able to measure anything is carefully choosing your measure to support what you already believe. Many measurements include some degree of human judgement, and once you apply judgement from sufficiently (mis)motivated humans, it's possible to get nearly any outcome for a single metric. The traditional methods include:
Fiddling with how you ask the question
Restricting which answers you allow
Curating the population you ask
And that's before you get to applying statistics. The last two decades of the replication crisis have shown that it's entirely possible for scientists — who at least live within a culture that proclaims its goal to be the idealistic (or at least bureaucratic) pursuit of truth — to go seriously astray. Unsurprisingly, people whose mission is less directly aligned with seeking truth fare worse. Being able to measure anything is no guarantee you've measured anything true or useful.
The flip side of being able to measure anything is carefully choosing your measure to support what you already believe.
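As a toy illustration of the population-curation trick — all numbers invented — here is how the same "approval rate" metric moves depending on who gets surveyed:

```python
# Toy illustration of population curation (all numbers invented).
# Each respondent is (is_recent_buyer, approves).
responses = (
    [(True, True)] * 45 + [(True, False)] * 5       # recent buyers: 90% approve
    + [(False, True)] * 15 + [(False, False)] * 35  # everyone else: 30% approve
)

def approval_rate(people):
    """Fraction of respondents who approve."""
    return sum(1 for _, approves in people if approves) / len(people)

honest = approval_rate(responses)                        # whole population
curated = approval_rate([r for r in responses if r[0]])  # recent buyers only

print(f"everyone: {honest:.0%}, recent buyers only: {curated:.0%}")
# → everyone: 60%, recent buyers only: 90%
```

Nothing in the curated number is false — it's measured over a population chosen to flatter the product.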
2. Objectivity creates an illusion of being all-encompassing.
An unfortunate side effect of having numbers is that they carry with them an air of objectivity. Sometimes that air isn't deserved.
Salespeople often make heavy use of this fact with the concept of a case study. A case study is where the marketing team works with current customers to create a 1-2 page story of how their product has been a tremendous success for a specific customer. One of the rules of marketing is that case studies are much more convincing if they include a metric that sounds objective in them. So marketing departments game this. They pick customers carefully, pick metrics that are most likely to demonstrate numeric improvement, and exaggerate anything they feel they can get away with. Customers are often offered incentives like money or access in exchange for helping with the case study. The case study then trumpets the 27% improvement that they were able to coax out.
Imagine being an employee within a company that is considering buying some product. Perhaps you’re an accountant and they’re looking at a piece of software that will automatically categorize expenses, or you’re a marketing person and they’re looking at software that schedules LinkedIn posts, or you’re in the factory and they’re considering buying a piece of machinery to do one of the tasks someone was doing manually.
If you’re a detractor — you don’t think the product will make things better or faster — you’re now up against a shiny case study which says that at your competitor, this product made a 27% improvement. To argue against this, many people will expect you to show how this product will have a 27% worsening on something else, or come up with your own study where it didn’t have an improvement. It’s unlikely you’ve done a study and have numbers to back up your claim that you shouldn’t buy the product, so if you’re arguing with your boss and they’re on the fence, that 27% is going to weigh more heavily in their head than your own more subjective arguments.1 Is the 27% really more true because it’s numeric?
Of course, you've seen this in consumer marketing too. It often seems that 9/10 dentists approve of most dental products on the market. 9/10 sounds more believable than 10/10, and obviously better than 8 or the dreaded 7. Can you imagine a product claiming 7/10 dentists approved of it? The intern at the marketing company who runs a study and finds 7 out of 10 dentists approves of the product is either asked to go find better dentists or fired. They were supposed to have randomly selected dentists from an area that’s seen lots of purchases of their product!
3. Goodhart's Law.
Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes. —Charles Goodhart
Or Marilyn Strathern's version:
When a measure becomes a target, it ceases to be a good measure.
In determining whether they're achieving their own goals, a marketer might measure reach (number of people who've seen their campaign). A salesperson might measure number of contacts. A software engineer might measure lines of code or number of functions written. A factory owner might measure number of items delivered. A farmer might want to know percentage of tomatoes that made it to the store un-squished.
But when you make these measures into targets, you get bad results. Marketers measured on reach may make something people really hate. Salespeople measured on contacts don't spend enough time with a prospect to hear what they need, let alone actually sell anything. Software engineers measured on lines of code make simple things more complicated than they should be.
Factories measured on number of items delivered make lower-quality stuff — the story about Soviet nail factories has them delivering nails without points when measured on quantity, and a single nail that required a crane to lift when measured on weight. Tomatoes measured on transportability taste terrible. I could go on ad nauseam, which seems to be where those tomatoes are headed.
Some things are just inherently easier to measure than others. It’s vastly easier to measure how many conversations a salesperson had, or how many tomatoes made it to market, than whether they were any good. Goodhart’s Law exists at least in part because quality is often harder to measure than quantity.
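The proxy-versus-outcome gap can be sketched in a toy example (the names and numbers are invented): rank the same pieces of work by an easy quantity instead of the hard-to-measure quality we actually want, and the ranking flips.

```python
# Toy sketch of Goodhart's Law (invented numbers): ranking the same three
# pieces of work by an easy proxy metric versus the outcome we care about.
changes = [
    # (name, lines_of_code, bugs_actually_fixed)
    ("terse fix", 12, 3),
    ("bloated rewrite", 900, 1),
    ("solid refactor", 150, 4),
]

by_proxy = max(changes, key=lambda c: c[1])    # reward lines written
by_outcome = max(changes, key=lambda c: c[2])  # reward bugs fixed

print("the proxy rewards:", by_proxy[0])       # → the bloated rewrite
print("the outcome rewards:", by_outcome[0])   # → the solid refactor
```

Make the proxy a target, and the incentive points squarely at the bloated rewrite.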
4. When creating an objective measure, we frequently lose the most important aspects of the goal.
Because some things are easier to measure than others, there's a particularly pernicious variant of Goodhart's Law that I call goal degradation. The process generally goes like this:
Think of a goal.
Realize that the goal isn't directly or perfectly measurable.
Find a measure that reflects part of the goal.
Describe a goal that mostly fits that measure.
Start over at step 1 with this new goal.
Quality goals get watered down into quantity goals that are easier to measure.2 Outcome goals get watered down into daily activity counts. When individuals do this, they can often retain a memory of the real goal and use the watered-down goal to drive themselves toward it. When leaders do this, however, they often forget to communicate the real goal, and the employees shrug and go after the quantity that’s been asked of them.
A sad and common version of this in companies is when a purpose-driven company starts out with the goal of helping make the world better for some particular type of person. Within a few iterations this often turns into a goal of selling stuff to that particular type of person, whether or not that stuff is what they need.
5. Relationships warp objectivity.
The evaluation of a goal often gets warped by the relationship between the evaluator and the person whose goal it is. A manager sits down to evaluate an employee's goal and decides that it wasn't a reasonable goal to set. Here are some underlying psychologies that might drive that belief:
It genuinely wasn't a reasonable goal to set. This does happen, after all.
The manager likes the employee, and doesn't want them to be disappointed.
The manager worries that they will look bad if their employee misses a goal.
The manager worries the employee will decide to start looking elsewhere, and the manager will have to do a lot of work to keep or replace them.
Since much of that thinking is unconscious, it's often the case that managers don't even know which is the primary driver. Nevertheless, we should not trust evaluations that are performed by someone with a close relationship to the evaluated.
Cargo Cult Objectivity
All together this adds up to cargo cult objectivity, and we see it all the time. Consider for a moment a company like Blackberry (originally Research In Motion, or RIM). Once upon a time, half of the smartphones sold in the US were Blackberries. From that peak they went on a decade-long slide to complete irrelevance, eventually exiting the handset business entirely. During that time they used all of the normal tools that companies use to incentivize performance — quarterly goals at every level, performance reviews, PIPs, bonuses and options for key employees, etc. Consider how many employees received positive performance reviews and earned their bonuses. As a company, they did what is expected in Corporate America.
It didn't prevent them from collapse.
Corporate Executive Board is a management consultancy whose research includes an in-depth analysis of 50 companies that ran into what they call "stalls" — slowdowns in growth at well-performing companies.
What the exhibit demonstrates is that the vast majority of stall factors result from a choice about strategy or organizational design. They are, in other words, controllable by management.
In large organizations, strategy and organizational design aren't controllable by "management" in the sense of your average line manager — the person writing quarterly goals and performance reviews. They're decided by the CEO and executive team, and everyone else has to live with them. Which means everyone else can go about their business achieving their quarterly objectives, and it just won't matter when the strategy — and the objectives derived from it — are wrong. But because we have quarterly objectives and a goal-setting process, it feels to most people like we must be going in the right direction. This is the illusion of objectivity at its finest.
4.
Because relationships warp objectivity, those looking for the most objective viewpoint are advised to bring in someone who is fully outside the context being measured.
For public companies, we require audits done by an external auditor. Internal auditors may be valuable to the company, but are hard to trust on their own. For a consumer product, we trust an external consumer product tester like Consumer Reports over the claims of the product's marketing department. For rules, we trust a regulatory agency more than companies promising to self-govern. For science, we trust citation scores and journal prestige over the scientist's claims of how important or true the work is.
But each of these can be and has been corrupted in the past. External auditors can be sloppy and lack access to the true nature of things, or can be tricked, cajoled, threatened, or bribed. Astroturfing is the name for marketing departments pretending to be independent third parties on review and chat sites; a particularly well-known version is mattress companies buying up or starting mattress review sites online. Regulatory capture is the technical term for companies corrupting the regulatory agencies responsible for setting the rules of their industry. Citation counts and impact factors have warped scientist behavior in a number of ways: driving self-citation and self-promotion, creating a whole industry of fraudulent journals, and more.
The most important aspect of independence is to look at the incentives. On the more independent end, Consumer Reports earns its money directly from subscriptions and needs to maintain the subscribers' trust. On the less independent end, mattress review websites earn their money from the mattress companies when someone buys a mattress. External auditors are paid by the company being audited, but their behavior is bounded by law and they are typically accredited by an organization that sets professional expectations as a minimum bar they must meet.
The stock market takes independence to a higher level by offering a financial incentive to everyone to participate in the independent audit of corporate reality. This is a pattern we'll discuss in more detail because it's a tremendously powerful one.
When incentive schemes are set up on top of evaluations, they are always better when the evaluation is fully independent. The most obvious of these is that stock options are a better incentive scheme than quarterly bonuses, since an employee has a lot more difficulty gaming the stock price than their manager's performance review. But a subtler version of this is that sales and customer satisfaction metrics are a better (though vastly imperfect) measure of employee performance than manager ratings.
5.
I’ve mostly been discussing objectivity in the context of business, but the vast majority of it works the same in any bureaucracy. However, there are a few structures that exist in the public sector but don’t generally exist in the private. Inspectors General, Ombudsmen and whistleblower protection laws are efforts to prevent agencies from abusing their authority and give incentives for pushing back against overreach. Ideally IGs and Ombudsmen would be independent, but because they often end up with strong relationships with the agencies they watch over, it’s more often an illusion of externality. Whistleblower protection laws are nice in theory, but in practice, there’s no external power structure that supports and protects whistleblowers, and whistleblowing is likely to ruin your life.
Our hope for journalism is that it will discover and expose the frauds, lies, and abuses of government agents. The journalism industry is sometimes called the Fourth Estate for this reason. Unfortunately, journalism has no formal standing in our society and must fund this work entirely by selling advertisements and subscriptions. (This is setting aside the argument over whether journalism as practiced now provides more cover for the state than it uncovers the state's failures.) Since the primary evaluation of journalism is whether subscribers or advertisers will pay for it, it should be unsurprising that it's fairly easy to corrupt.
There is also the Government Accountability Office (GAO). Founded 100 years ago, the GAO was originally intended to ensure the government spent money on things that Congress said it should. In many ways, the GAO's growth into a general overseer parallels the creation of the modern regulatory bureaucracy. Originally charged with ensuring that expenditures were legal, the GAO now has a sprawling responsibility to be an external auditor of the Executive branch and its agencies. The GAO issues reports to Congress detailing all sorts of problems within the regulatory bureaucracy.
Sometimes those reports cause real change. But just as often they are ignored or become political footballs. This comes back to incentives: though many legislators got into the game with the best of intentions, the driving drumbeat of their career is that they must win a new election every two to six years. You would think (or want to hope) that holding regulatory agencies accountable would help garner votes, but it rarely does. Even if the topic is direly important to their particular constituents, a legislator is only one legislator, and can only make so much hay out of holding an administrator's feet to the fire at a Congressional hearing. Grandstanding on topics that get news coverage is often more useful than building a genuine understanding of what went wrong and driving change. We see a lot of representatives more focused on getting the right soundbite on TV than on really fighting for improvement. Without stronger incentives for bureaucratic leaders than simply avoiding embarrassment, Congressional accountability is often inadequate.
Regulatory administrators in front of Congress face a situation similar to CEOs on quarterly shareholder calls. When profits miss, CEOs have to come up with excuses and explanations and show how seriously they are taking the problem. There's certainly some risk to them in those moments, and they take it seriously, just as agency administrators do. But salespeople sweat far more when companies miss sales forecasts: their personal compensation is directly tied to those outcomes, and they’re vastly easier to fire.
6.
Objectivity is a tremendously useful tool that can be seriously misused. When faced with a system, or designing a new one, we need to understand how this sort of objectivity is used, to ensure we aren’t pursuing a cargo cult. We need to know who is responsible for making evaluations, and whether the incentives are balanced or there’s a structural reason to expect certain kinds of evaluations over others (positive vs. negative, for example). We should prefer external evaluations over internal ones, and we should prefer systems where the incentive to be objective is tied to the outcomes we actually want, rather than focused solely on quantifiable steps taken or process followed, or left purely free-floating. We should actively incentivize people to discover problems in the system and fix them.
If you find my work interesting, please share and subscribe. It helps tremendously.
The incentives for you to disprove their 27% claim are all wrong. You don’t have the time to run a study to show it doesn’t work. If you do adopt the product and it doesn’t show a 27% improvement, the problem might not be the product — it might be you! After all, it worked for this other company! The only party with an incentive to measure this product’s failure is this product’s competitor. Unfortunately, the competition is much more interested in showing how much better they are than in showing that the whole idea was flawed in the first place.
Brendon Burchard described this watering-down process pretty well when arguing for his alternative framework, which he calls DUMB goals. His framework attempts to fight back against the watering down of goals by setting big, ambitious goals that aren't directly measurable and then identifying measurable goals that link back up to them.