Which one is more biased, you or your data?

Mikko Polvi

November 1, 2023

Short answer: both. Right answer: neither to the extent that matters, if you pay attention to it.

We are all affected by cognitive biases. They are flaws in our thinking processes that can lead to distorted decision making.

Beyond our thinking, the underlying data on which our seemingly rational decisions are based can also be biased in many ways.

While there is no silver bullet for overcoming these biases, the solution always starts with acknowledging their existence. In this text we’ll explore some of the most important biases, both in your decision making and in your data.

Why do biases matter?

In today’s data-centric world, informed decision-making is the linchpin of success, whether in business strategy, team dynamics, or personal career advancement. Here’s why cognitive and data biases are critical factors to consider:

Business Strategy & Growth: Biased data can lead businesses astray, resulting in wasted resources, misalignment with market needs, or even reputational damage. Accurate and unbiased data is essential to inform strategies that genuinely resonate with target audiences.

Holistic Decision-Making: Biases, whether arising from data or inherent cognitive processes, can skew our perception and judgment. For any professional aiming for comprehensive, sound decisions, understanding and mitigating these biases is increasingly important.

Team Dynamics: Cognitive biases can shape how leaders perceive their teams, potentially leading to favoritism, missed talent opportunities, or inefficient team structures. Recognizing these biases can foster a more inclusive and high-performing workplace environment.

Personal Career Development: On an individual level, cognitive biases can blind us to opportunities or risks. They might make us overly confident in our abilities, leading to missed learning experiences, or conversely, prevent us from taking calculated risks due to undue self-doubt.

Data biases

Data biases occur when there are errors or limitations in the collection, analysis, or interpretation of data. This can lead to the dataset being inaccurate and failing to represent the whole relevant population. Data bias can significantly impact data analyses and predictions, especially in machine learning and AI applications.

Here are some of the most common types of data biases:

Sampling Bias

Sampling bias occurs when some members of the target population are more likely to be included in the sample than others. In biomedical research, sampling bias is sometimes referred to as ascertainment bias. One typical example is that collecting data only from private hospitals can lead to a dataset that underrepresents people with low socioeconomic status. Historically, some clinical trials, especially in cardiology, have underrepresented women.

A biased sample can distort the results of a study or analysis and can end up highlighting a pattern that does not exist in the whole population but is only a feature of the biased group. It can also lead to results that are not applicable to the whole population. In the cardiology example above, this could lead to treatments that are less useful for women.

The best way to prevent sampling bias is to find a dataset or registry that includes the whole population of interest, so you don’t need to do any sampling in the first place. This, of course, is difficult to achieve, so when sampling you must take extra steps to ensure the sample is representative of the whole group you are studying.
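One simple way to check representativeness is to compare each group's share in the sample with its known share in the population. The sketch below is only an illustration: the group names, the 50/50 population split, and the helper functions are assumptions made for this example, and the reweighting shown (often called post-stratification) is just one of several possible corrections.

```python
from collections import Counter


def group_shares(labels):
    """Return each group's share of the given labels as a dict."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def poststratification_weights(sample_labels, population_shares):
    """Weight per group = population share / sample share.

    Groups that are underrepresented in the sample get a weight above 1,
    overrepresented groups get a weight below 1.
    """
    sample_shares = group_shares(sample_labels)
    return {
        group: population_shares[group] / share
        for group, share in sample_shares.items()
    }


# Hypothetical numbers, for illustration only.
population_shares = {"women": 0.5, "men": 0.5}
sample = ["men"] * 80 + ["women"] * 20

weights = poststratification_weights(sample, population_shares)
print(group_shares(sample))  # men make up 80% of the sample, women 20%
print(weights)               # women are weighted up, men are weighted down
```

Note that weighting can only rebalance groups that are present in the sample. A group that is missing entirely cannot be recovered this way.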

Aggregation bias

When trends from aggregated data are incorrectly assumed to apply to individual data points as well, it’s called aggregation bias, or the ecological fallacy. For example, a football team with a superior goal scorer like Lionel Messi or Teemu Pukki is likely to score a lot more goals than the average team. But from that you can’t deduce that all players on that team are above-average goal scorers. In fact, most of them could be below average, with the aggregated figure driven by a small subset of the group (in this case by Messi or Pukki).

Aggregation bias can lead to incorrect conclusions and misguided policies or interventions. When data is aggregated, the individual variations and nuances within the data are lost. This can hide important patterns, relationships, or distinctions that exist at the individual level. Making decisions based on these flawed conclusions can lead to inefficiencies, wasted resources, or unintended negative consequences. In the realm of public policy, health, economics, or social sciences, these consequences can significantly impact people’s lives.

One way to avoid aggregation bias is to use individual-level data instead of, or in addition to, aggregated data to discover the true relationships between variables.
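The football example can be made concrete with a few lines of code. The goal tallies and the league average below are invented purely for illustration. The sketch shows how a team-level average can sit above the league average while almost every individual player sits below it.

```python
from statistics import mean, median

# Hypothetical season goal tallies, for illustration only.
league_average_per_player = 4.0
team_goals = {
    "star striker": 30,
    "player B": 3,
    "player C": 2,
    "player D": 2,
    "player E": 1,
    "player F": 2,
}

goals = list(team_goals.values())
team_mean = mean(goals)      # aggregated view, pulled up by one player
team_median = median(goals)  # closer to the typical player

above_average = [
    name for name, scored in team_goals.items()
    if scored > league_average_per_player
]

print(f"team mean per player: {team_mean:.2f}")
print(f"team median per player: {team_median}")
print(f"players above league average: {above_average}")
```

Looking at the individual values, or at least at the median alongside the mean, exposes what the aggregate hides.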

Systemic Bias

Systemic bias in data refers to inherent biases present within datasets because of structural inequities, societal norms, or the methods and practices of data collection and interpretation. When the systems or processes that generate data are influenced by these broader societal biases, the resulting data can carry forward and reinforce them. For example, a dataset of startups that have successfully received VC funding could misleadingly suggest that you need to be a white male to get funded.

Such biased data can lead to misguided conclusions, policies, or practices, which can reinforce and perpetuate existing societal inequities. Clinical trials have historically underrepresented certain demographics, such as women or ethnic minorities. This can result in medical guidelines that are less applicable or even risky for these underrepresented groups.

Addressing systemic bias in data requires both technical solutions (like algorithmic adjustments) and broader societal interventions to challenge and change biased systems and norms.

Survivorship Bias

Survivorship bias is the fallacy of focusing on the data points that got collected and overlooking those that did not. The classic example comes from WW2, when bullet holes in bombers returning from battle were examined and the intuitive proposal was to reinforce the spots that had the most holes. This, of course, ignored the planes that didn’t return home because they were hit in the most critical parts, like the engine. The statistician Abraham Wald is usually credited with pointing out the flaw. Survivorship bias can be seen as a special case of sampling bias.

Constantly asking questions about your data, such as “what about the products that didn’t succeed?” or “what factors might cause a patient not to be part of the data?”, can help you address these biases.
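A small numerical sketch shows how strongly the picture can change once the "non-survivors" are put back into the data. The product launches and growth figures below are hypothetical and exist only to illustrate the effect.

```python
from statistics import mean

# Hypothetical product launches: (first-year growth in %, still on the market?)
launches = [
    (40, True),
    (25, True),
    (-30, False),
    (10, True),
    (-50, False),
    (-20, False),
]

# What you see if only the products that still exist are in your dataset.
survivors_only = mean(growth for growth, alive in launches if alive)

# What you see if discontinued products are included as well.
all_launches = mean(growth for growth, _ in launches)

print(f"average growth, survivors only: {survivors_only:.1f}%")
print(f"average growth, all launches:   {all_launches:.1f}%")
```

In this made-up dataset the survivors suggest healthy growth, while the full set of launches averages out to a loss.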

Cognitive biases

These are some examples of the many cognitive biases that can affect our decision making on a daily basis.

Recency Bias

Recency bias influences the way people perceive and make decisions based on recent events or information, giving more weight to the most recent data while downplaying or neglecting older information.

Recency bias may cause you to get all excited about recent trends or patterns, overlooking longer-term trends that could provide a more accurate picture of the situation. This can lead to knee-jerk reactions based on short term fluctuations instead of long-term improvement.
There is also speculation that in the age of data abundance (and ever faster market cycles), our ability to access and process historical data diminishes, which increases the risk of falling victim to the bias.

As always with biases, being aware of its existence is the first step in fighting recency bias. It’s also important to have a structured decision-making process that doesn’t fluctuate too much when new information pops up. Recency bias may also partly explain why it’s often beneficial to have seasoned industry veterans on your team to balance things out with their long-term views.
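One structured habit is to always place the latest data point next to a longer window before reacting. The sketch below uses invented monthly sales figures. The choice of comparing two six-month halves is an arbitrary assumption for illustration, not a recommended method.

```python
from statistics import mean

# Hypothetical monthly sales figures, oldest first (illustration only).
monthly_sales = [100, 102, 105, 107, 110, 112, 115, 117, 120, 122, 125, 110]

# Short-term view: change from the previous month to the latest month.
last_change = monthly_sales[-1] - monthly_sales[-2]

# Longer-term view: average of the first six months vs. the last six months.
first_half = mean(monthly_sales[:6])
second_half = mean(monthly_sales[6:])

print(f"latest month-over-month change: {last_change}")
print(f"average, months 1-6:  {first_half:.1f}")
print(f"average, months 7-12: {second_half:.1f}")
```

Here the latest month shows a sharp drop, yet the second half of the year is still clearly above the first. Reacting to the last data point alone would miss that.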

Anchoring bias

Arguably the most robust psychological effect influencing our decision making is the anchoring bias. Anchoring bias is the tendency to rely too heavily on the first piece of information (the “anchor”) we receive when making a decision, even if it is not relevant to the decision at hand.

One common way to exploit the anchoring bias is discounting. How happy are you to buy a t-shirt for only 100€ when you see that the “original” price was 500€?

In data-driven decision making, this bias can lead to over-reliance on certain metrics or data points that were introduced early in the analysis process. Anchoring bias is also considered such a fundamental bias that it’s thought to be a driver behind several other biases, like the planning fallacy or the spotlight effect.

As with many other biases, simply being aware that this bias exists already helps. However, anchoring bias is considered so strong that it is probably impossible to avoid completely. One way to counter it is to spend some time arguing why the anchoring information is not relevant in the first place, for example: “A normal t-shirt could not be worth 500€.”

Framing bias

Can your decisions be manipulated by presenting information in a positive or negative light?

Framing bias is when people’s opinions and decisions are influenced by the way information is presented to them, even if the information is the same. This can happen especially in politics and advertising, where people might be swayed by the way a message is presented, rather than the actual content.

Framing bias can have important implications for how people make decisions and form opinions. If information is presented in a certain way, people may be more likely to be swayed in a particular direction, even if the information is incomplete or biased. One common example is the use of a colour scale: deeper red for bad, deeper green for good. But what are the thresholds at which the colours change? Another example is presenting fat-free food as a good, healthy option. But if the fat has been replaced by sugar, is it really any healthier?

To avoid being influenced by framing bias, be aware of how information is presented to you and try to look past the presentation to the actual information. Ask questions to get behind what is presented. Get other opinions, be mindful of your emotions, and always acknowledge that your own perspective can affect your decisions.

Confirmation bias

Are you making your decisions based on data or just using data to justify decisions you have already made?

Confirmation bias is the tendency for people to look at information in a way that confirms their pre-existing beliefs or hypotheses. It can lead to the reinforcement of inaccurate beliefs and the dismissal of contradictory evidence. Often this happens without you even realizing it.

If you let confirmation bias take over your decision-making process, you might ignore important information just because it doesn’t fit what your brain already thinks. Instead of objectively analyzing the numbers, you’ll just start justifying your earlier decisions and thoughts. This is not data-driven decision making; it’s what Google’s Cassie Kozyrkov calls “data inspired, at best.”

Try setting rules and thresholds for your decisions before seeing the actual data. So, you might say “I’ll cut this product from our portfolio if sales are not up by at least 10%” or “I’ll continue using the ML-based method for prospecting if the conversion rate from those prospects is up by at least 6%”. The key is that you set these thresholds before you see the corresponding numbers. This can help you make better decisions and stick to what you originally set out to do.
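The idea of fixing thresholds in advance can be sketched as a small, immutable decision rule that is written down first and only afterwards applied to the observed number. The class name, field names, and the observed value below are hypothetical and chosen only for this illustration. The 10% threshold mirrors the product example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the rule cannot be edited after it is created
class DecisionRule:
    """A decision rule written down before the data is seen."""
    metric: str
    min_change_pct: float
    action_if_met: str
    action_if_not_met: str

    def decide(self, observed_change_pct: float) -> str:
        """Return the action agreed on in advance for the observed change."""
        if observed_change_pct >= self.min_change_pct:
            return self.action_if_met
        return self.action_if_not_met


# Step 1: commit to the rule before looking at the numbers.
rule = DecisionRule(
    metric="sales growth",
    min_change_pct=10.0,
    action_if_met="keep product",
    action_if_not_met="cut product",
)

# Step 2: only now look at the data (hypothetical value) and apply the rule.
observed_sales_growth_pct = 7.5
print(rule.decide(observed_sales_growth_pct))
```

Making the rule immutable is a design choice for this sketch: it makes it slightly harder to quietly move the threshold once the result is known, which is exactly the temptation confirmation bias creates.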

Summary

Bias affects both our thinking and the data we rely upon. Cognitive biases are flaws in our thinking processes that can distort our decisions. These include recency bias, where recent events unduly influence decisions, anchoring bias, which sees us giving undue importance to initial information, and confirmation bias, where we prioritize information that aligns with our beliefs.

The data we use is also prone to bias. Data biases arise from errors in data collection, analysis, or interpretation. Common data biases include sampling bias, where certain population segments are under- or overrepresented; aggregation bias, where aggregate data misrepresents individual data points; and systemic bias, where societal norms and structures influence data collection and interpretation. Recognizing these biases is crucial, as they can misguide business strategies, skew perceptions, affect team dynamics, and hinder personal career development.
