Types of Missingness - 2.2.1 | 2. Data Wrangling and Feature Engineering | Data Science Advance

Interactive Audio Lesson

MCAR - Missing Completely At Random

Teacher

Today, we'll discuss the first type of missing data: Missing Completely At Random, or MCAR. This occurs when the likelihood of a data point being missing is unrelated to any of the data, observed or unobserved, including the missing value itself. Can anyone give an example of this?

Student 1

Maybe when someone accidentally skips a question on a survey?

Student 2

Or when there's a glitch in data collection that causes random rows to be missing?

Teacher

Exactly! Under MCAR, analyzing only the complete rows does not bias statistical inference, which is why it's the least problematic case; the main cost is a smaller sample and less statistical power. Remember the acronym: MCAR, Missing Completely At Random. Why is it important to detect whether data is MCAR?

Student 3

So we know how to handle it correctly?

Teacher

That's right! Recognizing MCAR tells you that deleting incomplete entries is safe in terms of bias, although it does shrink the sample. Let's summarize: under MCAR, data is missing purely at random, so the missingness does not skew our results.

MAR - Missing At Random

Teacher

Next, we move to Missing At Random, or MAR. In this case, the missingness can be explained by other observed variables. Any examples?

Student 4

Perhaps if older individuals tend to skip questions about technology on a survey about digital habits?

Student 2

Or if people don't answer because they might feel embarrassed about their income?

Teacher

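Yes to the first one, as long as age is recorded in the survey! The second is subtler: if the embarrassment depends on the income amount itself, that is actually MNAR, which we'll cover next. MAR is more complex than MCAR but still manageable: because observed variables explain the missingness, we can use them to account for it, for example through imputation. Remember: MAR, Missing At Random. Why is this concept critical in data analysis?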

Student 1

Because we need to use appropriate approaches, like imputation, to reduce bias.

Teacher

Correct! Handling MAR data properly reduces bias in our estimates and makes predictive models more reliable.

MNAR - Missing Not At Random

Teacher

Now, let's discuss Missing Not At Random, or MNAR. This is the trickiest type, because the missingness depends on the value of the missing data itself. Can you think of situations where MNAR might apply?

Student 3

If someone with a high income did not want to disclose their income due to privacy concerns?

Student 4

Or in health surveys where individuals with severe symptoms avoid answering certain questions?

Teacher

Exactly! MNAR makes accurate analysis difficult because the missingness depends on values we never observe, so the observed data alone cannot correct for it. Remember: MNAR, Missing Not At Random. How should we deal with MNAR cases?

Student 2

We may need to employ more advanced techniques or accept that it's an intrinsic bias in our data.

Teacher

Right! MNAR must be treated carefully because it can lead to substantial bias in results. Let's recap: under MNAR, the missingness is tied to the unseen values themselves, so the data we do observe is a skewed sample.

Introduction & Overview

Quick Overview

This section discusses the types of missing data in datasets, specifically MCAR, MAR, and MNAR, and their implications for data analysis.

Standard

The section elaborates on the three primary types of missingness: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), detailing their characteristics and the potential biases each can introduce in data analysis. Understanding these types helps data scientists make informed decisions when handling missing data.

Detailed

Types of Missingness

In data handling, understanding the types of missing values is crucial for accurate analysis. This section details three categories of missingness:

  1. Missing Completely At Random (MCAR): The missing values are entirely random and not related to any observed or unobserved data. For example, a survey participant accidentally skipping a question would be an MCAR scenario. MCAR does not bias statistical analyses of the remaining data (it only reduces the sample size), making it the ideal case.
  2. Missing At Random (MAR): The propensity for a data point to be missing is related to some of the observed data but not to the missing value itself. For instance, individuals from a specific demographic group, recorded in the data, may skip a survey question more often than others. Analyzing only the complete rows can be biased under MAR, but because the observed data explains the missingness, this bias can often be corrected with appropriate imputation methods.
  3. Missing Not At Random (MNAR): The missingness is related to the unseen data itself. For example, high-income individuals might skip income questions precisely because their income is high. MNAR can lead to significant bias if not addressed properly, because the missingness is tied to the very values we are trying to analyze.

Understanding these types of missingness is vital in determining the appropriate strategies for handling missing data, which can include deletion, imputation, or using predictive models.
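As an illustration of these strategies, here is a minimal pandas sketch on a hypothetical six-row dataset (the column names `gender` and `income` and all values are invented). It compares deletion, simple mean imputation, and a group-wise fill; the group-wise fill is a very simple form of model-based imputation, predicting each missing income from the respondent's gender, which suits data that is MAR given gender.

```python
import pandas as pd

# Hypothetical toy data: income is missing for two respondents.
df = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M"],
    "income": [48_000, None, 52_000, 61_000, 59_000, None],
})

# 1. Deletion (complete-case analysis): drop rows whose income is missing.
deleted = df.dropna(subset=["income"])

# 2. Simple imputation: fill every gap with the overall observed mean.
mean_filled = df["income"].fillna(df["income"].mean())

# 3. Conditional imputation: fill each gap with the observed mean of the
#    respondent's own gender group (reasonable if missingness is MAR given gender).
group_means = df.groupby("gender")["income"].transform("mean")
group_filled = df["income"].fillna(group_means)

print(deleted)
print(mean_filled.tolist())
print(group_filled.tolist())
```

Here the observed incomes average 55,000 overall, 50,000 for `F` and 60,000 for `M`, so the simple fill puts 55,000 in both gaps while the group-wise fill uses 50,000 and 60,000.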

Audio Book

MCAR – Missing Completely At Random

Detailed Explanation

MCAR stands for Missing Completely At Random. This means that the missing data has no relationship with either the observed or unobserved data. In other words, the absence of data is entirely random and not related to the specific values of the data. For instance, if a survey respondent skips a question simply because they overlooked it, rather than because of any bias or systematic error, that data is considered MCAR.
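To make the definition concrete, here is a small NumPy simulation. It is a sketch on synthetic data: the income-age relationship, the 20% missing rate, and the seed are arbitrary assumptions. Because the missingness mask is drawn independently of everything else, the complete-case mean stays close to the true mean, and respondents with a missing income look like everyone else in age.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic survey: 10,000 respondents with an age and an income.
n = 10_000
age = rng.uniform(18, 70, size=n)
income = 20_000 + 800 * age + rng.normal(0, 10_000, size=n)

# MCAR: every income has the same 20% chance of going missing,
# independent of age and of the income value itself.
missing = rng.random(n) < 0.20
observed_income = np.where(missing, np.nan, income)

true_mean = income.mean()
complete_case_mean = np.nanmean(observed_income)

# Informal check: under MCAR, rows with and without a missing income
# should look alike on other observed variables such as age.
age_gap = age[missing].mean() - age[~missing].mean()

print(f"true mean income:          {true_mean:,.0f}")
print(f"complete-case mean income: {complete_case_mean:,.0f}")
print(f"age gap (missing - not):   {age_gap:.2f} years")
```

A clear gap on an observed variable shows the data is not MCAR, but a small gap does not prove MCAR; formal procedures such as Little's MCAR test exist for that question.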

Examples & Analogies

Think about a class where a few students are absent on a random day. Their absence does not relate to their performance or any specific situation; it just happens to be a coincidence. Similarly, in data collection, if some responses are missing just by chance without any influence from other factors, that's MCAR.

MAR – Missing At Random

Detailed Explanation

MAR stands for Missing At Random. This situation arises when the probability of missing data on a variable is related to some of the observed data but not the missing data itself. For example, women might be less likely to report their income in a dataset that includes both male and female participants, but within those women, the missing income data is random and not dependent on actual income levels.
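The gender example can be simulated in the same spirit. In this sketch (synthetic data; the income levels, the 40% and 10% skip rates, and the seed are assumptions), simply averaging the reported incomes over-represents men and overshoots the true mean. Filling each gap with the mean of the respondent's own gender group brings the estimate back close to the truth, because gender fully explains the missingness here.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic data: gender is always observed, and income differs by gender.
n = 20_000
is_female = rng.random(n) < 0.5
income = np.where(is_female, 50_000, 60_000) + rng.normal(0, 8_000, size=n)

# MAR: women skip the income question 40% of the time, men 10% of the time.
# Within each gender, which incomes go missing is purely random.
p_missing = np.where(is_female, 0.40, 0.10)
missing = rng.random(n) < p_missing
observed = np.where(missing, np.nan, income)

true_mean = income.mean()
naive_mean = np.nanmean(observed)  # men are over-represented, so this runs high

# Group-wise mean imputation: fill each gap with the observed mean income
# of the respondent's own gender group.
imputed = observed.copy()
for group in (True, False):
    in_group = is_female == group
    imputed[in_group & missing] = np.nanmean(observed[in_group])
adjusted_mean = imputed.mean()

print(f"true mean:           {true_mean:,.0f}")
print(f"naive observed mean: {naive_mean:,.0f}")
print(f"group-imputed mean:  {adjusted_mean:,.0f}")
```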

Examples & Analogies

Imagine you are conducting a health survey. If younger participants are less likely than older participants to answer questions about their exercise habits, and within each age group the skipped answers do not depend on the participants' actual exercise habits, this is MAR. Age influences whether data is missing; the unreported habits themselves do not.
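One practical way to look for this pattern is to compare missing-answer rates across an observed variable. Below is a minimal pandas sketch on a hypothetical eight-row survey; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical health-survey responses; None marks an unanswered question.
survey = pd.DataFrame({
    "age_group": ["18-29", "18-29", "18-29", "18-29", "30-49", "30-49", "50+", "50+"],
    "weekly_exercise_hours": [None, None, 3.0, None, 2.5, 4.0, 1.0, 5.5],
})

# Share of missing answers within each (always observed) age group.
missing_rate = (
    survey["weekly_exercise_hours"].isna().groupby(survey["age_group"]).mean()
)
print(missing_rate)
```

Here 75% of the 18-29 group left the question blank versus none of the older respondents, so the data is clearly not MCAR. A table like this is consistent with MAR but cannot rule out MNAR, since whether the skipped answers depend on the unseen exercise hours cannot be checked from the observed data alone.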

MNAR – Missing Not At Random

Detailed Explanation

MNAR stands for Missing Not At Random. This scenario occurs when the missingness is related to the value of the data that is missing. For instance, people with very high incomes may choose not to disclose their income information in a survey, leading to missing data that specifically correlates to the income variable.
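The income example can also be simulated. This is a sketch on synthetic lognormal incomes; the distribution parameters, the 0-to-80% missingness ramp, and the seed are assumptions. Because the chance of withholding grows with income itself, the observed mean lands well below the true mean, and filling the gaps with that observed mean leaves the bias untouched.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Synthetic, right-skewed incomes (lognormal).
n = 20_000
income = rng.lognormal(mean=10.8, sigma=0.5, size=n)

# MNAR: the probability of withholding income rises with income itself,
# from 0 for the lowest earner to 0.8 for the highest.
ranks = income.argsort().argsort() / (n - 1)  # 0.0 = lowest, 1.0 = highest
missing = rng.random(n) < 0.8 * ranks
observed = np.where(missing, np.nan, income)

true_mean = income.mean()
observed_mean = np.nanmean(observed)  # skewed toward lower earners

# Mean imputation does not help: filling every gap with the observed mean
# leaves the overall mean equal to the (biased) observed mean.
mean_imputed = np.where(missing, observed_mean, income)

print(f"true mean:         {true_mean:,.0f}")
print(f"observed mean:     {observed_mean:,.0f}")
print(f"mean-imputed mean: {mean_imputed.mean():,.0f}")
```

Correcting MNAR usually needs extra assumptions or information about why values are missing, such as an explicit model of the nonresponse or a sensitivity analysis, rather than a standard imputation.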

Examples & Analogies

Consider a situation where you're trying to gather information on personal savings, but only those with minimal savings choose to respond. In this case, the missing data is definitely influenced by the variable being examined (savings), making it MNAR. This means the nature of the data missing is directly tied to the characteristics of that data.

Definitions & Key Concepts

Key Concepts

  • MCAR: Data is missing completely at random, so it introduces no bias.

  • MAR: Missingness depends only on observed data.

  • MNAR: Missingness relates to the value of the missing data.

Examples & Real-Life Applications

Examples

  • If a respondent skips a survey question purely by accident, it represents MCAR.

  • If a survey shows that only younger respondents skip questions about technology, it depicts MAR.

  • If individuals with higher incomes avoid answering income questions, it illustrates MNAR.

Memory Aids

🎡 Rhymes Time

  • MCAR's a breeze, bias-free seas, MAR's got a link, while MNAR makes you think!

📖 Fascinating Stories

  • Imagine a group at a party, where some guests slip out at random moments for no particular reason (MCAR). Others leave early because they live far away, something the host can see on the guest list (MAR). Finally, the dance-off winner leaves because they don't want to reveal their secret moves (MNAR).

🧠 Other Memory Gems

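  • Remember MCAR, MAR, MNAR by the words that set them apart: 'Completely' (linked to nothing), 'At Random' (linked only to observed data), 'Not At Random' (linked to the missing values themselves).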

🎯 Super Acronyms

MAR

  • Make Accurate Returns: because the missingness is explained by observed data, that data can be used to recover accurate estimates.

Glossary of Terms

  • Term: MCAR

    Definition:

    Missing Completely At Random; missingness unrelated to any observed or unobserved values, so it introduces no systematic bias.

  • Term: MAR

    Definition:

    Missing At Random; missing data that can be explained by observed data.

  • Term: MNAR

    Definition:

    Missing Not At Random; missingness that depends on the unobserved values themselves.

  • Term: Imputation

    Definition:

    The process of replacing missing data with substituted values.