Machine Learning · December 27, 2023

Types of missing data: MCAR, MAR, MNAR

Data, the precious fuel of insights, isn’t always complete. Like that half-eaten cookie in the pantry, sometimes crucial bits go missing. But unlike cookies, missing data can wreak havoc on your analysis, leaving you with conclusions that crumble under scrutiny. This post walks through the three main types of missing data (MCAR, MAR, and MNAR) and helps you identify and tame these data gremlins.

| Sr. No. | Attribute | MCAR | MAR | MNAR |
|---|---|---|---|---|
| 1 | Full form | Missing Completely At Random | Missing At Random | Missing Not At Random |
| 2 | Description | The data is missing purely at random. | The data is missing for a reason, and that reason is an observed variable. | The data is missing for a reason, and that reason is an unobserved variable, i.e. a variable the researcher has not considered, or the missing value itself. |
| 3 | Reason for missing data? | No reason. The value is missing by chance, with no specific cause. | The person does not answer, intentionally or unintentionally, or the survey does not offer an appropriate question or answer option. | The person does not answer, intentionally or unintentionally, or the survey does not offer an appropriate question or answer option. |
| 4 | Missing data depends on? | None | Another, non-missing attribute/variable/column. Say we have two columns, Position and Salary, and Salary is missing because of Position: for example, CEOs as a group tend not to provide their salary, whatever the amount. | An unobserved (not considered) attribute/variable/column, or the column with the missing value itself. Say we have two columns, Position and Salary, and Salary is missing either because of a variable that was never collected, “Experience”, or because of “Salary” itself: for example, the salary is low due to low experience, or the person does not want to disclose it because it is low compared to other salaries. |
| 5 | Missing data is meaningful? | No | Yes | Yes |
| 6 | Impact on analysis | Smaller sample and less precise estimates. Results based on the remaining data are generally not biased. | Biased results and an incomplete picture of the population if the related observed variable is ignored, which can lead to incorrect conclusions. | Biased results and an incomplete picture of the population, which can lead to incorrect conclusions. This is the hardest case to correct. |
| 7 | How to overcome? | Two methods: deletion and imputation. | Same as MCAR | Same as MCAR |
| 8 | Methods to overcome | Listwise deletion; pairwise deletion; mean imputation; regression imputation; single imputation; multiple imputation | Listwise deletion; pairwise deletion; mean imputation; regression imputation; single imputation; multiple imputation | Listwise deletion; pairwise deletion; mean imputation; regression imputation; single imputation; multiple imputation; try to collect more variables/columns/attributes |
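
The two families of fixes named in the table, deletion and imputation, can be sketched in a few lines of plain Python. This is a minimal illustration rather than a recommended pipeline: the records, the use of `None` to mark a missing value, and the helper names `listwise_delete` and `mean_impute` are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"position": "CEO", "salary": 1000},
    {"position": "CTO", "salary": 800},
    {"position": "HOD", "salary": 500},
    {"position": "HOD", "salary": None},
    {"position": "HOD", "salary": None},
]

def listwise_delete(records):
    """Drop every record that has at least one missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def mean_impute(records, column):
    """Replace missing values in `column` with the mean of the observed ones."""
    fill = mean(r[column] for r in records if r[column] is not None)
    return [
        {**r, column: fill if r[column] is None else r[column]}
        for r in records
    ]

deleted = listwise_delete(rows)
imputed = mean_impute(rows, "salary")

print(len(deleted))                      # number of complete records kept
print([r["salary"] for r in imputed])    # salaries after mean imputation
```

Mean imputation keeps every row but pulls the filled values toward the average of whatever was observed, so it only behaves well when the observed values are representative of the missing ones.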

Example:

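| Sr. No. | Gender | Designation | Salary |
|---|---|---|---|
| 1 | Male | CEO | 1000 |
| 2 | Female | CTO | 800 |
| 3 | (missing, MNAR) [1] | HOD | 500 |
| 4 | Female | HOD | (missing, MAR) [2] |
| 5 | Male | HOD | (missing, MNAR) [3] |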
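
A first diagnostic step on a table like the one above is to measure how much is missing per column, and whether the missing rate of one column varies with another, observed column. The sketch below encodes the example table with `None` for each missing cell; the helper names `missing_rate` and `missing_rate_by` are hypothetical, chosen only for this illustration.

```python
# The example table, with None standing in for each missing cell.
table = [
    {"gender": "Male",   "designation": "CEO", "salary": 1000},
    {"gender": "Female", "designation": "CTO", "salary": 800},
    {"gender": None,     "designation": "HOD", "salary": 500},
    {"gender": "Female", "designation": "HOD", "salary": None},
    {"gender": "Male",   "designation": "HOD", "salary": None},
]

def missing_rate(records, column):
    """Fraction of records whose value in `column` is missing."""
    return sum(r[column] is None for r in records) / len(records)

def missing_rate_by(records, group_col, target_col):
    """Missing rate of `target_col` within each value of `group_col`."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_col], []).append(r)
    return {g: missing_rate(rs, target_col) for g, rs in groups.items()}

rates = {col: missing_rate(table, col) for col in table[0]}
by_designation = missing_rate_by(table, "designation", "salary")

print(rates)
print(by_designation)
```

A missing rate that differs sharply across an observed column (here, salary is missing only for HODs) is consistent with MAR, but it cannot rule out MNAR, because that depends on values that were never recorded.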

MCAR: Missing Completely at Random

  • Imagine a mischievous goblin randomly sprinkling confetti over your dataset. Some values disappear, but it’s purely by chance, unrelated to any variable in the data. This scenario, thankfully, is easiest to handle, like simply brushing off the confetti and analyzing what remains.
  • Formally, data is Missing Completely at Random (MCAR) when the absence of a value is unrelated to any observed or unobserved variable, including the missing value itself. It’s like a magician making data disappear without a trace.
  • Example: Imagine a survey where participants forget to answer certain questions due to random distractions. The missing data here is scattered across the sample without any discernible pattern.
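
To make the MCAR idea concrete, here is a small simulation sketch using only the Python standard library. The salary distribution, the 30% loss rate, and the fixed seed are assumptions picked for illustration; the point is only that when every value has the same chance of going missing, the average of what remains stays close to the average of the full data.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical salaries; the distribution is an assumption for illustration.
salaries = [rng.gauss(1000, 200) for _ in range(20_000)]

# MCAR: every value has the same 30% chance of being lost,
# regardless of the salary itself or of any other variable.
observed = [s if rng.random() >= 0.3 else None for s in salaries]

true_mean = mean(salaries)
observed_mean = mean(s for s in observed if s is not None)
missing_fraction = sum(s is None for s in observed) / len(observed)

print(round(true_mean, 1), round(observed_mean, 1), round(missing_fraction, 3))
```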

MAR: Missing at Random

  • Picture a more discerning goblin, targeting specific parts of your data based on known variables. While frustrating, MAR’s patterns can be deciphered using statistical techniques. It’s like piecing together a puzzle where some pieces have clues on the back.
  • Missing at Random (MAR) introduces a twist to the narrative. Here, the missingness is related to observed variables but, once those are accounted for, not to the missing values themselves. It’s like a puzzle where the missing pieces are connected to what’s already in place.
  • Example: In a health survey, people of one gender may be less likely than others to disclose certain health conditions. The missing data is tied to the observed variable (gender) but not to the undisclosed information itself.
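
The following sketch simulates a MAR situation along the lines of the Position and Salary example: the chance that a salary is missing depends only on the observed position. All numbers (group means, spreads, and missing rates) are assumptions chosen for illustration. It compares the plain average of the reported salaries with an average computed after filling each gap with the mean reported salary of the same position.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical parameters per position: (mean salary, spread, P(salary missing)).
PARAMS = {"CEO": (1000, 100, 0.6), "HOD": (500, 50, 0.1)}

rows = []
for _ in range(20_000):
    position = rng.choice(["CEO", "HOD"])
    mu, sd, p_missing = PARAMS[position]
    salary = rng.gauss(mu, sd)
    # MAR: the chance of losing the salary depends only on the observed position.
    reported = None if rng.random() < p_missing else salary
    rows.append((position, salary, reported))

true_mean = mean(salary for _, salary, _ in rows)
complete_case_mean = mean(rep for _, _, rep in rows if rep is not None)

# Fill each missing salary with the mean reported salary of the same position.
group_mean = {
    pos: mean(rep for p, _, rep in rows if p == pos and rep is not None)
    for pos in PARAMS
}
imputed_mean = mean(
    rep if rep is not None else group_mean[pos] for pos, _, rep in rows
)

print(round(true_mean), round(complete_case_mean), round(imputed_mean))
```

In this setup the complete-case average is pulled down because CEOs, who earn more, are under-represented among the reported salaries, while using the observed position during imputation brings the estimate back close to the true average.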

MNAR: Missing Not at Random

  • Now we’re dealing with a truly wily goblin, vanishing data based on the missing values themselves. This is the trickiest case, like a puzzle with missing pieces and no clues, requiring advanced methods or the collection of additional variables.
  • Missing Not at Random (MNAR), or Non-Ignorable Missingness, is the detective’s challenge. The probability that a value is missing is related to the missing value itself, or to an unobserved variable that the researcher did not take into consideration while designing the questionnaire or survey, even after accounting for the observed variables. It’s like data playing a game of hide and seek, and the missing values are intentionally elusive.
  • Example: Consider an income survey where high-income individuals are less likely to disclose their exact income. The missingness depends on the income value itself, not just on an observed variable such as designation. For instance, an employee with the lowest designation might have a high income from other sources but not want to disclose it.
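
The income example can be simulated in the same style. In this sketch the chance of non-disclosure depends on the income itself; the income distribution, the 1100 threshold, and the two missing rates are assumptions made purely for illustration. It shows that the reported incomes understate the true average, and that mean imputation does not repair this.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical incomes; the distribution and threshold are assumptions.
incomes = [rng.gauss(1000, 200) for _ in range(20_000)]

def p_missing(income):
    """MNAR: the chance of non-disclosure depends on the income itself."""
    return 0.8 if income > 1100 else 0.1

reported = [None if rng.random() < p_missing(x) else x for x in incomes]

true_mean = mean(incomes)
observed_mean = mean(x for x in reported if x is not None)

# Mean imputation fills the gaps with the (already too low) observed mean,
# so the estimate of the average income does not improve.
imputed = [observed_mean if x is None else x for x in reported]
imputed_mean = mean(imputed)

print(round(true_mean), round(observed_mean), round(imputed_mean))
```

Because nothing in the recorded data explains which incomes were withheld, no adjustment based only on observed columns can recover the true average here, which is why collecting additional variables is listed as an extra remedy for MNAR.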

Footnotes:

  1. Here the missing data depends only on the “Gender” column itself and not on any other column.
    Either the survey offers only two options for “Gender”, “Male” and “Female”, and neither applies,
    or other options are available but the person does not want to disclose their gender.
    A “Gender” value that is intentionally withheld is considered MNAR.
  2. It depends on the “Designation” column. The person holds the “HOD” designation, which here has a very low salary (i.e. 100), and she does not want to disclose it.
    A “Salary” that is missing because of “Designation” is considered MAR.
  3. Here the missing data depends only on the “Salary” column itself or, let’s say, on a column that was never collected, “Experience”. In this case the person does not want to disclose the salary because it is low compared to others, or because low experience has kept it low.
    A “Salary” that is missing because of the unobserved attribute “Experience”, or because of “Salary” itself, is considered MNAR.
