The missing-data patterns that are generally observed are
- Missing completely at random (MCAR)
- Missing at random (MAR)
- Missing not at random (MNAR), where missingness depends on the missing value itself
- Missing not at random (MNAR), where missingness depends on an unobserved variable
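To make the four patterns concrete, here is a minimal simulation sketch. The variables (`age`, `stress`, `income`), their coefficients, and the missingness probabilities are all made up for illustration; the only point is to show how each mechanism changes what the observed values look like.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical variables: age is always observed, income may go missing,
# and stress is never recorded (it plays the "unobserved variable").
age = rng.normal(40, 10, n)
stress = rng.normal(0, 1, n)
income = 30_000 + 800 * (age - 40) + 5_000 * stress + rng.normal(0, 5_000, n)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Probability that income is missing under each mechanism (assumed forms).
p_missing = {
    "MCAR": np.full(n, 0.3),                                  # same for every row
    "MAR": sigmoid((age - 40) / 5),                           # depends on observed age
    "MNAR (own value)": sigmoid((income - 30_000) / 5_000),   # depends on income itself
    "MNAR (unobserved)": sigmoid(2 * stress),                 # depends on unrecorded stress
}

true_mean = income.mean()
rows = []
for name, p in p_missing.items():
    missing = rng.random(n) < p  # True where income would be missing
    rows.append(
        {
            "mechanism": name,
            "share_missing": missing.mean(),
            "observed_mean": income[~missing].mean(),
        }
    )

summary = pd.DataFrame(rows).set_index("mechanism")
# Bias of the naive "just average what we observed" estimate.
summary["bias"] = summary["observed_mean"] - true_mean
print(summary.round(2))
```

In this setup only the MCAR row should show a bias close to zero. The MAR bias can in principle be corrected using `age`, because `age` is observed; the two MNAR cases cannot be corrected from the observed columns alone.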
In data analytics interviews, when asked about the missing-data patterns commonly observed in datasets, it’s essential to demonstrate a solid understanding of the challenges that missing data creates. Here’s a structured response:
- Missing Completely at Random (MCAR):
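- In this scenario, the probability that a value is missing is unrelated to any variable, observed or unobserved, including the missing value itself. Every data point has the same chance of being missing, as if decided by a coin flip (the coin does not need to be fair).
- Because the observed rows are effectively a random subsample, dropping incomplete rows does not bias estimates, although it reduces the sample size. Other strategies for handling MCAR include simple imputation methods like mean, median, or mode imputation (which understate variance and weaken correlations) or machine learning algorithms that can handle missing values internally.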
- Missing at Random (MAR):
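- Here, the missingness is related to observed variables but not to the missing values themselves. In other words, once the observed data is taken into account, the probability of missingness no longer depends on the value that is missing. For example, older respondents may skip an income question more often, but among respondents of the same age, skipping is unrelated to income.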
- Techniques for addressing MAR involve sophisticated imputation methods such as multiple imputation, which uses observed data to impute missing values multiple times to account for uncertainty.
- Missing Not at Random (MNAR):
- This type of missingness occurs when the probability of a value being missing depends on the missing value itself or on variables that were never recorded, even after accounting for the observed data. For example, people with very high incomes may be less likely to report their income.
- Handling MNAR is challenging because the observed data alone cannot reveal the mechanism behind the missingness. Techniques such as pattern mixture models or selection models model that mechanism explicitly, and a sensitivity analysis helps check how the conclusions change under different assumptions about it.
- Seasonality or Time-based Missingness:
- Data may exhibit patterns of missingness based on specific time periods or seasons. For instance, retail sales data might have missing values around holiday seasons due to increased activity or data collection issues during busy periods.
- Addressing this type of missingness might involve analyzing and understanding the seasonal patterns, and potentially applying time-series imputation techniques or seasonal adjustment methods.
- Structural Missingness:
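- Missingness may be built into the data collection design, so that a value is absent because it does not apply or was never asked, not because it was lost. For example, a survey with skip logic leaves follow-up questions blank for respondents to whom they do not apply. Respondents choosing not to answer a question is nonresponse rather than structural missingness, and usually falls under MAR or MNAR.
- Strategies to deal with structural missingness include encoding such values as an explicit "not applicable" category instead of imputing them, carefully reviewing data collection methods, and possibly redesigning data collection protocols to minimize missing data in future studies.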
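The handling strategies above can be sketched on synthetic data. This is an illustration under stated assumptions, not a recommended pipeline: the `age` and `income` columns, the missingness rule, and the small `sales` series are hypothetical, and the regression step is a single imputation standing in for the idea behind multiple imputation (using observed data to predict missing values).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical MAR setup: income depends on age, and income is missing
# more often for older respondents. Age itself is always observed.
age = rng.normal(40, 10, n)
income = 30_000 + 800 * (age - 40) + rng.normal(0, 5_000, n)
missing = rng.random(n) < 1 / (1 + np.exp(-(age - 40) / 5))

df = pd.DataFrame({"age": age, "income": income})
df.loc[missing, "income"] = np.nan

# 1) Mean imputation: every gap gets the mean of the observed incomes.
mean_imputed = df["income"].fillna(df["income"].mean())

# 2) Regression imputation: fit income ~ age on the complete rows, then
#    predict the missing incomes from age. Multiple imputation would add
#    random noise to these predictions and repeat the process several times.
observed = df.dropna(subset=["income"])
slope, intercept = np.polyfit(observed["age"], observed["income"], deg=1)
predicted = intercept + slope * df["age"]
reg_imputed = df["income"].fillna(predicted)

true_mean = income.mean()
print(f"true mean:               {true_mean:,.0f}")
print(f"after mean imputation:   {mean_imputed.mean():,.0f}")
print(f"after regression on age: {reg_imputed.mean():,.0f}")

# 3) Time-based gap: interpolate along a datetime index.
dates = pd.date_range("2023-12-18", periods=10, freq="D")
sales = pd.Series(
    [100, 110, 120, np.nan, np.nan, np.nan, 160, 150, 140, 130],
    index=dates,
    dtype=float,
)
filled = sales.interpolate(method="time")
print(filled)
```

Under this MAR setup, mean imputation inherits the bias of the observed rows, while the age-based regression should land much closer to the true mean because the variable driving the missingness is observed. Neither approach would repair an MNAR mechanism.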
In the interview, it’s crucial to not only mention these types of missing patterns but also discuss strategies for handling them appropriately based on the context of the data and the objectives of the analysis. This demonstrates a comprehensive understanding of data analytics principles and practical problem-solving skills.