Suppose you are given a data set with missing values spread within 1 standard deviation of the median. What percentage of the data would remain unaffected, and why?

Since the missing values are described relative to the median, let’s assume the data follows a normal distribution, which is symmetric, so the mean, median, and mode coincide.
In a normal distribution, ~68% of the data lies within 1 standard deviation of the mean (and therefore of the median). If that is the range the missing values occupy, ~32% of the data lies outside it. Therefore, ~32% of the data would remain unaffected by missing values.

If the missing values are spread along 1 standard deviation from the median, they fall in the band that holds approximately 68.2% of the data, so approximately 31.8% of the data will remain unaffected. This follows from the empirical rule (the 68-95-99.7 rule): in a normal distribution, about 68.2% of the data falls within one standard deviation of the mean, which coincides with the median.

Here’s a breakdown:

  • Approximately 34.1% of the data lies within one standard deviation below the median.
  • Approximately 34.1% of the data lies within one standard deviation above the median.
  • Therefore, the affected band holds approximately 34.1% + 34.1% = 68.2% of the data, and the unaffected remainder is approximately 100% - 68.2% = 31.8%.

This is a key property of normal distributions, but it gives only a rough estimate: it assumes the data is approximately normal and treats every value inside that band as affected.
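
The figures can be checked numerically. The following is a minimal sketch, assuming the data is normally distributed and that the missing values occupy the whole band within one standard deviation of the median. The helper name `fraction_within` is hypothetical and only used for illustration.

```python
from statistics import NormalDist


def fraction_within(k: float = 1.0) -> float:
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    # Standard normal: mean 0, standard deviation 1 (assumed model for the data).
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Band assumed to contain the missing values: within 1 standard deviation.
affected = fraction_within(1.0)
# Everything outside that band is treated as unaffected.
unaffected = 1.0 - affected

print(f"affected: {affected:.1%}, unaffected: {unaffected:.1%}")
```

Under these assumptions the affected share comes out near 68% and the unaffected share near 32%. For real data that is skewed or heavy-tailed, the shares can differ noticeably, so it is safer to count the missing values directly.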