In statistics, measures of central tendency form the bedrock of data analysis: they are the foundation on which inferences and other statistical measures are built. This article explores what central tendency measures are, why they matter, the main types, and how they are used in practice. Computing them is typically the first step in descriptive statistics.
Central Tendency Measures [CTM]: A Deep Dive
Definition: Central tendency measures are single values that describe the central position of a dataset. They are a kind of summary statistic: each one condenses the whole dataset into one representative value, known as a measure of central tendency.
Different Measures Explored:
Mean (Average):
Types:
- Arithmetic Mean
- Geometric Mean
- Harmonic Mean
The mean is sensitive to outliers, and the three types react to them differently. For any dataset of positive values, Arithmetic Mean ≥ Geometric Mean ≥ Harmonic Mean, with equality only when all values are identical. It’s crucial to note that the geometric and harmonic means are meant for strictly positive data and should not be calculated for datasets with values less than or equal to 0.
Large outliers affect the arithmetic mean the most and the harmonic mean the least. Small outliers (values close to 0) do the opposite: they affect the harmonic mean the most and the arithmetic mean the least.
The mean, or average, is also known as the mathematical average.
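As a rough illustration of the ordering and the outlier behaviour described above, here is a minimal sketch using Python's standard `statistics` module (Python 3.8 or later is assumed for `geometric_mean`). The sample values and the helper name `three_means` are made up for this example.

```python
from statistics import mean, geometric_mean, harmonic_mean


def three_means(values):
    """Return (arithmetic, geometric, harmonic) means of positive values."""
    return mean(values), geometric_mean(values), harmonic_mean(values)


base = [2, 4, 8]  # illustrative data, all strictly positive
am, gm, hm = three_means(base)
print(f"AM={am:.3f}  GM={gm:.3f}  HM={hm:.3f}")  # AM >= GM >= HM

# Add one very large value, then one very small value, and report how far
# each mean moves relative to its original value.
for label, outlier in [("large outlier", 1000), ("small outlier", 0.001)]:
    new_am, new_gm, new_hm = three_means(base + [outlier])
    print(label,
          f"AM x{new_am / am:.3f}",
          f"GM x{new_gm / gm:.3f}",
          f"HM x{new_hm / hm:.3f}")
```

With the large outlier the arithmetic mean moves the most and the harmonic mean the least; with the small outlier the order reverses.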
Median:
The midpoint that divides the sorted data into equal lower and upper halves; it is known as a positional average. It is resistant to the influence of outliers. 50% of the data points fall in the lower half and 50% in the upper half.
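To see that resistance in practice, the sketch below compares the mean and the median before and after one extreme value is introduced. The numbers are invented for illustration.

```python
from statistics import mean, median

values = [35, 42, 50, 58, 65]         # illustrative data
with_outlier = [35, 42, 50, 58, 650]  # same data with one extreme value

print(mean(values), median(values))              # both are 50
print(mean(with_outlier), median(with_outlier))  # mean jumps to 167, median stays 50

# With an even number of values, the median is the average of the two
# middle values after sorting.
print(median([35, 42, 50, 58]))  # (42 + 50) / 2 = 46.0
```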
Mode:
The value with the highest frequency, that is, the value repeated the greatest number of times. It is also known as a positional average. There can be a single mode, or several modes if there is a tie in frequencies.
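One simple way to find the mode, including ties, is to count frequencies and keep every value that reaches the highest count. The helper `modes` below is a hypothetical name used only for this sketch.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, sorted."""
    if not values:
        return []
    counts = Counter(values)        # value -> number of occurrences
    highest = max(counts.values())  # the top frequency
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 7, 7, 9]))     # [7]
print(modes([3, 3, 7, 7, 9]))  # [3, 7]
```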
The Why Behind Central Tendency Measures:
- Summarized View of Data:
- Gain a quick overview of the data distribution.
- Comparison of Different Groups:
- Understand and compare groups with distinct characteristics.
- Note:
- Similar central values don’t necessarily imply similarity in characteristics.
- Groups with different characteristics can also share the same central values. In such cases, central tendency measures alone will not suffice for comparison.
- Comparison of Individuals within a Group:
- Evaluate individual performance within a group.
- Apart from the central location of the data, CTM provide little further information, so they need to be used together with Measures of Dispersion, which we will cover in the next post.
- With CTM we can say that the data points are distributed around a central value, but how near or far those points lie from it can be known only from a Measure of Dispersion.
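The point about groups sharing a central value can be sketched with two made-up groups that have the same mean and median but very different spread. The population standard deviation (`pstdev`) is used here only as a quick preview of a dispersion measure.

```python
from statistics import mean, median, pstdev

group_a = [48, 49, 50, 51, 52]  # values packed tightly around 50
group_b = [10, 30, 50, 70, 90]  # values spread widely around 50

for name, group in [("A", group_a), ("B", group_b)]:
    # Same centre for both groups, but a very different spread.
    print(name, mean(group), median(group), round(pstdev(group), 2))
```

Both groups report 50 for the mean and the median, so the central values alone cannot tell them apart.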
Understanding Through Example:
Consider 100 students taking exams in English and Mathematics, each marked out of 100.
Mean[Average]
- Suppose the average mark in English is 80 and the average mark in Mathematics is 50.
- From this we can say that the marks of the 100 students are centered around 80 in English and around 50 in Mathematics.
- How near or far the individual marks are from 80 and 50 in each subject can only be determined using Measures of Dispersion.
- We can also say that the students did better in English than in Mathematics. The reason could be anything: the English paper may have been easier than the Mathematics paper, or the students may find English easier than Mathematics. [Comparison of 2 groups]
- Now suppose a student scored 80 in Mathematics. We can say that this student scored well compared to the average. [Comparing an individual with the entire group]
- With marks well above the average, the student is doing well relative to the group, which suggests that the student is good at Mathematics.
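The two comparisons above can be sketched in a few lines. The marks below are hypothetical and cover only five students rather than the full class of 100; they are chosen so that the averages match the 80 and 50 used in the example. The helper name `versus_average` is also an assumption of this sketch.

```python
from statistics import mean

# Hypothetical marks for five students, chosen to average 80 and 50.
english = [75, 78, 80, 82, 85]
maths = [40, 45, 50, 55, 60]


def versus_average(mark, group_marks):
    """Return how far a mark sits above (+) or below (-) the group mean."""
    return mark - mean(group_marks)


print(mean(english), mean(maths))  # group comparison: 80 versus 50
print(versus_average(80, maths))   # individual versus group: 30 above the mean
```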
Median
- With the median, we can tell whether a student's marks fall in the upper half or the lower half.
- A student in the upper half is doing better than at least half of the group, while a student in the lower half is doing worse than at least half of the group.
- Suppose the median mark in Mathematics is 50, the same as the mean.
- Then 50% of the students have marks less than or equal to 50, and the other 50% have marks greater than or equal to 50.
- A student who scored 80 in Mathematics falls in the upper half and is doing better than the 50% of students in the lower half.
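A small sketch of the upper-half and lower-half check, again on hypothetical marks for five students. The helper name `half_of` is invented for this example.

```python
from statistics import median


def half_of(mark, group_marks):
    """Say which side of the group's median a mark falls on."""
    mid = median(group_marks)
    if mark > mid:
        return "upper half"
    if mark < mid:
        return "lower half"
    return "at the median"


maths = [40, 45, 50, 55, 60]  # hypothetical marks, median is 50
print(half_of(80, maths))     # upper half
print(half_of(45, maths))     # lower half

# At least half of the marks lie at or below the median, and at least
# half lie at or above it.
mid = median(maths)
share_at_or_below = sum(m <= mid for m in maths) / len(maths)
share_at_or_above = sum(m >= mid for m in maths) / len(maths)
print(share_at_or_below, share_at_or_above)  # 0.6 0.6
```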
Mode
- From the mode, we can find the mark that the largest number of students scored.
- Suppose the mode in Mathematics is 50 marks. This means that more students scored 50 than any other single mark.
- Here Mode = 50 means that more than one student scored 50, so its frequency is 2 or more.
- The mode does not have to be a single value; there can be several modes.
- For example, if 4 students scored 50, 4 students scored 60, and 4 students scored 40, and no other mark occurs as often, then the modes are 40, 50, and 60.
- Here Mean = Median = Mode, which is consistent with a symmetric distribution such as the Normal distribution, but it does not by itself prove that the Mathematics marks are normally distributed. If the three values differ noticeably, the distribution is likely skewed rather than Normal.
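The tie example, and how much the Mean = Median = Mode observation really says, can be checked with the standard `statistics` module (`multimode` assumes Python 3.8 or later). The second dataset is made up for illustration: its mean, median, and mode all equal 50, yet its shape is clearly not a bell curve, so equality is best read as a sign of symmetry rather than proof of normality.

```python
from statistics import mean, median, multimode

# The tie from the example: 4 students each at 40, 50, and 60 marks.
tied = [40] * 4 + [50] * 4 + [60] * 4
print(multimode(tied))  # [40, 50, 60]

# Illustrative data where mean, median, and mode are all 50, but two
# extreme values (0 and 100) make the shape far from bell-like.
spiky = [0, 50, 50, 50, 100]
print(mean(spiky), median(spiky), multimode(spiky))  # 50 50 [50]
```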
However, remember: Central tendency measures are just the first chapter in the story of your data. They paint a broad picture, but don’t tell the whole story. To truly understand how your data is spread out and uncover hidden patterns, you’ll need to delve deeper into measures of dispersion like variance and standard deviation.
Want to put these measures to the test? Download the accompanying code and data to practice calculating central tendency measures yourself! See how the mean, median, and mode reveal different aspects of your data, and unlock the power of statistics to turn mountains of information into actionable insights.