Understanding Chi-Square with Examples

Certain tools stand out in the vast landscape of statistics for their versatility and applicability across various fields. Among these, the chi-square test holds a significant place. Its ability to analyze categorical data and detect patterns or associations makes it indispensable in fields ranging from biology and social sciences to business and beyond. In this blog post, we will delve into the fundamentals of the chi-square test, explore its applications, and illustrate its effectiveness through real-world examples.

What is Chi-Square?

The chi-square (χ²) test is a statistical method used to determine whether there is a significant association between two categorical variables. It compares the observed frequencies of different categories with the frequencies that would be expected under a null hypothesis of no association or independence between the variables.

The Chi-Square Test

The chi-square test comes in two common forms:

Goodness-of-Fit Test: This test determines whether a categorical variable’s observed frequency distribution matches a theoretical distribution. For example, a biologist might use a goodness-of-fit chi-square test to determine whether the observed genotype frequencies of a population conform to the expected frequencies based on Mendelian genetics.
Test of Independence: In this scenario, the chi-square test evaluates whether there is a significant association between two categorical variables. For instance, a researcher analyzing survey data might use a chi-square test to examine whether there is a relationship between gender and political affiliation among respondents.
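To make the goodness-of-fit idea concrete, here is a minimal Python sketch. The genotype counts are hypothetical and chosen only for illustration (they do not come from any study), and the 1:2:1 ratio assumes a simple monohybrid cross. The statistic is the sum of (observed − expected)² / expected over the categories.

```python
# Hypothetical genotype counts (AA, Aa, aa); illustrative only, not real data.
observed = [90, 210, 100]

# Assumed theoretical distribution: Mendelian 1:2:1 ratio for a monohybrid cross.
ratio = [1, 2, 1]

# Scale the ratio to the sample size to get the expected counts.
total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

# Goodness-of-fit statistic: sum of (observed - expected)^2 / expected.
gof_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("expected counts:", expected)
print("chi-square statistic:", gof_stat)
```

With k categories, the statistic is compared against a chi-square distribution with k − 1 degrees of freedom, which is 2 in this sketch.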

Real-World Examples

Let’s work through a real-world example to illustrate the application of the chi-square test:

Example 1: Market Research

Suppose a marketing team wants to assess whether there is a significant association between customer age groups and preferred modes of communication (email, phone, or social media). They collect data from a sample of 275 customers and obtain the following results:

	Email	Phone	Social Media
18-30	50	30	20
31-45	40	45	15
45+	20	25	30

Using a chi-square test of independence, the marketing team can determine whether there is a statistically significant relationship between age group and preferred communication mode.

Step 1: Calculating the Observed Frequency

The table above holds the collected data: each cell counts the customers in a given age group who prefer a given mode of communication. In chi-square terms, these counts are the observed frequencies, so nothing needs to be calculated in this step.
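As a small sketch of how the observed table might be held in code, the snippet below stores the survey counts in a dictionary and derives the row, column, and overall totals. The variable names are illustrative choices, not part of any library.

```python
# Observed counts from the survey table (rows: age groups, columns: modes).
modes = ["Email", "Phone", "Social Media"]
observed = {
    "18-30": [50, 30, 20],
    "31-45": [40, 45, 15],
    "45+": [20, 25, 30],
}

# Row totals: number of customers in each age group.
row_totals = {age: sum(counts) for age, counts in observed.items()}

# Column totals: number of customers preferring each mode.
col_totals = [sum(col) for col in zip(*observed.values())]

# Overall total: the sample size.
grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(modes, col_totals)))
print(grand_total)
```

The overall total should match the stated sample size of 275, which is a quick sanity check on the data entry.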

Step 2: Calculating the Expected Frequency

Under the null hypothesis of independence, the expected frequency of each cell is computed from its row total, its column total, and the overall total:

\begin{equation}
\text { Expected Frequency }=\frac{\text { Row Total } \times \text { Column Total }}{\text { Overall Total }}
\end{equation}

For example, the total of the email column is given as follows:

\begin{equation}
\text { Email }=50+40+20=110
\end{equation}

The sum of the participants in the 45+ age group is given as follows:

\begin{equation}
45^{+}=20+25+30=75
\end{equation}

Applying the formula from this step to the cell (Email, 45+) gives its expected frequency:

\begin{equation}
\text { Expected Frequency }=\frac{110 \times 75}{275}=30
\end{equation}

Proceeding in this manner produces the table below.

Expected	Email	Phone	Social Media	Total
18-30	40	36.3636	23.6364	100
31-45	40	36.3636	23.6364	100
45+	30	27.2727	17.7273	75
Total	110	100	65	275
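The expected table can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; each expected value is the row total times the column total, divided by the overall total.

```python
# Observed counts (rows: 18-30, 31-45, 45+; columns: Email, Phone, Social Media).
observed = [
    [50, 30, 20],
    [40, 45, 15],
    [20, 25, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / overall total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(value, 4) for value in row])
```

Note that the expected counts in each row and column add up to the same totals as the observed counts.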

Step 3: Chi-Square Statistic Calculation

Each cell contributes to the chi-square statistic according to the formula below.

\begin{equation}
\chi^2=\frac{(\text { Observed }- \text { Expected })^2}{\text { Expected }}
\end{equation}

Note that the above formula applies to each cell. However, what we need is the sum of the results produced by the above formula, which is expressed as follows:

\begin{equation}
\chi^2=\sum\frac{(\text { Observed }- \text { Expected })^2}{\text { Expected }}
\end{equation}

Now let us use the values of the cell (Email, 18–30). In this case, we have the following frequencies:

Observed = 50 while expected = 40

Therefore, its chi-square value is given as follows:

\begin{equation}
\chi^2=\frac{(50-40)^2}{40}=2.5
\end{equation}

Again, proceeding in this manner gives us the following results:

	Email	Phone	Social Media	Total
18-30	2.5	1.1136	0.5594	4.1731
31-45	0	2.0511	3.1556	5.2067
45+	3.3333	0.1894	8.4965	12.0192
Total	5.8333	3.3542	12.2115	21.3990

The grand total in the bottom-right corner, 21.3990, is the chi-square statistic given by the second formula in this step. Step 4 uses this value together with the degrees of freedom.
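The per-cell calculation and the final sum can be sketched in Python as follows. The function name `chi_square_statistic` is an illustrative choice, not a library function; it recomputes the expected counts from the observed table and then sums (observed − expected)² / expected over all cells.

```python
def chi_square_statistic(observed):
    """Return the chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence for cell (i, j).
            exp = row_totals[i] * col_totals[j] / grand_total
            # Contribution of this cell to the statistic.
            stat += (obs - exp) ** 2 / exp
    return stat


# Observed counts (rows: 18-30, 31-45, 45+; columns: Email, Phone, Social Media).
observed = [
    [50, 30, 20],
    [40, 45, 15],
    [20, 25, 30],
]

stat = chi_square_statistic(observed)
print(round(stat, 4))
```

The result should agree with the grand total of the table above, up to rounding.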

Step 4: P-Value Calculation

This is the last step of the chi-square test. The p-value depends on two quantities: the degrees of freedom and the chi-square statistic from Step 3. For a table with r rows and c columns, the degrees of freedom are (r − 1) × (c − 1), so here (3 − 1) × (3 − 1) = 4. You can compare the statistic against a chi-square table, or compute the p-value directly with a chi-square calculator or a spreadsheet such as Excel. Doing so produces approximately 0.00026, well below the conventional significance level of 5%. We therefore reject the null hypothesis of independence, which suggests that age group and preferred communication mode are associated.
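For readers who prefer code to a calculator, the p-value can be sketched in plain Python. The helper `chi2_sf_even_df` is a hypothetical name introduced here; it relies on a closed form for the upper-tail probability that holds only when the degrees of freedom are even, which is the case for this 3 × 3 table. Statistical libraries such as SciPy offer a general version (`scipy.stats.chi2.sf`), which is not used here to keep the sketch dependency-free.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only for positive even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires positive even degrees of freedom")
    half = x / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series


# Chi-square statistic from Step 3 and the table dimensions.
stat = 21.3990
n_rows, n_cols = 3, 3

# Degrees of freedom for a test of independence.
df = (n_rows - 1) * (n_cols - 1)

p_value = chi2_sf_even_df(stat, df)
print(df)
print(p_value)

# Decision at the conventional 5% significance level.
alpha = 0.05
reject_null = p_value < alpha
print(reject_null)
```

The computed p-value should come out close to the 0.00026 reported above, so the null hypothesis of independence is rejected at the 5% level.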

The same procedure carries over to other settings. For example, an education researcher can apply a chi-square test to determine whether there is a statistically significant association between the type of instructional intervention and students’ learning outcomes.

Conclusion

The chi-square test is a versatile statistical tool that enables researchers and analysts to assess relationships and associations between categorical variables. Whether in market research, biology, the social sciences, or any other field, understanding and correctly applying the chi-square test can provide valuable insights and inform decision-making processes. By examining real-world examples, we’ve seen how this powerful tool can uncover meaningful patterns and relationships within data, ultimately contributing to advancements in knowledge and practice across various domains.