THE CHI SQUARED TEST


This is a statistical technique you will need for the AGS paper. It is pronounced to rhyme with 'sky' rather than 'tea'.

It is used when the aim is to test for differences between comparable sets of data.

Data need to be collected so that they can be grouped into classes.

A null hypothesis has to be put forward, which is usually that there is no pattern to the data, or that it is distributed randomly.

A value for chi squared is calculated from the data and then compared with significance tables. These show whether the deviation of the observed data from a random distribution could have arisen by chance or is statistically significant.

If the calculated value exceeds the table value at the chosen significance level, the null hypothesis is rejected (5% is considered a satisfactory level for fieldwork data). All this shows is that the distribution is not random; it does not necessarily prove a causal link. The main limitation is with small samples: if any expected value is below 5, the test becomes invalid.

We will use an example of corrie or cirque orientation.

Corries were identified from maps and the direction they face was recorded. Data was placed into 4 categories, relating to the compass. Results shown below:

Orientation from N   Frequency
0-89°                       30
90-179°                      5
180-269°                     6
270-359°                    11

Total number is 52.

Is this distribution random, or is the pattern significant?

Start by developing null hypothesis: The orientation of corries is random.

If this were correct, how many corries would we expect in each category?

52 / 4 = 13 in each.
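The expected value under the null hypothesis can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are illustrative and not part of the original notes.

```python
# Observed corrie counts per compass quadrant (from the table above).
observed = {"0-89": 30, "90-179": 5, "180-269": 6, "270-359": 11}

total = sum(observed.values())           # total number of corries
expected_each = total / len(observed)    # equal share per category if orientation is random

print(total, expected_each)  # 52 13.0
```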

This is obviously not the case, but the test will determine whether the differences are significant.

Formula is below:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency and E = expected frequency.

Corrie data is set out as in the table below. No expected (E) value should fall below 5.

             0-89   90-179   180-269   270-359   TOTAL
O              30        5         6        11      52
E              13       13        13        13      52
O-E            17       -8        -7        -2
(O-E)²        289       64        49         4
(O-E)²/E    22.23     4.92      3.77      0.31   31.23

The value is 31.23
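The same calculation can be reproduced in code as a check on the hand working. The sketch below assumes a small helper function, `chi_squared`, which is a name made up for illustration and not taken from any library.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [30, 5, 6, 11]     # corries per quadrant
expected = [13, 13, 13, 13]   # 52 / 4 under the null hypothesis

chi2 = chi_squared(observed, expected)
print(round(chi2, 2))  # 31.23
```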

Has to be checked with significance tables.

Need to determine what are called the degrees of freedom.

This relates to the number of categories (n), not the size of the sample, and is n - 1 = 4 - 1 = 3.

For 3 d.o.f, value is 7.82 at the 5% significance level.

It is 11.34 at 1% level.

Since our value (31.23) is greater than both values in the table, we can reject the null hypothesis. There is less than a 1% chance that a distribution this uneven would occur if corrie orientation were random, so there is some preferred orientation.
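The comparison with the significance table can also be written out as a short sketch. The critical values are the two quoted above for 3 d.o.f.; the function name `strictest_level_reached` is hypothetical and only used for illustration.

```python
# Critical values for 3 degrees of freedom, as quoted in the notes above.
CRITICAL_VALUES_3_DOF = {0.05: 7.82, 0.01: 11.34}

def strictest_level_reached(chi2, critical_values):
    """Return the smallest significance level whose critical value chi2 exceeds, or None."""
    reached = [level for level, crit in critical_values.items() if chi2 > crit]
    return min(reached) if reached else None

categories = 4
dof = categories - 1   # degrees of freedom = number of categories - 1
chi2 = 31.23           # value calculated for the corrie data

level = strictest_level_reached(chi2, CRITICAL_VALUES_3_DOF)
if level is None:
    print("Cannot reject the null hypothesis at the 5% level")
else:
    print(f"Reject the null hypothesis at the {level:.0%} level ({dof} d.o.f.)")
```

A calculated value between 7.82 and 11.34 would be reported at the 5% level only, and anything below 7.82 would mean the null hypothesis cannot be rejected.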

The example above tests one set of data against a theoretical frequency distribution. The 2nd use of Chi-squared is to compare 2 or more sets of data. This involves the production of a contingency table.

For example, here are some humidity data for 2 stations: one near the sea and one far away.

Relative humidity %   Near sea   Away from sea   Total
50-55                        6              35      41
56-60                       17              16      33
61-65                       26               3      29
Total                       49              54     103

Note data is grouped. Sample size must be at least 20. There must be at least 1 observation in each class.

Data need to be arranged into a contingency table. In this case, there are 2 columns and 3 rows, and therefore 6 cells. The expected frequency for each cell needs to be worked out using the formula below:

$$E \text{ value for cell} = \frac{\text{column total} \times \text{row total}}{\text{grand total}}$$
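As an illustration of the expected-frequency formula, the sketch below applies it to every cell of the humidity table. The function name `expected_frequencies` is an assumption made for this example.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: 50-55%, 56-60%, 61-65%. Columns: near sea, away from sea.
humidity = [[6, 35], [17, 16], [26, 3]]

for row in expected_frequencies(humidity):
    print([round(e, 1) for e in row])
# [19.5, 21.5]
# [15.7, 17.3]
# [13.8, 15.2]
```

All six expected values are above 5, so the small-sample limitation does not apply here.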

Can then work out O-E for each cell, square it and divide by E. The total of these values across all the cells is the chi squared value, which we then compare to the values in the significance tables.

This time, degrees of freedom are calculated using the formula:

d.o.f = (number of rows - 1) x (number of columns - 1)

i.e. (3 - 1) x (2 - 1) = 2 x 1 = 2

Calculated value = 38.6. For 2 d.o.f. the table value is 5.99 at the 5% level, so the null hypothesis can be rejected again.
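The whole contingency-table calculation can be checked with a short script. This is a minimal sketch in plain Python, not a library routine, and `chi_squared_contingency` is a name chosen for illustration.

```python
def chi_squared_contingency(table):
    """Return (chi squared, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for row, r_tot in zip(table, row_totals):
        for o, c_tot in zip(row, col_totals):
            e = r_tot * c_tot / grand_total   # expected frequency for this cell
            chi2 += (o - e) ** 2 / e

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Rows: 50-55%, 56-60%, 61-65%. Columns: near sea, away from sea.
humidity = [[6, 35], [17, 16], [26, 3]]

chi2, dof = chi_squared_contingency(humidity)
print(round(chi2, 1), dof)  # 38.6 2
```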



GeoNet also has a useful section on the Chi Squared Test.
