Data analysis update

Whilst we wait for all of the data from the project partners to arrive, Bryony and I have done a quick & dirty analysis of the data we’ve received so far.

The good news (touch wood!) is that we’re still on track to support the project hypothesis:

“There is a statistically significant correlation across a number of universities between library activity data and student attainment”

The data we’ve looked at so far shows a small Pearson correlation (in the region of -0.2) with high statistical significance (a p-value below 0.01).

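We’re seeing a negative correlation because of the values we’ve assigned to the degree results (1=first, 2=upper second, 3=lower second, 4=third, etc.): a lower number means a better degree, so higher library usage going hand in hand with better degrees shows up as a negative coefficient.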
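To illustrate how the coding of degree results sets the sign of the correlation, here is a minimal sketch in Python. The loan counts are made-up toy values (not project data), and the `pearson` helper and variable names are our own for illustration. The toy data is deliberately tidy, so its coefficient is far stronger than the -0.2 we see in the real data; only the sign is the point here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up toy values for illustration only, NOT the project data.
loans = [40, 25, 30, 10, 12, 3, 0, 5]    # books borrowed per student
grade_code = [1, 1, 2, 2, 3, 3, 4, 4]    # 1=first ... 4=third, as in the post

# Better degrees have LOWER codes, so "more loans, better degree"
# gives a negative coefficient.
r = pearson(loans, grade_code)

# Reversing the coding (4=first ... 1=third) flips the sign
# but leaves the strength of the correlation unchanged.
r_reversed = pearson(loans, [5 - g for g in grade_code])

print(f"coded 1=first: r = {r:.2f}")
print(f"coded 4=first: r = {r_reversed:.2f}")
```

In other words, the minus sign is an artefact of how we numbered the degree classes, not a sign that library usage goes with worse results.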

We suspect one of the reasons for the small Pearson correlation is the level of non & low usage (which is something we’ve looked at previously in Huddersfield’s data). Within each degree level, there are sizeable minorities of students who either never made use of a library service (e.g. they never borrowed any books) or who made only low use of it (e.g. they borrowed fewer than 5 books), and this seems partly responsible for lowering the Pearson correlation. However, the data shows that:

  • students who gained a first are less likely to be in that set of non & low users than those who gained a lower grade
  • students who gained the highest grades are more likely to be in the set of high library usage than those who gained lower grades
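
A comparison like the two points above can be sketched as a per-grade breakdown of usage bands. This is only an illustration under assumptions: the records are made-up toy values (not project data), the "fewer than 5 books" threshold comes from the example earlier in this post, and the high-usage cut-off of 20 is a hypothetical figure chosen purely for the sketch.

```python
from collections import defaultdict

LOW_USE_BELOW = 5    # "fewer than 5 books", as in the example above
HIGH_USE_FROM = 20   # hypothetical cut-off, chosen only for this sketch


def usage_shares(records):
    """Return, per degree code, the share of non/low users and of high users.

    `records` is an iterable of (grade_code, loans) pairs.
    """
    counts = defaultdict(lambda: {"n": 0, "non_low": 0, "high": 0})
    for grade, loans in records:
        c = counts[grade]
        c["n"] += 1
        if loans < LOW_USE_BELOW:
            c["non_low"] += 1
        elif loans >= HIGH_USE_FROM:
            c["high"] += 1
    return {
        grade: {"non_low": c["non_low"] / c["n"], "high": c["high"] / c["n"]}
        for grade, c in sorted(counts.items())
    }


# Made-up toy values for illustration only, NOT the project data.
# Each pair is (grade_code, loans), with 1=first ... 4=third.
records = [
    (1, 40), (1, 25), (1, 2), (1, 30),
    (2, 30), (2, 10), (2, 0), (2, 22),
    (3, 12), (3, 3), (3, 0), (3, 21),
    (4, 0), (4, 5), (4, 1), (4, 4),
]

shares = usage_shares(records)
for grade, s in shares.items():
    print(f"grade {grade}: non/low = {s['non_low']:.0%}, high = {s['high']:.0%}")
```

Looking at shares within each degree level, rather than a single coefficient across everyone, is one way to see the pattern that the non & low users otherwise dilute.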

