Thursday, May 28, 2009

IPL2009 Over by Over Analysis

Recently Ananth (the owner of the excellent blog "It Figures" at cricinfo.org) did a very good analysis of the numbers that came out of IPL 2009. He provided, over by over, the runs scored and the wickets fallen, each with its average and standard deviation.

The statistical analysis (at least at first look) matches the madness on the field very well, as captured by the 'Coefficient of Variation' (CoV), the ratio of the standard deviation to the mean. A CoV of one indicates a random process (e.g. a Poisson process), and a CoV greater than one indicates clustered events. Either way, the CoV of runs scored per over and of wickets fallen per over suggests an essentially random process.
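To make the CoV concrete, here is a minimal Python sketch. It is not Ananth's calculation: the function name `coefficient_of_variation` is my own, and the input is simulated rather than IPL data. It uses the fact that the waiting times between events of a Poisson process are exponentially distributed, for which the CoV is one.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CoV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean


# SIMULATED data (not IPL numbers): waiting times between events of a
# Poisson process with rate 1.0, which are exponentially distributed.
rng = random.Random(2009)
waits = [rng.expovariate(1.0) for _ in range(20000)]

# For a sample this large the result should come out close to 1.
print(round(coefficient_of_variation(waits), 2))
```

Feeding the same function the per-over runs or wickets would give the CoV discussed above; a value well above one would point to clustering.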

However, there is still some hope as there is some pattern in the data generated by IPL 2009.

The top panel in the figure below shows the cross-correlation between any two variables (e.g. Average Runs Scored and Average Wickets Fallen), color-coded so that blue means a high negative correlation and red/brown means a high positive correlation. The significance of each correlation is shown in the bottom panel. The white boxes are the auto-correlations, which are always one by definition and so are excluded. Both panels are symmetric about the diagonal.
There is an interesting correlation between the Average Runs scored and the number of wickets fallen in an over.

Average Runs scored (RAvg; see the second column) is strongly, and significantly, positively correlated with the Number of Wickets fallen (Wkts) and the Average Wickets per over (AWkt).
Correlation should not be confused with causality, but the data indicate that the average number of runs scored is related to the number of wickets fallen in an over. This is something of a paradox to me: if a wicket falls in an over, there are fewer balls left to score from. I leave it to you to suggest ways of resolving the paradox.
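For anyone who wants to try this kind of check themselves, here is a rough Python sketch of a correlation and its significance. It is not necessarily how the figure was produced: I use a plain Pearson coefficient and a permutation test as a stand-in for whatever significance test the figure used, the function names are my own, and the per-over numbers are invented purely for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided p-value: how often shuffling y gives |r| >= observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# HYPOTHETICAL per-over values, invented for illustration (not IPL 2009 data).
runs_avg = [6.1, 6.8, 7.0, 7.4, 7.9, 8.6, 9.3, 10.2]
wkts_avg = [0.20, 0.24, 0.22, 0.29, 0.33, 0.38, 0.45, 0.55]

r = pearson_r(runs_avg, wkts_avg)
p = permutation_p_value(runs_avg, wkts_avg)
print(f"r = {r:.2f}, permutation p-value = {p:.4f}")
```

A small p-value only says the association is unlikely to be a shuffling accident; it says nothing about which variable drives the other.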




right arm over
Arvind

PS: Maybe incoming batsmen tend to start with big hits in the IPL, or in Twenty-Twenty cricket generally, when a wicket falls early in an over. It is also possible that wickets fall when the batsman is going for big hits, which would mean that most wickets should fall towards the end of the over.

The figure below follows a suggestion from Aneesh: I left the last three overs out of the correlation analysis. The correlation between Avg. Wickets and Avg. Runs still remains high and significant.
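The check itself is simple to sketch in Python. Again this is only an illustration with invented numbers and my own function names, not the code behind the figure. The toy data are deliberately built so that one extreme final over creates the whole correlation, which is the situation the check is meant to rule out; in the real IPL 2009 data the correlation survived the trimming.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_without_last(x, y, k):
    """Pearson r after dropping the final k paired observations."""
    end = len(x) - k
    if k < 0 or end < 2:
        raise ValueError("k must leave at least two observations")
    return pearson_r(x[:end], y[:end])


# HYPOTHETICAL values (not IPL data): seven unremarkable overs followed by
# one extreme final over with both many runs and many wickets.
runs = [7.0, 7.4, 6.9, 7.3, 7.1, 7.5, 7.2, 11.0]
wkts = [0.30, 0.28, 0.31, 0.27, 0.32, 0.29, 0.30, 0.70]

full = pearson_r(runs, wkts)
trimmed = correlation_without_last(runs, wkts, 1)
print(f"all overs: r = {full:.2f}; without last over: r = {trimmed:.2f}")
```

If the trimmed value stays close to the full one, as it did here with the last three overs removed, the relationship is not just an artifact of the final overs.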


2 comments:

Aneesh said...

Hi Arvind,
I think your second explanation is spot on. Most of the wickets are falling in the later overs, when batsmen are scoring at a high rate.

In fact, I think the wickets are falling partly *because* the batsmen are trying to score at such a high rate.

Also, the correlation coefficient is sensitive to extreme observations. The 20th over had the highest run rate and also, by far, the most wickets. That is probably skewing the slope. Try running the same correlation excluding the 20th over, and let us know if the relationship is still as strong.

Arvind said...

Aneesh
Thanks for the comments. I have added a new figure in which I left the last 3 overs out of the analysis. The correlation between Avg Runs and Avg Wickets still remains. Once I have data for the 1st and 2nd innings, I will update again.