A quick search of the Internet will reveal several sites giving confident predictions for what the weather will be in the future and/or listing what the weather was at various dates in the past. However, very few of them reveal what their past predictions were and so it is difficult to find an objective measure for how much confidence you should place in a particular forecast.
I thought this would make a very interesting project for the BT Young Scientist competition for my daughter and a few of her friends. Unfortunately the judges did not agree and they rejected their entry into the competition. However, in anticipation of their project being accepted, I had created a simple batch job that fetched weather forecasts from three different Internet sites each day and saved them in files for later analysis. Since the data was being collected anyway, I thought it would be a shame not to do anything with it, so I decided to do a short bit of analysis, which I write up here. If you don't want to bother reading all of the blog post, the short summary is that the forecasts are indeed not very accurate.
When choosing the sites to use I was more influenced by how easy the data was to collect than by whether or not the source was authoritative. For example, Met Éireann is the official forecasting service of the Irish Government, but its forecasts are deliberately translated from numerical predictions into a form that humans can easily understand, e.g. "rain will spread from the west and become heavy by nightfall". It is very hard to do any statistical analysis on forecasts like that, so I deliberately chose three services which provided numerical forecasts in a format that was easy to parse:
- The Yahoo weather service is widely used. By fetching the contents of this URL each day I was able to retrieve an XML file with details of current weather conditions in Dublin, Ireland as well as their forecast for the next 2 days.
- Weather.com is the weather service provided by the well known Weather Channel and it provides weather data and predictions for all parts of the globe. By fetching this URL I got an XML file with their current weather data for Dublin as well as a prediction for the next 4 days.
- WeatherOnline is not quite so well known a weather prediction site, but they make their data very easy to retrieve. By fetching this URL, I was able to get a CSV formatted file with current weather conditions in Dublin and a forecast for the next 5 days.
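A daily collection job along these lines can be quite small. The following is only a minimal sketch and not the batch job I actually ran: the URLs are hypothetical placeholders (the real feed addresses are not reproduced here), and the names `SOURCES`, `snapshot_path` and `fetch_all` are made up for illustration.

```python
import datetime
import pathlib
import urllib.request

# Hypothetical placeholder URLs: substitute the real feed addresses.
SOURCES = {
    "yahoo": "https://example.com/yahoo-dublin.xml",
    "weathercom": "https://example.com/weathercom-dublin.xml",
    "weatheronline": "https://example.com/weatheronline-dublin.csv",
}


def snapshot_path(directory, source, day, url):
    """Build a dated file name of the form <source>-YYYY-MM-DD.<ext>."""
    suffix = pathlib.PurePosixPath(url).suffix or ".txt"
    return pathlib.Path(directory) / f"{source}-{day.isoformat()}{suffix}"


def fetch_all(directory, day=None, fetch=None):
    """Fetch every source once and save the raw response for later analysis."""
    day = day or datetime.date.today()
    if fetch is None:
        # Default fetcher: a plain HTTP GET returning the response bytes.
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()
    out_dir = pathlib.Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for source, url in SOURCES.items():
        path = snapshot_path(out_dir, source, day, url)
        path.write_bytes(fetch(url))
        saved.append(path)
    return saved
```

Calling `fetch_all("data")` once a day from a scheduler such as cron would build up one raw file per source per day. Keeping the raw responses, rather than parsing them on the spot, means the analysis can be changed later without losing any data.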
The first thing I looked at was rainfall predictions. The following chart shows the predicted rainfall (on the Y-Axis in millimetres) plotted against the actual observed rainfall (on the X-Axis). If the forecast was perfect all the dots would be on a straight line with a 45 degree slope. I don't think that anyone would expect the forecast to be perfect, but I must admit that I was personally surprised at how poor this forecast is. I calculated the correlation coefficient between the forecast and actual data and it came out at 0.28. The general rule of thumb would be to interpret such a low correlation figure as "there may be some small association between the figures". If I look at the prediction from 5 days before rather than the prediction from the day before, the correlation coefficient goes down to 0.07, which is normally interpreted to mean that there is no association between the predicted and actual values.
Rainfall Prediction v Actual (mm)
The next parameter I looked at was temperature. The following chart shows the actual temperature plotted against the predicted temperature from the day before and from 5 days before.
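For anyone who wants to repeat the calculation, here is a minimal sketch of a correlation coefficient between two lists of numbers. It assumes the usual Pearson coefficient, and the function name `pearson` is just an illustrative choice.

```python
import math


def pearson(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_p * var_o)
```

A result of 1 means the points lie exactly on a rising straight line, 0 means no linear association, and -1 means they lie on a falling line. Note that a high correlation only says the forecast moves with the actual values; a forecast that was always exactly double the real rainfall would still score 1.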
I think you would agree that the temperature predictions seem to be a little better than the rain predictions. This next chart shows the predicted temperature readings from the day before (in degrees Celsius on the Y-Axis) plotted against the actual temperature on the X-Axis. This is not the straight 45 degree line we would hope for, but at least there is some association between the two. Indeed the correlation coefficient is 0.33, which is just high enough to indicate that there is a medium strength correlation.
Predicted Temp v Actual Temp
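Comparing the day-before forecast with the 5-days-before forecast means matching each prediction to the observation for the date it was predicting. The sketch below shows one way to do that. The record layout (issue date, target date, predicted value) and the name `pair_by_lead` are assumptions for illustration and not the layout of the collected files.

```python
import datetime


def pair_by_lead(forecasts, actuals, lead_days):
    """Return (predicted, observed) pairs for forecasts made lead_days ahead.

    forecasts: iterable of (issued_date, target_date, predicted_value)
    actuals:   dict mapping a date to the value observed on that date
    """
    lead = datetime.timedelta(days=lead_days)
    pairs = []
    for issued, target, predicted in forecasts:
        # Keep only forecasts with the requested lead time whose target
        # date has an observation to compare against.
        if target - issued == lead and target in actuals:
            pairs.append((predicted, actuals[target]))
    return pairs
```

Calling this with `lead_days=1` and again with `lead_days=5` gives the two sets of points to plot or to feed into a correlation calculation.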
I have only scratched the surface of this topic. If the girls' project had been accepted they would probably have done a much more extensive analysis. Areas that would be interesting to tackle would be:
- Analysing the other factors of the prediction e.g. wind speed and direction, pressure etc.
- Looking at different weather prediction services to see if some are better than others.
- Looking at longer time scales. Because of the way I am collecting the data it is not possible to go back into the past and collect historical data, but if anyone knows of a data source showing old weather predictions I would love to analyse this.
- Looking at prediction accuracy in other parts of the world. For example, the weather is an extremely popular topic of conversation among Irish people, but an Egyptian colleague assures me that Egyptian people rarely discuss weather among themselves. I guess a discussion of the weather among Egyptians would quickly become boring since most days are warm and dry. Presumably weather predictions in Egypt are more accurate than in Ireland (but maybe nobody bothers to read them).
I have also set up a batch job to repost the updated data file each day. When I wrote this blog post I only had a few months of data to work with, but if you download http://bod1.tonidoid.com/app/websharepro/share/forecast/ you will get the latest version (which now contains over 1 year of data).