Looking for weather data?
Cross-posted from AgTrials.org
If you are an agricultural researcher, especially one who has investigated the abiotic controls on plant growth, you have very likely come across the problem of finding appropriate weather data. This is particularly true when (a) you don’t work in an institution that has its own weather station(s); (b) you’re not well connected enough to get data from close colleagues for free; (c) you don’t have enough budget to purchase the data you need from other institutions; or (d) a combination of all of the above. For instance, for international organisations, running their own stations or buying data (the alternatives implied by “a” and “c”) is unfeasible because of the potentially large number of different datasets a broad analysis might require, whereas for a young researcher just starting out in agro-climatological research, with few contacts, “b” is definitely not an option. Cases vary, but what if you are young and work in an international organisation?
The available data:
We have found that agricultural researchers prefer to use weather station data for their studies, as shown in Figure 1 below, and the reason is simple: we want detailed and accurate data.
Nevertheless, measured weather data for a given site are often unavailable because (1) there is no weather station; (2) weather stations are poorly maintained, so data are available only for a short period or contain gaps; (3) collected data are not properly stored; (4) data do not pass basic quality checks; and/or (5) access to data is restricted by the holding institutions. All of this further constrains agricultural impact assessment and highlights the importance of making data public.
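Reasons (2) and (4) can at least be screened for before a station series is used. The sketch below is a minimal illustration, not a standard quality-control procedure: it lists calendar days with no record and flags daily rainfall values outside a plausible range. The function names and the 0 to 500 mm/day bounds are hypothetical choices made for this example.

```python
from datetime import date, timedelta


def missing_days(observed_dates, start, end):
    """Return the calendar days in [start, end] that have no record."""
    have = set(observed_dates)
    day, missing = start, []
    while day <= end:
        if day not in have:
            missing.append(day)
        day += timedelta(days=1)
    return missing


def range_check(rain_mm, lower=0.0, upper=500.0):
    """Return indices of daily rainfall values outside [lower, upper] mm.

    None marks a missing value and is skipped here (it is a gap, not an
    out-of-range value). The default bounds are illustrative assumptions.
    """
    return [i for i, r in enumerate(rain_mm)
            if r is not None and not lower <= r <= upper]


# Usage: a five-day window with two missing days, plus two implausible values
obs = [date(2000, 1, 1), date(2000, 1, 2), date(2000, 1, 5)]
print(missing_days(obs, date(2000, 1, 1), date(2000, 1, 5)))
# [datetime.date(2000, 1, 3), datetime.date(2000, 1, 4)]
print(range_check([0.0, 12.5, -1.0, 999.9, None]))  # [2, 3]
```

Real QC schemes add further tests (repeated values, consistency with neighbouring stations), but even these two checks reveal how much of a record is actually usable.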
Over the last 10 years, however, various institutions have attempted to develop useful datasets, usually by combining weather station data, satellite data and numerical weather prediction models with interpolation methods, or by relying solely on climate models. The use of these datasets for agricultural modelling is rather limited for one or more of the following reasons:
- their time step is long (monthly in the best case);
- their temporal coverage is limited to an average of several years (see for example CRU, WorldClim);
- their spatial resolution is too coarse;
- their geographic coverage is not wide enough;
- only certain variables (e.g. temperature, rainfall) are reported, whereas other agriculturally relevant measures (e.g. potential and/or reference evapotranspiration, relative humidity, solar radiation) are rarely reported, or are reported only at coarse spatial and/or temporal scales.
Apart from the constraints related to access and weather station locations, probably the most important issue regarding weather data is quality, which also greatly affects the performance of impact models.
So, if you’re looking for good-quality, well-distributed, abundant and freely accessible weather data, there is no such option. But if you’re a bit more flexible, here are your options (I will focus on rainfall, as it is the least predictable variable):
- Use GSOD (Global Summary of the Day), a daily dataset of about 9,000 weather stations worldwide. At first this seems a good option, but if you look at the data you will find some issues:
  - Of the ~9,000 potential stations, very few in tropical areas still seem to be reporting to NCDC (the GSOD database holder).
  - In the early years (before 1950), rainfall seems to have been reported by ASR33 teletype, which produced odd rainfall peaks (P. Jones, pers. comm.). These peaks appear to be artificial, so they need careful attention and should be removed.
  - Because GSOD data were reported from civil aviation installations, zero-rainfall values are often misreported (P. Jones, pers. comm.). As a result, one cannot really tell whether it rained or not, so useful information is blurred by errors.
- Use GHCN-Daily or GHCN-Monthly (Global Historical Climatology Network). GHCN is a very robust weather station dataset that has been used as the basis for many studies, including the CRU datasets, WorldClim, and the famous hockey-stick warming trend analysis, which proved the anthropogenic causes of climate change.
- If your location is not covered by GSOD or GHCN, you can interpolate (daily or monthly data) using different techniques, of which thin plate splines seem to be the best. Authors have done so in the recent literature to analyse maize yield variation in Africa and wheat senescence in India. However, the number of weather stations in the global and other networks has been decreasing over time, because maintaining weather stations is expensive (Figure 2), and this considerably affects the quality of the interpolated data. Our own analyses have shown considerably lower interpolation skill when the number of weather stations is limited, and (at least for Africa) the availability of daily weather station data is highly limited.
- If you have an incomplete weather series, you could attempt to reconstruct the whole series using a weather generator. Weather generators are tools designed to model the “stochasticity” of rainfall, typically using algorithms known as Markov chains. However, beware of the following:
  - Modelling tropical rainfall requires a third-order Markov chain. All existing algorithms that can be fitted to an individual station are based on first-order chains, and the only algorithm based on third-order chains cannot be fitted to individual weather stations.
  - If you are generating weather data for more than one point, you need to take into account the spatial correlation of rainfall events.
  - Rainfall is related to other variables (e.g. temperature and solar radiation), so if additional fields are being generated, they need to be generated consistently with the rainfall.
- If you have no data whatsoever and no means of accessing or processing the GSOD or GHCN datasets, you could use MarkSim, which needs only the location of your site(s). However, MarkSim was fitted using only 9,162 weather stations, and analysis has shown that even ~50,000 might not be enough at the global level.
- If you don’t care much about spatial resolution (but do about temporal resolution) and need data for years after 1995, you could use NASA-POWER or GPCP, but these are restricted to 1-degree spatial resolution. You could also use TRMM for years after 1998, at a higher spatial resolution (~28 km), but TRMM tends to overestimate actual rainfall (although its spatial distribution of rainfall is fairly good).
- General Circulation Models (GCMs) are currently the best way to model the complex processes that occur at the Earth-system level. However, because GCMs are highly complex and computationally expensive, they have only been run at coarse spatial scales, which leads to limited skill and to uncertainties. These errors and uncertainties are much larger at fine temporal scales (e.g. daily), making GCM data of limited practical use for crop modelling.
- Regional Circulation Models (RCMs) are the best way of downscaling GCMs, but the skill of their predictions depends heavily on the driving GCM and on the skill of the RCM itself. Some of our unpublished studies indicate that ETA data are useful for crop modelling (at least for potato and bean, the crops we tested).
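The artificial teletype peaks in early GSOD records can be screened for automatically before the data are used. The sketch below assumes a simple relative threshold of our own choosing; it is not the screening used by NCDC or in the study this post summarises. Only the pre-1950 cut-off comes from the text; `flag_suspect_peaks`, its `factor` and `q` defaults, and the 0.1 mm wet-day threshold are hypothetical. Note that the misreported zeros cannot be caught this way, because a false zero looks exactly like a genuine dry day.

```python
import math
from datetime import date


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def flag_suspect_peaks(series, before_year=1950, factor=3.0, q=95.0, wet_mm=0.1):
    """Return the dates of suspiciously large rainfall values.

    series: list of (datetime.date, rainfall_mm) pairs for one station.
    A day is flagged when it falls before `before_year` and its rainfall
    exceeds `factor` times the q-th percentile of all wet-day amounts
    (>= wet_mm) in the record. factor, q and wet_mm are hypothetical
    defaults, not values from any GSOD documentation.
    """
    wet_amounts = [r for _, r in series if r >= wet_mm]
    if not wet_amounts:
        return []
    limit = factor * percentile(wet_amounts, q)
    return [d for d, r in series if d.year < before_year and r > limit]


# Usage: 100 ordinary wet days in 1940 plus one implausible 900 mm day in 1945
start = date(1940, 1, 1).toordinal()
record = [(date.fromordinal(start + i), float(i + 1)) for i in range(100)]
record.append((date(1945, 6, 1), 900.0))
print(flag_suspect_peaks(record))  # [datetime.date(1945, 6, 1)]
```

Flagged days are candidates for manual checking rather than automatic deletion, since genuine extreme events also occur.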
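The occurrence part of a weather generator can be sketched as an order-k Markov chain: the probability that today is wet depends only on whether each of the previous k days was wet. The Python sketch below (hypothetical names; not MarkSim or any published generator) estimates those probabilities from a station record and then simulates a new series, drawing wet-day amounts from a gamma distribution with a fixed shape of 0.8, an assumption chosen purely for illustration. It also shows why a higher order is harder to fit to one station: an order-k chain has 2^k possible histories, so a third-order chain needs eight transition probabilities where a first-order chain needs two, all estimated from the same record. The sketch handles neither spatial correlation between sites nor the generation of other variables.

```python
import random


def fit_generator(rain_mm, order=1, wet_mm=0.1):
    """Fit an order-k rainfall occurrence chain and a mean wet-day amount.

    Returns (probs, mean_wet), where probs maps each observed history of the
    previous `order` days (a tuple of 0 = dry, 1 = wet) to P(wet today).
    """
    wet = [1 if r >= wet_mm else 0 for r in rain_mm]
    counts = {}
    for t in range(order, len(wet)):
        history = tuple(wet[t - order:t])
        n_wet, n_total = counts.get(history, (0, 0))
        counts[history] = (n_wet + wet[t], n_total + 1)
    probs = {h: n_wet / n_total for h, (n_wet, n_total) in counts.items()}
    amounts = [r for r in rain_mm if r >= wet_mm]
    mean_wet = sum(amounts) / len(amounts) if amounts else 0.0
    return probs, mean_wet


def simulate(probs, mean_wet, order, n_days, shape=0.8, seed=None):
    """Generate n_days of synthetic daily rainfall (mm) from a fitted chain.

    Histories never seen in the record are treated as dry (P(wet) = 0), and
    the series starts after `order` dry days; both are assumptions.
    """
    rng = random.Random(seed)
    history = [0] * order
    series = []
    for _ in range(n_days):
        p_wet = probs.get(tuple(history[-order:]), 0.0)
        if rng.random() < p_wet:
            # Gamma amount with mean shape * scale = mean_wet
            series.append(rng.gammavariate(shape, mean_wet / shape))
            history.append(1)
        else:
            series.append(0.0)
            history.append(0)
    return series


# Usage: fit first- and third-order chains to a short hypothetical record
observed = [0, 0, 5.0, 3.0, 0, 0, 0, 8.0, 0, 2.0]
p1, mean_wet = fit_generator(observed, order=1)
p3, _ = fit_generator(observed, order=3)
print(p1[(0,)], mean_wet)  # 0.5 4.5
print(len(p1), len(p3))    # 2 6
synthetic = simulate(p1, mean_wet, order=1, n_days=30, seed=42)
```

Even in this ten-day example, the third-order fit already has six distinct histories, most estimated from a single transition, which illustrates how quickly a third-order chain outruns the data available at one station.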
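For a rough sense of what the gridded products' resolutions mean on the ground, assume a spherical Earth on which one degree of latitude spans about 111 km. A grid cell of Δφ by Δλ degrees at latitude φ then measures approximately:

$$
\Delta y \approx 111.3\ \text{km} \times \Delta\phi, \qquad
\Delta x \approx 111.3\ \text{km} \times \Delta\lambda \times \cos\phi
$$

So a 1-degree cell is roughly 111 km across at the equator, while a 0.25-degree cell (the grid spacing assumed here to correspond to the ~28 km quoted for TRMM) is about 0.25 × 111.3 ≈ 27.8 km, narrowing east-west to about 27.8 × cos 30° ≈ 24.1 km at 30° latitude.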
Crop modellers therefore face several challenges: gathering the available weather data and verifying its usefulness for their studies; understanding the broad concepts of uncertainty in climate modelling; detecting the sensitivity of crop models to errors in weather data; and having at least a basic understanding of earth processes, in order to identify major flaws in climate models and decide how best to couple them with crop models for climate change studies.
Ramirez-Villegas, J. and Challinor, A., 2012. Assessing relevant climate data for agricultural applications. Agricultural and Forest Meteorology, 161: 26-45. http://dx.doi.org/10.1016/j.agrformet.2012.03.015