Improved datasets and approaches for modeling agricultural systems
I had the opportunity to attend two very interesting workshops on agricultural datasets and food system modeling at Purdue University from September 10 to 12, 2014. The first workshop was entitled “Improving Geospatial Data for Decision Making and Discovery in Agriculture, Resources and the Environment” (or GEOSHARE). The GEOSHARE initiative is a web-based platform to host, disseminate and improve global datasets related to food security and nutrition modeling. The second workshop was entitled “Improving Food System Sustainability Modeling”, and focused on identifying next steps for improving the crop and economic models used to assess food and nutrition security and sustainability. This second workshop was sponsored by CIMSANS, the Center for Integrated Modeling of Sustainable Agriculture and Nutrition Security, and was intended to identify the need and potential for model improvement to support a ‘Sustainable Nutrition Security’ assessment, the primary effort of CIMSANS.
The mission of GEOSHARE is to “maintain a freely available, global, spatially-explicit database on agriculture, land use, and the environment through a scientific gateway optimized for workflows, visualization, and capacity building”. At its most basic level, GEOSHARE will provide a set of standardized, high-quality input datasets for modeling studies that focus on global food security and sustainability. GEOSHARE will be hosted at Purdue University, given the university's long and successful track record in supercomputing and in hosting large open-source datasets based on the HubZero technology (http://hubzero.org/).
GEOSHARE's ambitions go further than hosting data: it aims to promote transparency, reproducibility, and credibility by consensus in the generation and dissemination of large, complex agricultural datasets. For example, GEOSHARE will allow the publication of multiple versions of the same dataset with varying levels of complexity, and will also enable publication of the underlying workflow. The workflow will include any assumptions that went into the final product, as well as a step-by-step procedure, so that scientists can generate their own datasets in accordance with their needs and expert judgment. Another really cool feature of the GEOSHARE platform will be a crowd-sourcing tool that lets users around the world comment on the quality of the dataset in their particular region, and even make corrections. Finally, with appropriate “fuzzying” to address privacy and proprietary concerns, GEOSHARE could potentially incorporate valuable agricultural datasets from the private sector.
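To make the idea of “fuzzying” a little more concrete, here is a minimal Python sketch of one common way to blur locations: snapping each point to the centre of a coarse grid cell, so that individual farms can no longer be pinpointed. This is only my illustration of the general idea. The workshop did not specify which method GEOSHARE would use, and the function name `fuzz_location`, the 0.5-degree cell size and the sample coordinates are all assumptions made up for this example.

```python
import math


def fuzz_location(lat, lon, cell_deg=0.5):
    """Snap a point to the centre of the grid cell that contains it.

    Hypothetical helper for illustration only; cell_deg (in decimal
    degrees) controls how coarse the blurring is.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    # floor() finds the cell index; adding 0.5 moves to the cell centre.
    fuzzed_lat = (math.floor(lat / cell_deg) + 0.5) * cell_deg
    fuzzed_lon = (math.floor(lon / cell_deg) + 0.5) * cell_deg
    return fuzzed_lat, fuzzed_lon


if __name__ == "__main__":
    # Made-up field coordinates, not taken from any real dataset.
    print(fuzz_location(-6.82, 37.66))
    print(fuzz_location(-6.99, 37.51))  # falls in the same cell as above
```

Every point that falls in the same cell is reported at the same location, so a larger `cell_deg` gives more privacy at the cost of spatial detail.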
Results from the initial GEOSHARE pilot project were presented, including examples that modeled irrigation needs in Tanzania and India, assessed global land-use change, and compared results at various aggregation levels from the Agricultural Model Intercomparison and Improvement Project (AgMIP). While workshop participants were enthusiastic about the potential of GEOSHARE, the further development of this initiative depends on continued funding from diverse sources and sustained interest from the global community of food security and nutrition modelers. One interesting suggestion in the discussion was to use the CG Open Data Initiative as the first major client of GEOSHARE, taking advantage of the historical crop trial data collected by the CGIAR system as well as its in-house knowledge of where to concentrate future data collection efforts around the world. Working with the CG system would also help to make GEOSHARE a truly global initiative, perhaps by hosting hubs in developing countries where CG centers currently exist.
Over the following day and a half, the CIMSANS workshop brought together an even larger range of stakeholders from academia, the CGIAR system, the private sector, foundations and government. In this workshop, most of the meetings were held in small parallel break-out groups. The morning sessions focused on improving economic models related to food security, identifying data needs and sources for crop model improvement and evaluation, developing crop models for new crops (e.g. fruits and vegetables), and adding new capabilities to existing crop models (e.g. to simulate nutrients other than N and the impacts of O3 pollution, pests and disease). The afternoon sessions focused on cross-model topics related to model development, such as open-access models and licensing, model coupling and workflow design, developing common vocabularies to enable model interoperability and comparison, standards for transparency and reporting, and code modularization to support all these goals.
Because the sessions ran concurrently, I was able to attend only two: one on new capabilities in crop models and one on open-access models and licensing issues. In the session on new capabilities, the presentations and discussion focused on nutrient cycling (especially P, K and micronutrients such as zinc, iron, vitamin A and iodine), pest and disease modeling, and O3 impacts on crops. There is substantial concern that the exclusive focus on crop yields in modeling studies to date has left out nutritional quality, changes in pest and disease incidence, and the impacts of air pollution. And while the underlying processes of each of these factors are relatively well understood, most crop models do not currently account for any of them. This may be partly due to the complexity of the processes and the lack of one-size-fits-all solutions for different crops, varieties, pest and disease organisms, and environments. In particular, for pest and disease modeling, we discussed the circumstances under which a simple correlative approach with weather would suffice vs. when one might need more temporal or spatial detail to capture the dynamics appropriately, e.g. synchrony of life cycles and dispersal pathways.
In the session on open-access model development, we heard from representatives of IBM and the Blackland Research and Extension Center at Texas A&M University (developers of SWAT, EPIC and APEX) about their experiences in this area. It is clear that open-access model development ultimately benefits the modeling community as a whole, but challenges remain with regard to recouping the return on investment for individual scientists and teams of model developers. One financially sustainable approach put forth by scientists from Blackland is to make model source code publicly available, but to charge for any additional model customization and front-end development.
The main outcome of the CIMSANS workshop was the identification of a number of overall improvements needed in the food systems modeling domain. These include the addition of missing processes in models (e.g. nutrient composition of foods), improvement of existing processes (e.g. consistent demand parameter estimates in economic models), modern software engineering, improved transparency of model inputs, outputs and code, and supportive web-based infrastructure to maintain code and data and to query results. The relative importance of various topics, as identified by workshop participants in pre- and post-workshop surveys, is shown in Figure 1. Although all topics were considered important, the three rated most important after the workshop were the improved use of breeding trial data in model development activities, transparency standards, and code modularization. A number of committees were formed to follow up on the ideas discussed at the workshop, although progress will depend largely on the existence of funding streams that can cover the cost of individual model development and cross-cutting activities.
One of the best outcomes of these workshops for me personally was the opportunity to meet so many scientists from a range of institutions who work on various topics in crop and food security modeling. I also got to meet Jim Jones (University of Florida, former chairman of the CIAT board) and Jeff White (USDA-ARS), two CIAT alumni who continue to work closely with DAPA researchers.