- Dr. Larry Band (Chair), ESE
- Dr. Jon Goodall (advisor), ESE
- Dr. Venkat Lakshmi, ESE
- Dr. Julianne Quinn, ESE
- Dr. John Porter, Department of Environmental Sciences
Date: Tuesday, December 7, 2021
Time: 9:00am – 11:00am EST
Zoom Meeting Information: Email the ESE Office for the Zoom information.
Title: Advancing the Reproducibility of Hydrologic Models: An Extensible and Collaborative Model Metadata Framework
A variety of hydrologic and environmental models are used by domain scientists to address specific challenges such as floods, droughts and water pollution. The number of scientific studies making use of such diverse and often complex models has been increasing rapidly. Ideally, the models and approaches developed in existing studies could be replicated and leveraged in future studies, building on the investment already made in them. In recent years, many researchers have examined reproducibility as an essential part of scientific research. However, given the proliferation of computational hydrologic and environmental models, the reproducibility of model-based studies still needs to be improved. The overall goal of this proposed research is to create a metadata framework that describes model programs and model instances in a more extensible and collaborative way, in order to foster the reproducibility of model-based studies. In this research, a model is treated as two distinct components: the modeling software (called the model program) and the model inputs (called the model instance). This overarching goal leads to the three proposed studies summarized below.
First, an extensible environmental model metadata framework based on user-defined metadata schemas will be proposed. The properties needed to describe a model instance can vary significantly from one hydrologic or environmental model program to another. This variety necessitates a customizable metadata schema that describes model instance properties as they relate to a given model program. Therefore, a standardized, extensible and machine-readable schema for describing the metadata of model instances of any model program is proposed.
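As a rough illustration of what a user-defined, machine-readable model instance schema could look like, the Python sketch below ties a schema to one model program, checks a model instance's metadata against it and serializes the schema to JSON. The classes FieldSpec and ModelInstanceSchema, the model program name ExampleModel and all field names are hypothetical assumptions made for illustration; they are not the framework proposed in this research.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FieldSpec:
    """One property that a model program expects in its model instance metadata (hypothetical)."""
    name: str
    type_: type
    required: bool = True


@dataclass
class ModelInstanceSchema:
    """A user-defined metadata schema tied to a single model program (hypothetical structure)."""
    model_program: str
    fields: list[FieldSpec] = field(default_factory=list)

    def validate(self, metadata: dict[str, Any]) -> list[str]:
        """Check model instance metadata against this schema; an empty list means it conforms."""
        problems = []
        for spec in self.fields:
            if spec.name not in metadata:
                if spec.required:
                    problems.append(f"missing required field '{spec.name}'")
                continue
            value = metadata[spec.name]
            if not isinstance(value, spec.type_):
                problems.append(
                    f"field '{spec.name}' should be {spec.type_.__name__}, "
                    f"got {type(value).__name__}"
                )
        return problems

    def to_json(self) -> str:
        """Serialize the schema to JSON so it can be shared with the model instances it describes."""
        return json.dumps({
            "model_program": self.model_program,
            "fields": [
                {"name": s.name, "type": s.type_.__name__, "required": s.required}
                for s in self.fields
            ],
        })


# Usage: "ExampleModel" and its field names are hypothetical placeholders.
schema = ModelInstanceSchema(
    model_program="ExampleModel",
    fields=[
        FieldSpec("simulation_start", str),
        FieldSpec("simulation_end", str),
        FieldSpec("time_step_seconds", int),
        FieldSpec("calibrated", bool, required=False),
    ],
)
instance = {
    "simulation_start": "2000-10-01",
    "simulation_end": "2001-09-30",
    "time_step_seconds": "3600",  # stored as text, so validation flags it
}
print(schema.validate(instance))
```

In this sketch each schema is plain data attached to one model program, so supporting a new model program means writing a new schema rather than changing the validation code.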
Second, I propose to explore how model instance metadata schemas can be built for arbitrary model programs and how the variety of metadata schemas that the modeling community might develop can be managed. To achieve this, the metadata elements that need to be standardized across hydrologic and environmental models will be identified, and an example metadata schema will be built for a use case involving a particular hydrologic model, the Structure for Unifying Multiple Modeling Alternatives (SUMMA). Once modelers have created a number of metadata schemas for different models, these schemas will need to be organized so that they can be easily shared, found and reused. Therefore, a method for organizing the metadata schemas will be proposed.
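One simple way to picture organizing community-contributed schemas is a catalog keyed by model program and schema version, with keyword search for discovery. The Python sketch below assumes that structure; the names SchemaRecord and SchemaRegistry, and every version number, contributor and keyword shown, are hypothetical, and the organizing method actually proposed in this research may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaRecord:
    """Catalog entry for one community-contributed model instance metadata schema (hypothetical)."""
    model_program: str
    version: str
    contributor: str
    keywords: frozenset[str] = frozenset()


class SchemaRegistry:
    """Hypothetical in-memory catalog for sharing, finding and reusing metadata schemas."""

    def __init__(self) -> None:
        # Keyed by (lower-cased model program name, schema version).
        self._records: dict[tuple[str, str], SchemaRecord] = {}

    def register(self, record: SchemaRecord) -> None:
        """Add a schema; reject duplicates so each (program, version) pair stays unique."""
        key = (record.model_program.lower(), record.version)
        if key in self._records:
            raise ValueError(
                f"schema for {record.model_program} version {record.version} already registered"
            )
        self._records[key] = record

    def find(self, model_program: str) -> list[SchemaRecord]:
        """Return every registered schema version for one model program, in registration order."""
        name = model_program.lower()
        return [r for (program, _), r in self._records.items() if program == name]

    def search(self, keyword: str) -> list[SchemaRecord]:
        """Return schemas with a matching keyword (case-insensitive)."""
        query = keyword.lower()
        return [
            r for r in self._records.values()
            if query in {k.lower() for k in r.keywords}
        ]


# Usage: versions, contributors and keywords are made-up placeholders.
registry = SchemaRegistry()
registry.register(SchemaRecord("SUMMA", "0.1", "modeler-a", frozenset({"hydrology", "land surface"})))
registry.register(SchemaRecord("ExampleModel", "1.0", "modeler-b", frozenset({"water quality"})))
print([r.model_program for r in registry.search("Hydrology")])
```

Keying records by program and version lets several schema versions for the same model program coexist, while the keyword index supports discovery across programs.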
Third, the metadata schema will be demonstrated in a use case involving data and code repositories to enable reproducibility through search, discovery, reuse and version control. While data repositories can offer detailed metadata descriptions of computational models, they are often not the best place to work collaboratively on model source code because they lack version control capabilities. Conversely, code repositories, which offer source code version control, do not natively provide hydrologic or environmental modeling metadata that would make model code discoverable and reusable. This study will explore integrating data and code repositories, highlighting the advantages of each for increasing the reproducibility of hydrologic and environmental modeling. Across the three studies, the approaches proposed in this dissertation research seek to advance the reproducibility of computational modeling, a need not only in the hydrologic and environmental fields but across scientific disciplines. The result will be a metadata framework and best practices that advance the ability to reproduce computational models.