Assistant Research Professor, Florida State University
Insufficient data hamper the continuous assessment of river health in many watersheds. Transferring predictive models from gauged to poorly gauged watersheds is a promising approach in hydrology; however, the transferability of data-driven models in water quality modeling is seldom investigated, and the criteria that matter most for transferring models (e.g., watersheds’ drainage area and spatiotemporal characteristics) remain unclear. This research evaluates the influence of dominant land cover, drainage area, topographic slope, rainfall, and antecedent soil moisture of watersheds as similarity criteria for transferring machine learning (ML) models that predict in-stream concentrations of total phosphorus, total nitrogen, fecal coliform, and dissolved oxygen. To do this, we examined 30 watersheds with various spatiotemporal characteristics within the Peace-Tampa Bay hydrologic sub-region in Florida. Our objectives were to: 1) determine whether pre-trained ML models can be applied to poorly gauged watersheds of a different spatial scale (e.g., HUC12 to HUC14); and 2) evaluate the in-stream water quality data requirements (e.g., the number of observations and their ranges) in poorly gauged watersheds for transferring pre-trained models from gauged watersheds. Our preliminary results suggest that re-trained ML models can achieve satisfactory performance with as few as 30 additional water quality observations in poorly gauged watersheds that have similar dominant land cover, topography, and drainage areas. This study will provide insights into the scale dependence of water quality predictions at the watershed scale and help develop scalable ML-based tools for predicting in-stream water quality constituents.