Algal blooms are pervasive in many freshwater environments and can pose risks to the health and safety of humans and other organisms. However, monitoring and tracking of potentially harmful blooms often relies on in-person observations by the public. Remote sensing has proven useful in augmenting in situ observations of algal concentration, but many hurdles hinder efficient application by end users. First, numerous approaches to estimate aquatic chlorophyll-a are available and can produce inconsistent results. Second, lack of quantitative in situ observations limits opportunities to train models for specific waterbodies, such that models developed for other systems must be used instead. We (1) implement univariate and multivariate logistic regression models to estimate the probability that aquatic chlorophyll-a concentrations exceed an accepted threshold beyond which harmful effects become likely and (2) evaluate the use of visually classified bloom/no-bloom satellite imagery to augment in situ training data. Using a binary classification of aquatic chlorophyll-a exceeding 10 μg/L, we found that (1) logistic regression models were ∼80% accurate, (2) univariate models trained with visually classified data produce nearly the same accuracy (79%) as models trained with in situ observations (80%), and (3) augmenting in situ chlorophyll-a observations with visual classifications outperformed (82% accuracy) models trained on in situ observations alone (80% accuracy). These results provide a framework for evaluating multiple spectral indices in retrieving algal bloom presence or absence and illustrate that training data derived directly from satellite imagery can be useful in augmenting in situ observations.
1. Introduction
Freshwater algal blooms are a global concern,1–4 and there is evidence that they are becoming more common in response to climate change.1,5–7 Because algal blooms can adversely affect public health, economies, and ecosystem services by degrading water quality,8,9 early identification of algal blooms can improve public safety and mitigate economic concerns. Algal blooms are often identified via visual inspection of a waterbody,10 with reports from both waterbody managers and the public playing a fundamental role in algal bloom monitoring for state environmental monitoring agencies.11–16 Although visual inspection by water quality agencies and public health departments is a relatively accurate way to identify the presence of algal blooms,10 the number of waterbodies that can be monitored this way is limited. Further, visual inspection results can be subjective and conclusions might differ between individuals, even when optical recording devices are used.17 As a result, some public health agencies maintain reactionary stances to algal bloom monitoring, waiting for a bloom to be reported before investigating, analyzing, and providing public health guidance.18,19 This stance can result in incomplete monitoring coverage (e.g., omission of algal bloom events unless reported), and delays in public health notices that can have real-world implications on human health and socioeconomics.19 Remote sensing has the potential to augment in situ visual inspection while increasing the spatial scale of coverage.
In the past 50 years, considerable research attention has been devoted to developing remote sensing techniques for identifying and tracking algal blooms.20,21 Remote sensing of water quality for inland, freshwater systems has lagged marine applications partially due to the optical complexity of inland waters.22 Despite this lag, nearly 30 years of studies have focused on the development of methods to derive water quality metrics from spectral signatures.22 In the past 20 years, a shift toward operationalizing freshwater water quality remote sensing has occurred.22,23 Identifying cyanobacterial blooms has been the focus of significant investment in remote sensing, with particular focus on the ocean and land color instrument (OLCI) on board the Sentinel-3A and Sentinel-3B satellites.24–27 By focusing on spectral features at 665 and 681 nm, this body of work relies on a well characterized two-step approach to identify the presence of phycocyanin and to then quantify the strength of the signal.24,28,29 OLCI collects imagery with a nominal 300-m ground sampling distance, allowing for the monitoring of larger waterbodies and the production of operational cyanobacterial index products at a large spatial scale. However, these products do not have sufficient spatial resolution to monitor the near-shore environment nor narrow waterbodies that are common in the intermountain west, where deep river valleys have been dammed to create reservoirs that produce hydropower and supply irrigation and drinking water. 
Satellite-based sensors with spatial resolution sufficient to resolve narrow waterbodies [e.g., the operational land imager (OLI) on Landsat-8 and Landsat-9, and the multispectral instrument (MSI) on Sentinel-2A and Sentinel-2B] do not have the spectral resolution required to implement the cyanobacteria index approach listed above.29,30 Instead, work with these images to identify algal conditions has focused on retrieving chlorophyll-a,31,32 which has been demonstrated to serve as a robust surrogate for cyanobacterial concentrations in conditions dominated by cyanobacteria.33 Focusing on chlorophyll-a precludes differentiation between harmful algal blooms dominated by cyanobacteria and other aquatic photosynthetic growth.28,34,35 This lack of specificity leads to a bias toward public health protection when noncyanobacterial blooms are identified. Further, the 10-m spatial resolution delivered by Sentinel-2 imagery used in this study allows waterbody managers and public health officials to monitor relatively small waterbodies, narrow portions of larger waterbodies (e.g., bays), and near-shore environments where blooms can accumulate due to wind driven transport.36,37 In this work, we evaluate the ability to classify chlorophyll concentrations using higher spatial- but lower spectral- and temporal-resolution imagery from the MSI on board the Sentinel-2A and Sentinel-2B satellites. Multiple spectral indices have been developed to retrieve chlorophyll-a conditions from a range of passive optical sensors and are presented in the literature.30,38–46 However, none of these approaches have been shown to consistently outperform the others in retrieving chlorophyll-a concentrations. 
Additionally, we typically lack water quality observations for any given waterbody that are coincident with satellite imagery despite large-scale projects to compile such matchups.47 As such, two distinct challenges must be addressed when using satellite imagery to estimate water quality: (1) identifying spectral indices that describe water quality metrics of interest and (2) relating these spectral indices to water quality metrics in the absence of in situ observations. First, we hypothesize that incorporating multiple spectral indices will describe water quality more robustly than selecting a single spectral index. We test this hypothesis by comparing the accuracy of univariate logistic regression models for each spectral index with that of multivariate logistic regression models that incorporate multiple spectral indices. Second, we hypothesize that algal blooms can be identified directly from true color composite satellite imagery, obviating the need for in situ observations. We test this hypothesis by training univariate and multivariate logistic regression models of algal bloom presence with bloom observations identified via visual interpretation of satellite imagery. We evaluate the performance of the logistic regression models calibrated with the visual interpretation dataset relative to those calibrated with in situ samples to determine the efficacy of generating training data from satellite imagery. The work presented here differs from previous efforts by combining bloom presence and absence data with logistic regression models to produce bloom presence probabilities from multivariate models.
2. Methods
2.1. Study Site
This work was conducted in Brownlee Reservoir, located on the Idaho-Oregon border (Fig. 1).
It is the largest reservoir in the Hells Canyon Complex of hydroelectric reservoirs at in surface area, 93 km in length, and in volume, with a maximum depth of nearly 100 m near the dam.50 The reservoir is wide, on average, and is surrounded by hills with 20% to 30% slopes. Brownlee Reservoir has designated beneficial uses of cold-water aquatic life, primary contact recreation, domestic water supply, industrial water supply, irrigation water, livestock watering, salmonid rearing and spawning, resident fish and aquatic life, wildlife and hunting, fishing, boating, aesthetics, and hydropower.51 Brownlee Reservoir is listed as impaired for excess nutrients associated with nuisance algae growth and has a history of cyanobacteria blooms.51 The reservoir is an active recreation destination with nights of camping along the shore of the reservoir in 2013.11 Additionally, discharge from Brownlee Reservoir flows into the Hells Canyon National Recreational Area, which has been estimated to have more than 50,000 boaters visit per year, making it a significant economic resource where the populace can be impacted by water quality.50
2.2. Field Data Collection
Water samples were collected by Idaho Power Company personnel from Brownlee Reservoir. Samples were collected from predetermined locations within the reservoir with known coordinates to match sample collection locations with pixels in the associated satellite imagery. Samples were collected within 2 m of the surface, immediately placed on ice, and delivered to the analysis laboratory within 24 h. Samples were spectrophotometrically analyzed for total chlorophyll-a, corrected for pheophytin following standard method 10200H.2.52 Only results from samples collected on the same date as Sentinel-2 satellite imagery were included in this analysis.
The World Health Organization has identified chlorophyll-a concentrations exceeding 10 μg/L to be associated with a transition from slight to moderate risk of adverse health effects from primary contact in cases where Microcystis dominates the chlorophyll-a concentration.53 Although the dominant taxa are not identified in this work, “bloom” and “bloom conditions” are defined in this work to represent chlorophyll-a concentrations greater than or equal to 10 μg/L.
2.3. Visual Bloom Identification
To evaluate the efficacy of developing training datasets directly from satellite imagery, points representing the distinct presence or absence of an algal bloom were visually interpreted and digitized (Fig. 2) from a series of 26 Sentinel-2 satellite images obtained from the Copernicus Application Programming Interface.48,49 Digitization was conducted in the Geographic Information System ArcMap 10.8.1 from Environmental Systems Research Institute, Inc. (Redlands, California), where true-color (red, band 4; green, band 3; and blue, band 2) Sentinel-2 images were displayed. Minimum and maximum values in the visualization were set to the equivalents of 0% and 100% reflectance, respectively. Locations associated with algal blooms were visually identified as pixels with elevated reflectance in the green band arranged in continuous shapes associated with algal blooms [Figs. 2(b) and 2(d)]. Points representing no-bloom conditions were identified based on low reflectance in the red, green, and blue bands to provide class balance in training data [Figs. 2(c) and 2(e)]. Bloom and no-bloom conditions were assigned without knowledge of in situ observations to reduce identification bias.
Incorporating these data in the evaluation of spectral indices leverages the information that is readily available within historic satellite imagery via conventional image interpretation and is similar to approaches used to develop training data for pixel-based supervised image classification of land cover.54,55
2.4. Satellite Imagery
Level 1C top of atmosphere imagery collected with the multispectral instrument (MSI) sensors on the Sentinel-2A and Sentinel-2B satellites for tile 11TMK was obtained from the European Space Agency through the Copernicus Application Programming Interface.48 Top of atmosphere imagery was atmospherically corrected using the dark spectrum fitting algorithm approach implemented in the Atmospheric Correction for OLI ‘lite’ generic processor version (v20190326.0) to produce aquatic reflectance products.56 See Ref. 57 for a full description of the dark spectrum fitting approach. Default settings were used in the atmospheric correction with the exception that waterbody elevation was set to 610 m above sea level to account for atmospheric path length. At each location where an in situ or visually identified observation was made, aquatic reflectance values for each band were extracted from all pixels with centers within 50 m of the observation’s location.49 A 50-m buffer was used to spatially smooth reflectance values and to account for potential positional error in sample collection location. The median reflectance values within the 50-m buffer were used to represent each band’s value at the specified location. The median statistic was used rather than the mean to reduce the impact of outliers on the resulting aquatic reflectance values.
2.5. Spectral Index Evaluation
Seventeen spectral indices that were expected to be sensitive to chlorophyll-a concentrations were selected from the literature and evaluated30,38–46 (Table 1).
Spectral indices developed for sensors other than the MSI sensor used in this work were selected if the central wavelengths of all bands used in index development [i.e., bands from OLI or the medium resolution imaging spectrometer (MERIS)] fell within unique MSI bands. MSI bands were defined for this work by their central wavelengths and full widths at half maximum.60
Table 1 Sentinel-2 spectral indices evaluated.
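The buffered extraction described in Sec. 2.4 (median reflectance of all pixels whose centers fall within 50 m of an observation) can be sketched in a few lines. This is a minimal illustration only: the pixel layout, the band names, and the function name `median_reflectance` are hypothetical, coordinates are assumed to be in a projected system with units of meters, and the reflectance values are made up to show how the median suppresses an outlier.

```python
import math
from statistics import median

def median_reflectance(pixels, obs_xy, radius_m=50.0):
    """Per-band median over pixels whose centers lie within radius_m of obs_xy.

    pixels: list of (x, y, {band_name: reflectance}) tuples (hypothetical layout),
            with x and y in projected coordinates (meters).
    obs_xy: (x, y) of the in situ or visually identified observation.
    """
    ox, oy = obs_xy
    selected = [bands for x, y, bands in pixels
                if math.hypot(x - ox, y - oy) <= radius_m]
    if not selected:
        raise ValueError("no pixel centers within the buffer")
    return {name: median(p[name] for p in selected) for name in selected[0]}

# Made-up values on a 10-m grid; the third pixel mimics an outlier (e.g., glint).
pixels = [
    (0.0, 0.0, {"B3": 0.02, "B5": 0.01}),
    (10.0, 0.0, {"B3": 0.03, "B5": 0.02}),
    (0.0, 10.0, {"B3": 0.50, "B5": 0.40}),
    (200.0, 0.0, {"B3": 0.90, "B5": 0.90}),  # outside the 50-m buffer
]
result = median_reflectance(pixels, (0.0, 0.0))
```

With these values the mean of band B3 inside the buffer would be pulled toward the outlier, whereas the median stays at the middle value, which is the motivation given in the text for preferring the median.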
2.6. Binary Logistic Regression
The probability that an algal bloom was present for each pixel in a Sentinel-2 image was determined by relating the presence (chlorophyll-a concentration ≥10 μg/L) or absence (chlorophyll-a concentration <10 μg/L) of an algal bloom to the value of one or more spectral indices using a binary logistic regression approach. Binary logistic regression was implemented as follows:
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}} \tag{1}$$
where $p$ is the probability that chlorophyll-a concentration exceeded 10 μg/L, $\beta_0$ is an intercept calibration term, and $\beta_1$ through $\beta_n$ are the parameter effects for spectral indices $x_1$ through $x_n$. To address class imbalances in calibration data (i.e., more observations of bloom versus nonbloom conditions), weights were applied to the observations, with separate weights for the bloom and nonbloom classes determined from the number of observations in each class.
Univariate and multivariate logistic regression models were developed to assess performance of individual spectral indices and combinations of spectral indices to identify algal blooms. Additionally, logistic regression models were trained and tested with different combinations of in situ and visually identified observations to evaluate the impact of different training data sources on model performance (Table 2).
Table 2 Logistic model regression scenarios for the univariate and multivariate models.
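A class-weighted binary logistic regression of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the analysis in the paper was run in R, whereas this sketch uses plain Python with gradient descent; the inverse-frequency weights in `class_weights` are a common convention assumed here because the exact weighting formula is not reproduced in the text; and the chlorophyll-a and index values are invented.

```python
import math

THRESHOLD_UG_L = 10.0  # bloom threshold stated in the text

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def class_weights(labels):
    # ASSUMPTION: inverse-frequency weights so both classes carry equal total
    # weight. Requires at least one observation in each class.
    n, n_pos = len(labels), sum(labels)
    return {1: n / (2.0 * n_pos), 0: n / (2.0 * (n - n_pos))}

def predict_proba(beta, x):
    # Exceedance probability: beta[0] is the intercept, beta[1:] the index effects.
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    # Minimize the weighted negative log-likelihood by batch gradient descent.
    w = class_weights(y)
    total_w = sum(w[yi] for yi in y)
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = w[yi] * (predict_proba(beta, xi) - yi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / total_w for b, g in zip(beta, grad)]
    return beta

# Invented matchups: chlorophyll-a (ug/L) and one spectral index value each.
chl = [2.0, 4.0, 6.0, 8.0, 12.0, 15.0, 30.0]
index_values = [-0.30, -0.20, -0.10, -0.05, 0.10, 0.20, 0.40]
y = [1 if c >= THRESHOLD_UG_L else 0 for c in chl]
X = [[v] for v in index_values]
beta = fit_logistic(X, y)
```

Passing several index values per observation in `X` turns the same sketch into the multivariate case; a pixel would be classified as a bloom when `predict_proba` exceeds the chosen probability cutoff.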
The “gaged” calibration scenario reflects a widely used approach to model calibration using in situ observations.30 The “ungaged” scenario evaluates the efficacy of training a model based on visually identified bloom occurrences when in situ observations are too sparse or not available. The “augmented” scenario evaluates the utility of augmenting in situ observations with visually identified blooms. Each modeling scenario and the associated training and testing data are described in the following sections. Performance between and among univariate and multivariate models was evaluated using the accuracy metrics described at the end of this section. All analyses were conducted in version 3.6.0 of the R statistical programming language61 using RStudio v1.2.1335.62
2.6.1. Univariate logistic regression models
Logistic regression models were developed for each of the 17 spectral indices listed in Table 1 and each of the calibration scenarios listed in Table 2. Performance of the resulting univariate models was quantified by assessing the accuracy of each individual index in identifying algal blooms; these results provided benchmarks to compare multivariate models.
2.6.2. Multivariate logistic regression models
Multivariate logistic regressions were produced to test the hypothesis that classifications based on multiple spectral indices are more robust than classifications from single spectral indices. Three multivariate logistic regression models were developed, one for each of the gaged, ungaged, and augmented scenarios in Table 2, to assess how in situ and visually identified training data affect accuracy of algal bloom identification from Sentinel-2 imagery. Multivariate logistic regressions were produced from the spectral indices listed in Table 1 using a three-step approach.
First, highly correlated spectral indices were identified based on their variance inflation factor (VIF) values and removed one at a time to achieve a subset of spectral indices where the VIF for each index was no greater than ten.63,64 This was done by removing the index with the highest VIF, recomputing VIF for all remaining indices and removing the subsequent index with the highest VIF. This process was repeated until no indices had VIF values above ten. Second, the scenario-specific training dataset identified in Table 2 was selected. Third, multivariate logistic regressions were calibrated using all spectral indices identified through the VIF-based variable selection process. During the calibration procedure, parsimonious multivariate models were identified using stepwise variable selection with the objective of minimizing the Akaike information criterion (AIC).65 This procedure was repeated for all three calibration scenarios in Table 2.
2.7. Accuracy Assessment
The accuracy of the logistic regression models was evaluated using a 10-fold cross validation approach with an 80% calibration, 20% validation split (Table 2). For each iteration, 80% of the in situ data were randomly selected as the training dataset, and the remaining 20% were used to test model accuracy. Performance was evaluated using four metrics: precision, recall, F1 score, and overall accuracy. Precision is a measure of how many of a model’s positive predictions (e.g., above threshold) were correct [Eq. (4)], whereas recall measures how many of the positive observations were identified as such in the model [Eq. (5)]. These are given as
$$\mathrm{precision} = \frac{\#\mathrm{TP}}{\#\mathrm{TP} + \#\mathrm{FP}} \tag{4}$$
$$\mathrm{recall} = \frac{\#\mathrm{TP}}{\#\mathrm{TP} + \#\mathrm{FN}} \tag{5}$$
where #TP is the number of true positives, #FP is the number of false positives, and #FN is the number of false negatives.
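The four accuracy metrics used in this section (precision, recall, the F1 score as the harmonic mean of precision and recall, and overall accuracy) can be computed from the confusion counts as sketched below. The function name and the label vectors are hypothetical and serve only to illustrate the arithmetic; labels are 1 for bloom and 0 for nonbloom.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for binary labels (1 = bloom)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    # Guard against empty denominators (no positive predictions or observations).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Made-up validation labels: 3 TP, 1 FP, 1 FN, 3 TN.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case every metric equals 0.75, because the single false positive and the single false negative penalize precision and recall equally.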
The F1 statistic was used as a multiple-criterion metric to evaluate the performance of logistic regressions that accounts for the trade-off between precision and recall.66 The F1 statistic was computed as
$$F_1 = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{6}$$
Accuracy, defined here as the percent of observations that were correctly classified, was calculated to provide a more intuitive and familiar evaluation of model performance. Accuracy was calculated as the number of true positive and true negative results divided by total number of observations in the validation dataset.67 An exceedance probability of 50% (0.5) was used to classify model output as exceeding 10 μg/L. Figure 3 provides a graphical example of the four possible outcomes, true positive, true negative, false positive, and false negative, for each validation data point relative to the 50% and 10 μg/L thresholds.
3. Results
3.1. Field Observations
Twenty-four in situ observations from 15 sites along Brownlee Reservoir were used in the analysis (Fig. 1). Chlorophyll-a concentrations in these samples ranged from 1.2 to with a median value of . There were 10 observations (42%) with concentrations of 10 μg/L or higher, indicating relative parity in observations above and below the threshold. An additional 195 points were manually digitized from 26 Sentinel-2 images (Fig. 1). Of the manually digitized points, 109 (56%) were classified as blooming conditions. Data are available in Ref. 49. Reflectance spectra extracted from imagery at bloom locations showed elevated reflectance in bands 3 (560 nm) and 5 (704 nm) for both in situ and visually identified bloom locations (Fig. 4). The reflectance values were similar between visually identified and in situ observations for the nonbloom conditions, while reflectance values were higher for bands 3, 5, 6, 7, 8, and 8a for the visually identified data under bloom conditions than for the in situ data.
3.2. Univariate Model Performance
With the gaged calibration approach, the relationship between four spectral indices (S02, S08, S10, and S13) and chlorophyll concentration exceeding 10 μg/L was statistically significant. Of these four models, the univariate models based on S10 and S13 had the highest classification accuracies of 80% and F1 scores of 0.74 (Table 3). The univariate models established with the gaged calibration approach that used indices S09 and S16 were the highest performing with clear separation in exceedance probability between concentrations above and below the threshold (Fig. 5) but were not found to be statistically significant (i.e., the model term had ). Misclassified observations for models based on S10 and S13 had concentrations within of the threshold on average, illustrating that for the best performing models, cases of misclassification were limited to conditions near the threshold (Fig. 5).
Table 3 Univariate model performance using the gaged calibration approach.
When training with the visually identified dataset and testing on the in situ observations in the ungaged approach, all spectral indices, except S06 and S17, had statistically significant relationships with the probability of bloom occurrence. Of these models, those based on S08 and S14 were the highest performers with accuracy rates of 79% and F1 scores of 0.67 (Table 4). However, separation in exceedance probabilities across concentrations was less clear (Fig. 6) when compared to the gaged calibration approach (Fig. 5). Misclassified observations for models based on S08 and S14 had concentrations within of the threshold on average, suggesting that cases of misclassification were limited to conditions near the threshold for the best-performing models (Fig. 6).
Table 4 Univariate model performance using the ungaged calibration approach.
When in situ observations were augmented with visually identified observations in the augmented calibration approach, all models except those based on S06 and S17 had statistically significant relationships with the probability of bloom occurrence. Models based on S05, S08, and S14 had the highest F1 scores (0.58) and the highest accuracy (74%, Table 5). Accuracy and F1 values for these highly correlated indices were lower than for the top performing models under the gaged and ungaged calibration approaches because of decreases in precision driven by an increase in false negatives (Fig. 7). Misclassified observations for models based on S05, S08, and S14 had concentrations within of the threshold on average, indicating that for the best performing models in the augmented calibration approach the cases of misclassification are limited to conditions near the threshold (Fig. 7).
Table 5 Univariate model performance using the augmented calibration approach.
3.3. Multivariate Model Performance
Of the 17 spectral indices examined for Sentinel-2 (Table 1), S01, S03, S04, S05, S09, S10, S11, S12, S14, S16, and S17 were found to be the most highly correlated with other indices (Fig. 8) and were removed in the stepwise, VIF-based variable selection process. The remaining six indices, S02, S06, S07, S08, S11, and S15, had VIF values no greater than ten at the end of the stepwise removal process and were selected for evaluation in the multivariate regression approach. The best performing multivariate models for the gaged (MG), ungaged (MU), and augmented (MA) model calibration approaches had accuracies of 0.80, 0.79, and 0.82, respectively (Table 6). For the multivariate models, the augmented calibration approach also had the highest F1 statistic (0.73), although it is rather similar to the score of 0.72 for the gaged calibration approach. Misclassified observations for the gaged, ungaged, and augmented multivariate models had concentrations within of the threshold, on average, suggesting that for all multivariate models the cases of misclassification are limited to conditions near the threshold (Fig. 9).
Table 6 Performance of the multivariate and top performing univariate models.
MG, MU, and MA refer to top performing multivariate models calibrated with the gaged, ungaged, and augmented approaches, respectively.
The spectral indices included in the best performing multivariate models varied by calibration scenario (Table 7). The multivariate model calibrated with the gaged approach (MG) selected two model members. The stepwise parameter selection process for the ungaged multivariate model calibration approach (MU) resulted in a univariate model (S08), as a balance between model parsimony and maximum model likelihood. The multivariate model calibrated with the augmented dataset incorporated all potential spectral indices except for S02 and S06. In all cases, the models with the lowest AIC also had the highest F1 scores.
Table 7 Multivariate model parameters for each calibration approach.
4. Discussion
We developed models that can be applied to identify algal blooms from satellite imagery by evaluating different data sources describing the presence and absence of algal bloom conditions against multiple spectral indices designed to identify chlorophyll presence.
4.1. Correlation of Spectral Indices
Although 17 spectral indices were identified in the literature and evaluated in this work, many were found to be highly correlated. Through an iterative index removal process, 11 indices were removed before the remaining indices had VIFs no greater than ten. This result indicates that only six of the evaluated 17 spectral indices are required to represent the observed variability in chlorophyll-a concentrations. Reducing the search space by more than 60% is valuable as it reduces the number of indices that require evaluation.
4.2. Spectral Indices
As expected, spectral indices developed for and evaluated on MSI imagery outperformed those developed for other sensors when used in isolation in the univariate models calibrated with in situ observations (Tables 1 and 3).
Specifically, the top-performing univariate models with the gaged calibration approaches, S10 and S13, were developed for the MSI.30,43 They also both focus on band 5 (704 nm) relative to band 4 (665 nm) normalized by bands 6 (740 nm) or 8a (865 nm), thus illustrating the importance of the “red edge” and red bands for retrieving a chlorophyll signal in agreement with previous work.68 However, indices developed for other sensors joined the top performers when model calibration included visually identified data and in multivariate models. Specifically, S08, developed for OLI,30 and S14, developed for the MSI, were the top performing univariate models in the ungaged calibration scenario (Table 4). Indices S08 and S14 focus on the reflectance peak for band 3 (560 nm), illustrating the influence of “green” light on the identification of algal blooms when using RGB color composites to identify algal blooms. Index S05, developed for MERIS58 and focused on the “red edge,” joined S08 and S14 as a top performer for the augmented calibration scenario (Table 5). The improved performance from the augmented calibration scenarios (Table 6) highlights the value of using visible, spatial, and infrared cues to identify algal blooms.
4.3. Univariate versus Multivariate Results
The multivariate model performed just as well as all the statistically significant univariate models for the gaged and ungaged calibration scenarios. The multivariate model under the augmented calibration scenario was the highest performing of the statistically significant models overall. This increase in the performance could be due to the incorporation of multiple spectral features present in algal blooms (Fig. 4), as the ensemble model calibrated with in situ data augmented with spectra extracted from satellite imagery focused on bands 2–5 and 8a (Table 1).
This result is similar to previous studies69,70 and is consistent with our hypothesis that “incorporating multiple spectral indices is more robust than selecting a single spectral index.” The improvement in accuracy is attributable to an increase in precision associated with a reduction in false positives as well as an increase in recall. These results suggest that the multivariate models were more skilled in identifying observed bloom conditions (Table 6).
4.4. Incorporating Image Derived Training Data
The univariate and multivariate models trained on visually identified training data alone were nearly as accurate (79% accuracy) as those trained on in situ observations (80% accuracy). This is a remarkable finding because it implies that training datasets can be built for waterbodies lacking in situ data by extracting the necessary information from the satellite images themselves. Further, the multivariate approach calibrated on the augmented observations provided the highest accuracy overall with a mean accuracy of 82%, indicating a benefit of including visually identified end-member spectra even in cases where in situ data are available. The multivariate model calibrated on visually identified data (MU) had near perfect model recall, meaning that nearly all the observed bloom conditions in the in situ observation dataset were identified in the resulting model. However, this same model had relatively low precision due to the presence of numerous false positives. The high recall and low precision indicate that classification with visually identified data is best suited to cases where decision makers tend to be more tolerant of false positives than false negatives. Notably, the probability (50%) and concentration (10 μg/L) thresholds can, and likely should, be adjusted in this approach to fit end-user communication and reporting needs. In fact, it can be seen in Fig. 9(c) that selecting a slightly higher chlorophyll-a threshold () would result in perfect classification.
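The trade-off between precision and recall that follows from adjusting the probability threshold can be illustrated with a short sweep over candidate cutoffs. The exceedance probabilities and labels below are invented for illustration and are not results from this study; the function name is hypothetical, and classifying a point as a bloom when its probability is greater than or equal to the cutoff is an assumption about how ties are handled.

```python
def precision_recall(y_true, probs, cutoff):
    """Precision and recall when probs >= cutoff is classified as bloom (1)."""
    y_pred = [1 if p >= cutoff else 0 for p in probs]
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 0)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall

# Invented observed labels (1 = bloom) and modeled exceedance probabilities.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
probs = [0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.60, 0.20]

# Lower cutoffs favor recall (fewer missed blooms); higher cutoffs favor precision.
sweep = {c: precision_recall(y_true, probs, c) for c in (0.4, 0.5, 0.7)}
for cutoff, (prec, rec) in sweep.items():
    print(f"cutoff={cutoff:.1f} precision={prec:.2f} recall={rec:.2f}")
```

In this toy example the 0.4 cutoff captures every bloom at the cost of false positives, whereas the 0.7 cutoff removes the false positives but misses half of the blooms, mirroring the tolerance choice described in the text.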
Figure 4 shows that the visually identified bloom locations had higher NIR reflectance than pixels identified as bloom conditions via in situ observations. This may reflect a bias in the visual interpretation toward identifying floating algae that would have higher NIR reflectance than submerged algae. Further, a robust analysis of the consistency and repeatability of manually classified training data in Brownlee and other waterbodies could improve classification. The ungaged model results indicate that the use of image-derived spectra for training models could be useful in cases where in situ observations are limited. The reasonable accuracy obtained with the ungaged multivariate calibration (79% for MU) and the increased accuracy of the augmented multivariate (MA) relative to the gaged multivariate (MG) are consistent with our hypothesis that “satellite imagery itself contains information useful for evaluating spectral indices.”
4.5. Spatial Patterns in Model Results
In addition to the correct identification of conditions at observation locations, the spatial patterns of model results can be examined qualitatively to confirm agreement with features visible in satellite imagery. In Fig. 10(a), an algal bloom is clearly seen in the true color composite. A sample collected from within this feature had chlorophyll-a concentration of , verifying the feature as an algal bloom. The ribbon-like features of the algal bloom are well described by some of the models (Fig. 10). However, some univariate models do not appear to be sensitive to the presence of the algal mass, returning nearly uniform exceedance probabilities for all pixels in the image. Although this is not a quantitative assessment, examining the models’ abilities to reproduce spatial patterns of algal blooms provides insight into an index’s general performance.
4.6. Sources of Uncertainty

The approach taken here is subject to multiple sources of uncertainty, including but not necessarily limited to the atmospheric correction procedure; interfering effects of sediment and other non-chlorophyll-a-containing substances on the chlorophyll-a signal; the presence of nonalgal plants (e.g., submerged aquatic vegetation or sloughed macrophyte mats) that could lead the chlorophyll-a signal to be misinterpreted as an algal bloom; error rates associated with the visual identification process; the effects of wind-driven sun glint; the use of chlorophyll-a that is not corrected for degradation byproducts such as pheophytin; adjacency effects; bottom reflectance; and potential temporal and spatial mismatch between in situ observations and extracted aquatic reflectance values. The limited number of in situ observations likely also contributed to calibration uncertainty, exemplifying the very common challenge of calibrating semiempirical approaches with limited data. Notably, the ungaged calibration approach removes the uncertainty associated with temporal and spatial mismatch because the signals are derived from imagery directly. This, in addition to a larger validation dataset, may have contributed to more univariate models with statistically significant calibrations under the ungaged approach relative to the gaged approach. Despite many potential sources of error, the achieved accuracies of 80% and higher indicate that the algal bloom signal is large in comparison with the noise associated with these potential sources of uncertainty. These encouraging results notwithstanding, addressing each potential source of uncertainty could improve model accuracy.

4.7. Future Applications

Our intent in introducing this approach is to provide an additional tool for public health and natural resource managers to identify potentially harmful conditions that warrant in situ monitoring.
Providing timely situational awareness of algal bloom extent has the potential to increase resource efficiency by guiding field staff to priority sampling locations. These methods also afford the potential to identify nascent blooms in remote areas before they would be identified otherwise. Finally, historical satellite imagery contains information on algal bloom dynamics. Reanalysis of these images could provide information on spatial and temporal trends that might yield insight regarding potential drivers of algal blooms.

5. Conclusion

Multivariate models were more accurate than univariate indices in classifying aquatic chlorophyll-a relative to a threshold. Manually digitized observations of end-member conditions (e.g., bloom and nonbloom) were used to calibrate aquatic chlorophyll-a retrieval in the absence of in situ observations with reasonable accuracy (79%), nearly equal to that obtained using in situ observations only (80%). Augmenting in situ observations with manually digitized observations of end-member conditions improved remote sensing accuracy to 82%. These results suggest that image interpretation might be suitable for deriving training data for algal bloom classification, either in the absence of in situ observations matched with Sentinel-2 satellite imagery or to augment them.

Acknowledgements

This research was supported by Idaho Power Company. The authors would like to extend gratitude to Brian Hoelscher and Nick Gastelecutto at Idaho Power Company for support in data collection. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. government. The Article was prepared solely by employees of the United States federal government as part of the employees' official duties. It is an official U.S. government publication, and is not subject to copyright protection within the United States.

References

J. C. Ho, A. M. Michalak and N. Pahlevan, “Widespread global increase in intense lake phytoplankton blooms since the 1980s,” Nature, 574(7780), 667–670 (2019). https://doi.org/10.1038/s41586-019-1648-7
Z. Namsaraev et al., “Algal bloom occurrence and effects in Russia,” Water, 12(1), 285 (2020). https://doi.org/10.3390/w12010285
L. L. Ndlela et al., “An overview of cyanobacterial bloom occurrences and research in Africa over the last decade,” Harmful Algae, 60, 11–26 (2016). https://doi.org/10.1016/j.hal.2016.10.001
F. R. Pick, “Blooming algae: a Canadian perspective on the rise of toxic cyanobacteria,” Can. J. Fish. Aquat. Sci., 73(7), 1149–1158 (2016). https://doi.org/10.1139/cjfas-2015-0470
C. J. Gobler, “Climate change and harmful algal blooms: insights and perspective,” Harmful Algae, 91, 101731 (2020). https://doi.org/10.1016/j.hal.2019.101731
H. W. Paerl and J. Huisman, “CLIMATE: blooms like it hot,” Science, 320(5872), 57–58 (2008). https://doi.org/10.1126/science.1155398
H. W. Paerl and J. Huisman, “Climate change: a catalyst for global expansion of harmful cyanobacterial blooms,” Environ. Microbiol. Rep., 1(1), 27–37 (2009). https://doi.org/10.1111/j.1758-2229.2008.00004.x
C. B. Lopez et al., Scientific Assessment of Freshwater Harmful Algal Blooms, Interagency Working Group on Harmful Algal Blooms, Hypoxia, and Human Health of the Joint Subcommittee on Ocean Science and Technology, Washington, DC (2008).
USEPA, Nutrient Criteria Technical Guidance Manual: Lakes and Reservoirs EPA 822-B00-001, United States Environmental Protection Agency, Office of Water, Washington, DC (2000).
USEPA, Recommendations for Cyanobacteria and Cyanotoxin Monitoring in Recreational Waters: EPA 823-R-19-001, Washington, DC (2019).
Idaho Power Company, Section 401 Water-Quality Certification Application Hells Canyon Complex FERC No. 1971, Idaho Power Company (2018).
Montana DEQ, Harmful Algal Bloom (HAB) Guidance Document for Montana, Montana Department of Environmental Quality (2021).
New Hampshire DES, New Hampshire Department of Environmental Services CyanoHAB Response Protocol for Public Water Supplies, New Hampshire Department of Environmental Services (2020).
New York DEC, Harmful Algal Bloom Action Plan Skaneateles Lake, New York Department of Environmental Conservation (2022).
Ohio EPA, Public Water System Harmful Algal Bloom Response Strategy, Ohio Environmental Protection Agency (2014).
Oregon Health Authority, Oregon Harmful Algae Bloom Surveillance (HABS) Program Recreational Use Public Health Advisory Guidelines Cyanobacterial Blooms in Freshwater Bodies, Oregon Health Authority Public Health Division Center for Health Protection (2019).
T. J. Malthus, R. Ohmsen and H. J. van der Woerd, “An evaluation of citizen science smartphone apps for inland water quality assessment,” Remote Sens., 12(10), 1578 (2020). https://doi.org/10.3390/rs12101578
H. Rashidi et al., “Monitoring, managing, and communicating risk of Harmful Algal Blooms (HABs) in recreational resources across Canada,” Environ. Health Insights, 15, 117863022110144 (2021). https://doi.org/10.1177/11786302211014401
S. Stroming et al., “Quantifying the human health benefits of using satellite information to detect cyanobacterial harmful algal blooms and manage recreational advisories in U.S. Lakes,” GeoHealth, 4(9), e2020GH000254 (2020). https://doi.org/10.1029/2020GH000254
M. Gholizadeh, A. Melesse and L. Reddi, “A comprehensive review on water quality parameters estimation using remote sensing techniques,” Sensors, 16(8), 1298 (2016). https://doi.org/10.3390/s16081298
R. M. Khan et al., “A meta-analysis on harmful algal bloom (HAB) detection and monitoring: a remote sensing perspective,” Remote Sens., 13(21), 4347 (2021). https://doi.org/10.3390/rs13214347
S. N. Topp et al., “Research trends in the use of remote sensing for inland water quality science: moving towards multidisciplinary applications,” Water, 12(1), 169 (2020). https://doi.org/10.3390/w12010169
B. A. Schaeffer et al., “Mobile device application for monitoring cyanobacteria harmful algal blooms using Sentinel-3 Satellite Ocean and land colour instruments,” Environ. Modell. Software, 109, 93–103 (2018). https://doi.org/10.1016/j.envsoft.2018.08.015
M. M. Coffer et al., “Quantifying national and regional cyanobacterial occurrence in US lakes using satellite remote sensing,” Ecol. Indic., 111, 105976 (2020). https://doi.org/10.1016/j.ecolind.2019.105976
S. Mishra et al., “Measurement of cyanobacterial bloom magnitude using satellite remote sensing,” Sci. Rep., 9(1), 18310 (2019). https://doi.org/10.1038/s41598-019-54453-y
N. Pahlevan, S. Ackleson and B. Schaeffer, “Toward a satellite-based monitoring system for water quality,” EOS, 99 (2018). https://doi.org/10.1029/2018EO093913
P. Whitman et al., “A validation of satellite derived cyanobacteria detections with state reported events and recreation advisories across U.S. lakes,” Harmful Algae, 115, 102191 (2022). https://doi.org/10.1016/j.hal.2022.102191
R. S. Lunetta et al., “Evaluation of cyanobacteria cell count detection derived from MERIS imagery across the eastern USA,” Remote Sens. Environ., 157, 24–34 (2015). https://doi.org/10.1016/j.rse.2014.06.008
M. W. Matthews, S. Bernard and L. Robertson, “An algorithm for detecting trophic status (chlorophyll-a), cyanobacterial-dominance, surface scums and floating vegetation in inland and coastal waters,” Remote Sens. Environ., 124, 637–652 (2012). https://doi.org/10.1016/j.rse.2012.05.032
R. Beck et al., “Comparison of satellite reflectance algorithms for estimating chlorophyll-a in a temperate reservoir using coincident hyperspectral aircraft imagery and dense coincident surface observations,” Remote Sens. Environ., 178, 15–30 (2016). https://doi.org/10.1016/j.rse.2016.03.002
M. G. Allan et al., “Empirical and semi-analytical chlorophyll a algorithms for multi-temporal monitoring of New Zealand lakes using Landsat,” Environ. Monit. Assess., 187(6), 364 (2015). https://doi.org/10.1007/s10661-015-4585-4
F. Watanabe et al., “Estimation of chlorophyll-a concentration and the trophic state of the Barra Bonita hydroelectric reservoir using OLI/Landsat-8 images,” IJERPH, 12(9), 10391–10417 (2015). https://doi.org/10.3390/ijerph120910391
R. P. Stumpf et al., “Challenges for mapping cyanotoxin patterns from remote sensing of cyanobacteria,” Harmful Algae, 54, 160–173 (2016). https://doi.org/10.1016/j.hal.2016.01.005
S. G. H. Simis, S. W. M. Peters and H. J. Gons, “Remote sensing of the cyanobacterial pigment phycocyanin in turbid inland water,” Limnol. Oceanogr., 50(1), 237–245 (2005). https://doi.org/10.4319/lo.2005.50.1.0237
T. T. Wynne et al., “Relating spectral shape to cyanobacterial blooms in the Laurentian Great Lakes,” Int. J. Remote Sens., 29(12), 3665–3672 (2008). https://doi.org/10.1080/01431160802007640
R. P. Stumpf et al., “Hydrodynamic accumulation of Karenia off the west coast of Florida,” Cont. Shelf Res., 28(1), 189–213 (2008). https://doi.org/10.1016/j.csr.2007.04.017
Y. Zhang et al., “A view of physical mechanisms for transporting harmful algal blooms to Massachusetts Bay,” Mar. Pollut. Bull., 154, 111048 (2020). https://doi.org/10.1016/j.marpolbul.2020.111048
F. Alawadi, “Detection of surface algal blooms using the newly developed algorithm surface algal bloom index (SABI),” Proc. SPIE, 7825, 782506 (2010). https://doi.org/10.1117/12.862096
R. Beck et al., “Comparison of satellite reflectance algorithms for estimating phycocyanin values and cyanobacterial total biovolume in a temperate reservoir using coincident hyperspectral aircraft imagery and dense coincident surface observations,” Remote Sens., 9(6), 538 (2017). https://doi.org/10.3390/rs9060538
H. J. Gons, “A chlorophyll-retrieval algorithm for satellite imagery (medium resolution imaging spectrometer) of inland and coastal waters,” J. Plankton Res., 24(9), 947–951 (2002). https://doi.org/10.1093/plankt/24.9.947
C. Le et al., “Evaluation of chlorophyll-a remote sensing algorithms for an optically complex estuary,” Remote Sens. Environ., 129, 75–89 (2013). https://doi.org/10.1016/j.rse.2012.11.001
S. Mishra and D. R. Mishra, “Normalized difference chlorophyll index: a novel model for remote estimation of chlorophyll-a concentration in turbid productive waters,” Remote Sens. Environ., 117, 394–406 (2012). https://doi.org/10.1016/j.rse.2011.10.016
W. J. Moses et al.,
“Operational MERIS-based NIR-red algorithms for estimating chlorophyll-a concentrations in coastal waters—the Azov Sea case study,”
Remote Sens. Environ., 121, 118–124 (2012). https://doi.org/10.1016/j.rse.2012.01.024
J. E. O’Reilly et al., “Ocean color chlorophyll algorithms for SeaWiFS,” J. Geophys. Res., 103(C11), 24937–24953 (1998). https://doi.org/10.1029/98JC02160
E. J. Tebbs, J. J. Remedios and D. M. Harper, “Remote sensing of chlorophyll-a as a measure of cyanobacterial biomass in Lake Bogoria, a hypertrophic, saline–alkaline, flamingo lake, using Landsat ETM+,” Remote Sens. Environ., 135, 92–106 (2013). https://doi.org/10.1016/j.rse.2013.03.024
K. Toming et al., “First experiences in mapping lake water quality parameters with Sentinel-2 MSI imagery,” Remote Sens., 8(8), 640 (2016). https://doi.org/10.3390/rs8080640
M. R. V. Ross et al., “AquaSat: a data set to enable remote sensing of water quality for inland waters,” Water Resour. Res., 55(11), 10012–10025 (2019). https://doi.org/10.1029/2019WR024883
European Space Agency (ESA), “Copernicus Open Access Hub,” (2022). https://scihub.copernicus.eu/
T. V. King and K. C. Hafen, “Chlorophyll-a concentrations and algal bloom condition paired with Sentinel-2 aquatic reflectance values collected for Brownlee Reservoir, ID from 2015 through 2020,” U.S. Geological Survey (2022).
Idaho Power Company, Recreational Use Associated with the Snake River in the Hells Canyon National Recreation Area, Idaho Power Company (2002).
IDEQ and ODEQ, Snake River—Hells Canyon Total Maximum Daily Load (TMDL), 710 Idaho Department of Environmental Quality, Boise, Idaho
(2004).
APHA, Standard Methods for the Examination of Water and Wastewater, American Public Health Association, Washington DC (1999).
V. Romano Spica et al., Guidelines for Safe Recreational Water Environments. Volume 1, Coastal and Fresh Water, Antonio Delfino Editore (2010).
R. Khatami, G. Mountrakis and S. V. Stehman, “A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: general guidelines for practitioners and future research,” Remote Sens. Environ., 177, 89–100 (2016). https://doi.org/10.1016/j.rse.2016.02.028
J. A. Richards, “Supervised classification techniques,” Remote Sensing Digital Image Analysis, 247–318, Springer Berlin Heidelberg, Berlin, Heidelberg (2013).
Q. Vanhellemont, “Adaptation of the dark spectrum fitting atmospheric correction for aquatic applications of the Landsat and Sentinel-2 archives,” Remote Sens. Environ., 225, 175–192 (2019). https://doi.org/10.1016/j.rse.2019.03.010
Q. Vanhellemont and K. Ruddick, “Atmospheric correction of metre-scale optical satellite data for inland and coastal water applications,” Remote Sens. Environ., 216, 586–597 (2018). https://doi.org/10.1016/j.rse.2018.07.015
J. Gower et al., “Detection of intense plankton blooms using the 709 nm band of the MERIS imaging spectrometer,” Int. J. Remote Sens., 26(9), 2005–2012 (2005). https://doi.org/10.1080/01431160500075857
D. Zhao et al., “The relation of chlorophyll-a concentration with the reflectance peak near 700 nm in algae-dominated waters and sensitivity of fluorescence algorithms for detecting algal bloom,” Int. J. Remote Sens., 31(1), 39–48 (2010). https://doi.org/10.1080/01431160902882512
European Space Agency (ESA), Sentinel-2 User Handbook, European Space Agency (ESA) (2015).
R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2019).
RStudio Team, RStudio: Integrated Development Environment for R, RStudio, PBC., Boston, Massachusetts (2020).
N. Hamzehpour, H. Shafizadeh-Moghadam and R. Valavi, “Exploring the driving forces and digital mapping of soil organic carbon using remote sensing and soil texture,” CATENA, 182, 104141 (2019). https://doi.org/10.1016/j.catena.2019.104141
M.-C. Tu, P. Smith and A. M. Filippi, “Hybrid forward-selection method-based water-quality estimation via combining Landsat TM, ETM+, and OLI/TIRS images and ancillary environmental data,” PLoS One, 13(7), e0201255 (2018). https://doi.org/10.1371/journal.pone.0201255
H. Akaike, “Information theory and an extension of the maximum likelihood principle,” Selected Papers of Hirotugu Akaike, 199–213, Springer, New York (1998).
C. J. V. Rijsbergen, Information Retrieval, 2nd ed., Butterworth-Heinemann (1979).
N. Kerle, L. L. Janssen and G. C. Huurneman, “Principles of Remote Sensing,” 250, The Netherlands (2004).
J. Bramich, C. J. S. Bolch and A. Fischer, “Improved red-edge chlorophyll-a detection for Sentinel 2,” Ecol. Indic., 120, 106876 (2021). https://doi.org/10.1016/j.ecolind.2020.106876
K. T. Peterson et al., “Machine learning-based ensemble prediction of water-quality variables using feature-level and decision-level fusion with proximal remote sensing,” Photogramm. Eng. Remote Sens., 85(4), 269–280 (2019). https://doi.org/10.14358/PERS.85.4.269
M. Xu et al., “Implementation strategy and spatiotemporal extensibility of multipredictor ensemble model for water quality parameter retrieval with multispectral remote sensing data,” IEEE Trans. Geosci. Remote Sens., 60, 1–16 (2022). https://doi.org/10.1109/TGRS.2020.3045921