Proceedings Article | 21 September 2023
KEYWORDS: Data modeling, Meteorology, Education and training, Synthetic aperture radar, Forest fires, Machine learning, Atmospheric modeling, Satellites, Visual process modeling, Vegetation
This paper presents EO4WildFires, a benchmark multi-sensor time-series dataset that spans 45 countries and combines multispectral imagery (Sentinel-2), Synthetic Aperture Radar (SAR) imagery (Sentinel-1) and meteorological parameters (NASA POWER). It can be used to develop machine learning and deep learning methods that estimate the area a forest wildfire might cover. The dataset is annotated using the European Forest Fire Information System (EFFIS) as the source of forest fire detections and size estimates. In total, 31,730 wildfire events from 2018 to 2022 were collected. For each event, Sentinel-2 (multispectral), Sentinel-1 (SAR) and meteorological data are assembled into a single data cube. The meteorological parameters included in the data cube are: relative humidity (the ratio of the actual partial pressure of water vapor to the partial pressure at saturation), average temperature, bias-corrected average total precipitation, average wind speed, fraction of land covered by snowfall, percent of root zone soil wetness, snow depth, snow precipitation and percent of soil moisture. The main problem this dataset is designed to address is forecasting wildfire severity before a fire occurs. The dataset is not used to predict whether a wildfire will occur, but rather the severity (size of the burned area) of a wildfire event, should one break out at a specific location, given the current and historical forest status as recorded in multispectral and SAR images and meteorological data. Using the data cubes of the collected wildfire events, three preliminary experiments are conducted to evaluate which factors contribute to wildfire severity prediction. The first experiment estimates wildfire size using only the meteorological parameters, the second uses both the multispectral and SAR parts of the dataset, and the third exploits all parts of the dataset.
In each experiment, machine learning models are developed and their accuracy is evaluated. The results show that wildfire size is estimated most accurately using Sentinel-2 data, followed by Sentinel-1, while using only meteorological data yields the lowest accuracy of the three.
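Each event is described as one data cube combining the three sources, and the three experiments differ only in which parts of that cube a model sees. The following is a minimal Python sketch of that idea, assuming NumPy arrays. The class `EventCube`, the function `features`, the experiment keys, the short parameter labels and all array shapes are hypothetical illustrations chosen for this sketch, not the published file format of EO4WildFires.

```python
from dataclasses import dataclass

import numpy as np

# The nine NASA POWER parameters listed in the abstract, in the order given there.
# The short labels are hypothetical names chosen for this sketch.
METEO_PARAMS = [
    "relative_humidity",        # actual / saturation partial pressure of water vapor
    "temperature_avg",
    "precipitation_corrected",  # bias-corrected average total precipitation
    "wind_speed_avg",
    "snow_cover_fraction",      # fraction of land covered by snowfall
    "root_zone_wetness",
    "snow_depth",
    "snow_precipitation",
    "soil_moisture",
]


@dataclass
class EventCube:
    """Hypothetical per-event cube; the shapes below are assumptions."""
    s2: np.ndarray      # Sentinel-2 multispectral: (time, bands, height, width)
    s1: np.ndarray      # Sentinel-1 SAR: (time, polarisations, height, width)
    meteo: np.ndarray   # NASA POWER: (days, len(METEO_PARAMS))
    burned_area: float  # regression target, e.g. the area reported by EFFIS

    def __post_init__(self) -> None:
        # Require exactly one meteorological column per listed parameter.
        if self.meteo.ndim != 2 or self.meteo.shape[1] != len(METEO_PARAMS):
            raise ValueError("meteo must have shape (days, len(METEO_PARAMS))")


# Which cube parts each of the three experiments uses (hypothetical keys).
EXPERIMENTS = {
    "meteo": ("meteo",),            # experiment 1: meteorological data only
    "s2_s1": ("s2", "s1"),          # experiment 2: multispectral + SAR
    "all": ("s2", "s1", "meteo"),   # experiment 3: every part of the cube
}


def features(cube: EventCube, experiment: str) -> np.ndarray:
    """Flatten the cube parts used by one experiment into a single 1-D vector."""
    parts = [getattr(cube, name) for name in EXPERIMENTS[experiment]]
    return np.concatenate([part.ravel() for part in parts])


rng = np.random.default_rng(0)
cube = EventCube(
    s2=rng.random((2, 4, 8, 8)),
    s1=rng.random((2, 2, 8, 8)),
    meteo=rng.random((30, len(METEO_PARAMS))),
    burned_area=12.5,
)
for name in EXPERIMENTS:
    print(name, features(cube, name).shape)
```

Flattening is only the simplest way to hand each feature subset to a generic regressor; the abstract does not state which models or input encodings the authors used, and a convolutional or recurrent model would instead consume the spatial and temporal axes directly.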