---
title: "ExploratoryAnalysis_Committee"
author: "Zach"
date: "5/27/2021"
output: html_document
editor_options: 
  chunk_output_type: console
---
This file explores ideas proposed by the committee during the prospectus defense on 5/25/2021.


Import Libraries to be used
```{r}
library(lubridate)
library(tidyverse)
library(ggplot2)
library(patchwork)
library(tidycensus)
library(dplyr)
library(readr)
library(sf)
library(tmap)
library(rgeos)
library(RColorBrewer)

library(rgdal)

library(XML)
library(units)
```


### Checking Value of Q

Over time, the KBDI series are expected to converge regardless of the initial value of Q.


Copy and paste KBDI code

## Import the summary of the day data and add columns to the data frame

Assign no rainfall on days with missing values.

For Tallahassee Station TLH
```{r}
TLH.df <- read_csv(file = 'Data/TLH_Daily1940.csv') %>%
  rename(Date = DATE) %>%
  mutate(Year = year(Date), 
         month = month(Date, label = TRUE, abbr = TRUE),
         doy = yday(Date),
         MaxTemp = TMAX,
         MinTemp = TMIN,
         Rainfall24 = PRCP,
         Rainfall24 = replace_na(Rainfall24, 0),
         Rainfall24mm = Rainfall24 * 25.4)

#Fill in the missing temperature data
TLH.df$MaxTemp[TLH.df$Date == "2005-07-08"] <- 96
```

Calculate Ql and Qlm using an initial Q-value of 269. Add these columns to the TLH.df data frame. This data frame will be referenced and filtered as needed.
```{r}
Rainfall24 <- TLH.df$Rainfall24
PR <- dplyr::lag(Rainfall24)
PR[1] <- 0

CumR <- 0
NetR <- numeric()

for(i in 1:length(Rainfall24)) {
  R24 <- Rainfall24[i]
  if (R24 == 0) {
    NetR[i] <- 0
    CumR <- 0
  } 
  else if(R24 > 0 & R24 <= .2) {
      CumR <- CumR + R24
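      # NOTE: as written, the `else if (CumR > .2)` branch below can never run,
      # because CumR > .2 already satisfies the first condition.
      if (PR[i] > .2 | CumR > .2) NetR[i] <- R24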
      else if (CumR > .2) NetR[i] <- CumR - .2
      else NetR[i] <- 0
    }
  else if (R24 > .2) {
      if (CumR <= .2) {
      NetR[i] <- CumR + R24 - .2
      CumR <- CumR + R24
      }
      else {
      NetR[i] <- R24
      CumR <- CumR + R24
      }
  }
}

TLH.df$NetR <- NetR

Q <- 269  
R <- 59.23 # average annual rainfall for TLH in inches

MaxTemp <- TLH.df$MaxTemp

Ql <- numeric()
DeltaQl <- numeric()
for(i in 1:length(Rainfall24)){
  DeltaQ <- (800 - Q) * (.968 * exp(.0486 * MaxTemp[i]) - 8.3) /(1 + 10.88 * exp(-.0441 * R)) * .001 
  Q <- ifelse(NetR[i] == 0,  Q + DeltaQ,  (Q + DeltaQ) - NetR[i] * 100)
  Q <- ifelse(Q < 0, 0, Q) 
  Ql <- c(Ql, Q)
  DeltaQl <- c(DeltaQl, DeltaQ)
}

TLH.df$Ql <- Ql
TLH.df$Qlm <- Ql * .254  # hundredths of an inch to mm
TLH.df$DeltaQl <- DeltaQl
TLH.df$DroughtIndex <- floor(Ql/100)
```
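The daily update in the loop above could also be wrapped in a function so that the same calculation can be reused for other starting values. This is only a sketch: `kbdi_series()`, `q_init`, and `annual_rain` are hypothetical names that are not part of the original code, and the toy inputs are made up.

```r
# Sketch only: kbdi_series() and its argument names are hypothetical.
# It repeats the same daily update as the loop above.
kbdi_series <- function(max_temp, net_rain, q_init, annual_rain) {
  n <- length(max_temp)
  q_out <- numeric(n)  # preallocate instead of growing the vector with c()
  Q <- q_init
  for (i in seq_len(n)) {
    # drying factor for the day
    delta_q <- (800 - Q) * (.968 * exp(.0486 * max_temp[i]) - 8.3) /
      (1 + 10.88 * exp(-.0441 * annual_rain)) * .001
    # add drying, subtract net rainfall (inches -> hundredths of an inch)
    Q <- Q + delta_q - net_rain[i] * 100
    Q <- max(Q, 0)  # same lower bound as the loop above
    q_out[i] <- Q
  }
  q_out
}

# Toy inputs (made up): three dry days, then one day with 0.5 in of net rain
toy_temp <- c(90, 92, 95, 88)
toy_rain <- c(0, 0, 0, 0.5)
toy_q <- kbdi_series(toy_temp, toy_rain, q_init = 269, annual_rain = 59.23)
toy_q
```

With the data in this file, a call might look like `kbdi_series(TLH.df$MaxTemp, TLH.df$NetR, q_init = 269, annual_rain = 59.23)`.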


A new data frame is created to compare initial Q values. Q is representative of the starting value of KBDI. Here a column of several Q values is created to be used and compared among multiple plots. Daily KBDI is calculated for each initial Q value. This is testing the theory that with enough time elapsed, the initial value of Q does not matter.

rangeQ.df data frame is initialized at start of TLH.df data frame. (March 1940)
```{r}
Rainfall24 <- TLH.df$Rainfall24
PR <- dplyr::lag(Rainfall24)
PR[1] <- 0
CumR <- 0
NetR <- numeric()
for(i in 1:length(Rainfall24)) {
  R24 <- Rainfall24[i]
  if (R24 == 0) {
    NetR[i] <- 0
    CumR <- 0
  } 
  else if(R24 > 0 & R24 <= .2) {
      CumR <- CumR + R24
      if (PR[i] > .2 | CumR > .2) NetR[i] <- R24
      else if (CumR > .2) NetR[i] <- CumR - .2
      else NetR[i] <- 0
    }
  else if (R24 > .2) {
      if (CumR <= .2) {
      NetR[i] <- CumR + R24 - .2
      CumR <- CumR + R24
      }
      else {
      NetR[i] <- R24
      CumR <- CumR + R24
      }
  }
}

TLH.df$NetR <- NetR
R <- 59.23

rangeQ.df <- TLH.df %>%
  select(Date, Rainfall24, MaxTemp, NetR)

for(k in 1:9){
  Q0 <- (k - 1) * 100  # initial Q-value for this run: 0, 100, ..., 800
  Q <- Q0
  MaxTemp <- TLH.df$MaxTemp
  Ql <- numeric()
  for(i in 1:length(Rainfall24)){
    DeltaQ <- (800 - Q) * (.968 * exp(.0486 * MaxTemp[i]) - 8.3) /(1 + 10.88 * exp(-.0441 * R)) * .001 
    Q <- ifelse(NetR[i] == 0,  Q + DeltaQ,  (Q + DeltaQ) - NetR[i] * 100)
    Q <- ifelse(Q < 0, 0, Q) 
    Ql <- c(Ql, Q)
  }
  # name each new column after its initial Q-value (Q0, Q100, ..., Q800)
  rangeQ.df[[paste0("Q", Q0)]] <- Ql
}
```

```{r}
head(rangeQ.df)
```
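One way to quantify the merging, rather than judging it from the plots alone, is to track the daily spread between the highest and lowest run and find the first day on which it falls within a tolerance. This is a sketch under assumptions: `days_to_merge()` and `tol` are hypothetical names, and the toy matrix stands in for the Q0 ... Q800 columns.

```r
# Sketch only: days_to_merge() and the tolerance `tol` are hypothetical.
# `q_mat` holds one column per initial Q-value (e.g. the Q0 ... Q800 columns).
days_to_merge <- function(q_mat, tol = 1) {
  q_mat <- as.matrix(q_mat)
  # daily spread between the highest and lowest run
  spread <- apply(q_mat, 1, max) - apply(q_mat, 1, min)
  # first day on which the spread is within the tolerance (NA if never)
  hit <- which(spread <= tol)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# Toy data (made up): two runs that close half of their gap each day
toy_runs <- cbind(low  = 400 - 400 * 0.5^(1:12),
                  high = 400 + 400 * 0.5^(1:12))
days_to_merge(toy_runs, tol = 1)
```

With the data frame above, this might be called as `days_to_merge(rangeQ.df[, c("Q0", "Q400", "Q800")])`.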


Plot Q values for first year of data period (1940)
```{r}
p1_1940 <- rangeQ.df %>% 
  filter(year(Date) == 1940) %>%
  ggplot(mapping = aes(x = Date)) +
    geom_line(aes(y = Q0, color = "0")) +
    geom_line(aes(y = Q400, color = "400")) + 
    geom_line(aes(y = Q800, color = "800")) +
    scale_color_manual("Initial Q-values",
                       breaks = c("0", "400", "800"),
                       values = c("red", "blue", "black")) +
  ylab("KBDI") +
  scale_x_date(date_labels = "%b %Y")
  #theme(legend.position = "bottom")
  #theme(legend.position = c(0.8,0.3)

p2_1940 <- rangeQ.df %>% 
  filter(year(Date) == 1940) %>%
  ggplot(mapping = aes(x = Date)) +
    geom_line(mapping = aes(y = Rainfall24)) +
  ylab("24-Hr Rainfall (in)") +
  scale_x_date(date_labels = "%b %Y")

p2_1940 / p1_1940 +
  plot_annotation(
    title = "KBDI values merge with time regardless of initial Q-values",
    subtitle = "Q-values initialized at 0, 400, and 800",
    caption = "Period of Record: March 1940 - December 1940") +
  theme(legend.position = "bottom")
```


Filtering TLH data frame to start in 1991. This is one year prior to the fire data obtained from the Forest Service Research Data Archive
```{r}
TLH1991.df <- TLH.df %>%
  filter(year(Date) >= 1991)
```

Initialize different Q values starting in 1991 and create rangeQ1991.df
```{r}
Rainfall24 <- TLH1991.df$Rainfall24
PR <- dplyr::lag(Rainfall24)
PR[1] <- 0
CumR <- 0
NetR <- numeric()
for(i in 1:length(Rainfall24)) {
  R24 <- Rainfall24[i]
  if (R24 == 0) {
    NetR[i] <- 0
    CumR <- 0
  } 
  else if(R24 > 0 & R24 <= .2) {
      CumR <- CumR + R24
      if (PR[i] > .2 | CumR > .2) NetR[i] <- R24
      else if (CumR > .2) NetR[i] <- CumR - .2
      else NetR[i] <- 0
    }
  else if (R24 > .2) {
      if (CumR <= .2) {
      NetR[i] <- CumR + R24 - .2
      CumR <- CumR + R24
      }
      else {
      NetR[i] <- R24
      CumR <- CumR + R24
      }
  }
}

TLH1991.df$NetR <- NetR
R <- 59.23

rangeQ1991.df <- TLH1991.df %>%
  select(Date, Rainfall24, MaxTemp, NetR)

for(k in 1:9){
  Q0 <- (k - 1) * 100  # initial Q-value for this run: 0, 100, ..., 800
  Q <- Q0
  MaxTemp <- TLH1991.df$MaxTemp
  Ql <- numeric()
  for(i in 1:length(Rainfall24)){
    DeltaQ <- (800 - Q) * (.968 * exp(.0486 * MaxTemp[i]) - 8.3) /(1 + 10.88 * exp(-.0441 * R)) * .001 
    Q <- ifelse(NetR[i] == 0,  Q + DeltaQ,  (Q + DeltaQ) - NetR[i] * 100)
    Q <- ifelse(Q < 0, 0, Q) 
    Ql <- c(Ql, Q)
  }
  # name each new column after its initial Q-value (Q0, Q100, ..., Q800)
  rangeQ1991.df[[paste0("Q", Q0)]] <- Ql
}
```

Plot 1991, year before fire data starts
```{r}
p1_1991 <- rangeQ1991.df %>% 
  filter(year(Date) == 1991) %>%
  ggplot(mapping = aes(x = Date)) +
    geom_line(aes(y = Q0, color = "0")) +
    geom_line(aes(y = Q400, color = "400")) + 
    geom_line(aes(y = Q800, color = "800")) +
    scale_color_manual("Initial Q-values",
                       breaks = c("0", "400", "800"),
                       values = c("red", "blue", "black")) +
  ylab("KBDI") +
  scale_x_date(date_labels = "%b %Y")
  #theme(legend.position = "bottom")
  #theme(legend.position = c(0.8,0.3)

p2_1991 <- rangeQ1991.df %>% 
  filter(year(Date) == 1991) %>%
  ggplot(mapping = aes(x = Date)) +
    geom_line(mapping = aes(y = Rainfall24)) +
  ylab("24-Hr Rainfall (in)") +
  scale_x_date(date_labels = "%b %Y")

p2_1991 / p1_1991 +
  plot_annotation(
    title = "KBDI values merge with time regardless of initial Q-values",
    subtitle = "Q-values initialized at 0, 400, and 800",
    caption = "Period of Record: 1991") +
  theme(legend.position = "bottom")
```

Focus the plot on the first month. These plots get their own names so that the full-year 1991 plots remain available for the four-panel comparison.
```{r}
p1_1991Jan <- rangeQ1991.df %>% 
  filter(year(Date) == 1991) %>%
  filter(month(Date) == 1) %>%
  ggplot(mapping = aes(x = Date)) +
    geom_line(aes(y = Q0, color = "0")) +
    geom_line(aes(y = Q400, color = "400")) + 
    geom_line(aes(y = Q800, color = "800")) +
    scale_color_manual("Initial Q-values",
                       breaks = c("0", "400", "800"),
                       values = c("red", "blue", "black")) +
  ylab("KBDI")
  #theme(legend.position = "bottom")
  #theme(legend.position = c(0.8,0.3)

p2_1991Jan <- rangeQ1991.df %>% 
  filter(year(Date) == 1991) %>%
  filter(month(Date) == 1) %>%
  ggplot(mapping = aes(x = Date)) +
    geom_line(mapping = aes(y = Rainfall24)) +
  ylab("24-Hr Rainfall (in)")

p2_1991Jan / p1_1991Jan +
  plot_annotation(
    title = "KBDI values merge with time regardless of initial Q-values",
    subtitle = "Q-values initialized at 0, 400, and 800",
    caption = "Period of Record: January 1991") +
  theme(legend.position = "bottom")
```

Display Q-values for 1941, the first full year of data. Compare this to 1991, one year before the period of fire data.
Filter the TLH data frame to start in 1941.
```{r}
TLH1941.df <- TLH.df %>%
  filter(year(Date) >= 1941)
```

Initialize different Q values starting in 1941 and create rangeQ1941.df
```{r}
Rainfall24 <- TLH1941.df$Rainfall24
PR <- dplyr::lag(Rainfall24)
PR[1] <- 0
CumR <- 0
NetR <- numeric()
for(i in 1:length(Rainfall24)) {
  R24 <- Rainfall24[i]
  if (R24 == 0) {
    NetR[i] <- 0
    CumR <- 0
  } 
  else if(R24 > 0 & R24 <= .2) {
      CumR <- CumR + R24
      if (PR[i] > .2 | CumR > .2) NetR[i] <- R24
      else if (CumR > .2) NetR[i] <- CumR - .2
      else NetR[i] <- 0
    }
  else if (R24 > .2) {
      if (CumR <= .2) {
      NetR[i] <- CumR + R24 - .2
      CumR <- CumR + R24
      }
      else {
      NetR[i] <- R24
      CumR <- CumR + R24
      }
  }
}

TLH1941.df$NetR <- NetR
R <- 59.23

rangeQ1941.df <- TLH1941.df %>%
  select(Date, Rainfall24, MaxTemp, NetR)

for(k in 1:9){
  Q0 <- (k - 1) * 100  # initial Q-value for this run: 0, 100, ..., 800
  Q <- Q0
  MaxTemp <- TLH1941.df$MaxTemp
  Ql <- numeric()
  for(i in 1:length(Rainfall24)){
    DeltaQ <- (800 - Q) * (.968 * exp(.0486 * MaxTemp[i]) - 8.3) /(1 + 10.88 * exp(-.0441 * R)) * .001 
    Q <- ifelse(NetR[i] == 0,  Q + DeltaQ,  (Q + DeltaQ) - NetR[i] * 100)
    Q <- ifelse(Q < 0, 0, Q) 
    Ql <- c(Ql, Q)
  }
  # name each new column after its initial Q-value (Q0, Q100, ..., Q800)
  rangeQ1941.df[[paste0("Q", Q0)]] <- Ql
}
```

Create plots of 1941 to be compared with 1991
```{r}
p1_1941 <- rangeQ1941.df %>% 
  filter(year(Date) == 1941) %>%
  ggplot(mapping = aes(x = Date)) +
    geom_line(aes(y = Q0, color = "0")) +
    geom_line(aes(y = Q400, color = "400")) + 
    geom_line(aes(y = Q800, color = "800")) +
    scale_color_manual("Initial Q-values",
                       breaks = c("0", "400", "800"),
                       values = c("red", "blue", "black")) +
  ylab("KBDI") +
  scale_x_date(date_labels = "%b %Y")
  #theme(legend.position = "bottom")
  #theme(legend.position = c(0.8,0.3)

p2_1941 <- rangeQ1941.df %>% 
  filter(year(Date) == 1941) %>%
  ggplot(mapping = aes(x = Date)) +
    geom_line(mapping = aes(y = Rainfall24)) +
  ylab("24-Hr Rainfall (in)") +
  scale_x_date(date_labels = "%b %Y")
```

Create a 4-panel plot showing 1941 (the first full year of KBDI data) and 1991 (one year prior to the start of the fire data) side by side.
```{r}
((p2_1941 + ylim(0,5)) + (p2_1991 + ylim(0, 5))) / 
  ((p1_1941 + theme(legend.position = "none")) + p1_1991) +
  plot_annotation(
    title = "KBDI values merge with time regardless of initial Q-values",
    subtitle = "Q-values initialized at 0, 400, and 800",
    caption = "Period of Record: 1941 (Left) & 1991 (Right)")
```


## Explore lightning data and concerns mentioned by the committee

Get lightning data

Daily county-level counts 1986-2013. Data location: https://www1.ncdc.noaa.gov/pub/data/swdi/reports/county/byFips/
Note: these data are not spatial and cannot be bounded by the ANF. For exploratory analysis purposes, the data will be explored across the counties that contain the ANF: Liberty, Wakulla, Franklin, and Leon.

First get the Florida fips codes for Liberty, Wakulla, Franklin and Leon counties, then get the data.
```{r}
FLfips <- fips_codes %>%
  filter(state == "FL") %>%
  filter(county %in% c("Liberty County", "Wakulla County", "Franklin County", "Leon County")) %>%
  pull(county_code)

fn <- paste0("https://www1.ncdc.noaa.gov/pub/data/swdi/reports/county/byFips/swdireport-12", FLfips, "-BETA.csv")

lightningdata.df <- data.frame()
for(i in 1:length(fn)){
  X <- read.csv(fn[i], na.strings = "NULL", header = TRUE, stringsAsFactors = FALSE)
  lightningdata.df <- rbind(lightningdata.df, X)
}

lightningdata.df <- lightningdata.df %>%
  mutate(SEQDAY = as.Date(SEQDAY),
         DAY = day(SEQDAY),
         MONTH = month(SEQDAY),
         YEAR = year(SEQDAY))
```

Read the lightning data from the downloaded csv files instead, because the website import is not reliable and does not always open.
```{r}
lightningdata1.df <- list.files(path = "C:/Users/zlaw9/OneDrive/GITHUB/KDBI code/FIPS_LightningData",
                  pattern = "\\.csv$", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows

lightningdata1.df <- lightningdata1.df %>%
  mutate(SEQDAY = as.Date(SEQDAY),
         DAY = day(SEQDAY),
         MONTH = month(SEQDAY),
         YEAR = year(SEQDAY)) %>%
#Data are missing beyond 5/20/2013 for FIPS 12037. Dropping later dates removes the last 5 rows, which addresses the parsing failure and removes the NA data.
  filter(SEQDAY <= as.Date("2013-05-20"))
```
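Because the website import does not always open, one option is to try the remote file first and fall back to the local copy when the read fails. This is only a sketch: `read_with_fallback()` and its arguments are hypothetical, and the toy check uses a temporary file rather than the real lightning files.

```r
# Sketch only: read_with_fallback() and its arguments are hypothetical.
read_with_fallback <- function(url, local_file) {
  tryCatch(
    # suppressWarnings() hides the extra warning raised when a connection cannot be opened
    suppressWarnings(
      read.csv(url, na.strings = "NULL", header = TRUE, stringsAsFactors = FALSE)
    ),
    error = function(e) {
      message("Remote read failed, using local copy: ", local_file)
      read.csv(local_file, na.strings = "NULL", header = TRUE, stringsAsFactors = FALSE)
    }
  )
}

# Toy check (made up): the first path does not exist, so the local copy is read
toy_file <- tempfile(fileext = ".csv")
write.csv(data.frame(SEQDAY = "2013-05-20", FCOUNT_NLDN = 12), toy_file, row.names = FALSE)
toy_lightning <- read_with_fallback(file.path(tempdir(), "no_such_file.csv"), toy_file)
toy_lightning
```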


lightningdata1.df gives lightning data for each individual county based on FIPS code. Combine total lightning strikes in ANF counties by date.

```{r}
countlightning.df <- lightningdata1.df %>%
  group_by(SEQDAY) %>%
  summarise(LightningCount = sum(FCOUNT_NLDN))
```

Explore the relationship between high lightning count days and the number of lightning-sparked wildfires in the ANF.

County boundaries will be used instead of forest boundaries to keep the bounds of the fires and lightning consistent. Run the county bounds code chunk.
Import the forest boundaries to explore lightning fire data bounded by the ANF.
```{r}
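#download only if the shapefile is not already in the working directory
if(!file.exists("S_USA.NFSLandUnit.shp")){
  download.file("https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.NFSLandUnit.zip",
                "S_USA.NFSLandUnit.zip")
  unzip("S_USA.NFSLandUnit.zip")
}

#file name casing matches the name used in the check above
NF_Bounds.sf <- st_read(dsn = "S_USA.NFSLandUnit.shp") %>%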
  st_transform(crs = 3086)

anfbounds.sf <- NF_Bounds.sf %>%
  filter(NFSLANDU_2 == "Apalachicola National Forest")

#when running code from ANF_Fires Run this line
#ANF_Boundary.sf <- NF_Bounds.sf %>%
  #filter(NFSLANDU_2 == "Apalachicola National Forest")
```

Import County Boundaries of the forest. This will make the bounds consistent across both the fire data set and the lightning data set.
```{r}
ANFcountyBounds.sf <- st_read(dsn = "ANFCounties.shp") %>%
  st_transform(crs = 3086)
```


Import fire data

This is the old data set containing fires only until 2015
```{r}
if(!"FL_Fires" %in% list.files()){
download.file("http://myweb.fsu.edu/jelsner/temp/data/FL_Fires.zip",
"FL_Fires.zip")
unzip("FL_Fires.zip")
}
FL_Fires.sf <- st_read(dsn = "FL_Fires") %>%
st_transform(crs = 3086)

#filtered fires within ANF bounds
#anf_fires.sf <- st_join(FL_Fires.sf, anfbounds.sf, join = st_within) %>%
  #filter(NFSLANDU_2 == 'Apalachicola National Forest')

#filtered for county bounds of ANF. This is to match the bounds of the lightning dataset
anf_fires.sf <- st_join(FL_Fires.sf, ANFcountyBounds.sf, join = st_within) %>%
  filter(CNTY_FIPS != "NA") %>%
  select(FOD_I, FIRE_N, FIRE_Y, DISCOVERY_, STAT_CAU_1, FIRE_SIZE, FIRE_SIZE_, LATIT, LONGI, NAME, FIPS, geometry)

anf_LF.sf <- anf_fires.sf %>%
  filter(STAT_CAU_1 == "Lightning")
```

Archived fire data has been updated through 2018.
Website link with data and metadata
https://www.fs.usda.gov/rds/archive/Catalog/RDS-2013-0009.5 

Attempt to read file in directly from website. Receiving warning messages:

1: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Message 1: This version of GeoPackage user_version=0x0000283C (10300, v1.3.0) on 'C:\Users\zlaw9\OneDrive\GITHUB\KDBI code\Data\updatedFireData\Data\FPA_FOD_20210617.gpkg' may only be partially supported
2: attribute variables are assumed to be spatially constant throughout all geometries 

Taking a long time to load.
```{r}
#if(!"Fires2018" %in% list.files()){
  #download.file("https://www.fs.usda.gov/rds/archive/products/RDS-2013-0009.5/RDS-2013-0009.5_GPKG.zip",
                #"Data/updatedFireData/Fires2018.zip")
  #unzip("Data/updatedFireData/Fires2018.zip",
        #exdir = "Data/updatedFireData")
#}

updatedFires.sf <- st_read(dsn = "Data/updatedFireData/Data/FPA_FOD_20210617.gpkg", layer = "Fires") %>%
  filter(STATE == "FL") %>%
  st_transform(crs = st_crs(ANFcountyBounds.sf)) %>%
  st_intersection(ANFcountyBounds.sf) %>%
  select(FOD_ID, FIRE_NAME, FIRE_YEAR, DISCOVERY_DATE, NWCG_CAUSE_CLASSIFICATION, NWCG_GENERAL_CAUSE, FIRE_SIZE, FIRE_SIZE_CLASS, LATITUDE, LONGITUDE, NAME, FIPS_CODE, Shape)

anf_LF.sf <- updatedFires.sf %>%
  filter(NWCG_GENERAL_CAUSE == "Natural")
```

Reformat date column to not include time. Rename to match other merged columns
```{r}
anf_LF.sf$DISCOVERY_DATE <- as.Date(anf_LF.sf$DISCOVERY_DATE)

anf_LF.sf <- anf_LF.sf %>%
  rename(DISCOVERY_ = DISCOVERY_DATE)
```

The fire data set was downloaded and read from file rather than directly from the website, because importing from the GPKG runs slowly. It was converted to more user-friendly shapefiles in ESRI.

The new data set has changed the categories for cause. Lightning is not listed. Is lightning the equivalent of fires filtered as natural? Explore this at the bottom of the Rmd.

```{r}
updatedFires.sf <- st_read(dsn = "C:/Users/zlaw9/OneDrive/GITHUB/KDBI code/updatedFires",
                           layer = "FLFiresUpdated") %>%
  st_transform(crs = st_crs(ANFcountyBounds.sf)) %>%
  st_intersection(ANFcountyBounds.sf) %>%
  select(FOD_ID, FIRE_NAME, FIRE_YEAR, DISCOVERY_, NWCG_CAUSE, NWCG_GENER, FIRE_SIZE, FIRE_SIZE_, LATITUDE, LONGITUDE, NAME, FIPS, geometry)
  
anf_LF.sf <- updatedFires.sf %>%
  filter(NWCG_CAUSE == "Natural")
```


View fire points that have been bounded by four counties (Liberty, Leon, Wakulla, and Franklin)
```{r}
tmap_mode("view")

tm_shape(ANFcountyBounds.sf) +
  tm_borders()

tm_shape(anf_LF.sf) +
  tm_dots(col = "orange")
```

Create a data set containing fires and lightning counts. Match the years across the data sets (1992 - 2013).
merge1.df is a new data frame that contains the number of lightning strikes and fires by day from 1992 - 2013 in the counties containing the Apalachicola National Forest.
```{r}
#exploreLF <- anf_LF.sf %>%
  #filter(FIRE_Y <= 2013)

countfires.df <- anf_LF.sf %>%
#countfires.df <- exploreLF %>%
  st_drop_geometry() %>% #only counts are needed, so drop the geometry before counting
  count(DISCOVERY_) %>%
  rename(FireCount = n, Date = DISCOVERY_)

countlightning.df <- countlightning.df %>%
  filter(year(SEQDAY) >= 1992)
  #rename(LightningCount = FCOUNT_NLDN)

merge1.df <- left_join(x = countlightning.df, y = countfires.df, by = c("SEQDAY" = "Date")) %>%
  rename(Date = SEQDAY) %>%
  select(Date, LightningCount, FireCount)
```

Create new data frame (TLH1.df) to match the dates with merge1.df. These data frames will be combined.
```{r}
TLH1.df <- TLH.df %>%
  filter(year(Date) >= 1992) %>%
  filter(Date <= "2013-05-25")
```


Merge data frames to have lightning strike count, fire count, and Ql (KBDI) all in one data frame
```{r}
merge2.df <- left_join(x = merge1.df, y = TLH1.df, by = c("Date")) %>%
  select(Date, LightningCount, FireCount, Ql)

merge2.df$FireBool <- !is.na(merge2.df$FireCount)
```
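The join above leaves `FireCount` as NA on days without a recorded fire, and `FireBool` turns that into TRUE/FALSE. A minimal base R sketch of the same idea, with made-up toy data and hypothetical object names:

```r
# Sketch only: toy data and object names are made up for illustration.
toy_days <- data.frame(Date = as.Date("2000-06-01") + 0:4,
                       LightningCount = c(0, 14, 230, 5, 0))
toy_fire_counts <- data.frame(Date = as.Date("2000-06-03"), FireCount = 2)

# all.x = TRUE keeps every day; days with no fire get NA for FireCount
toy_merged <- merge(toy_days, toy_fire_counts, by = "Date", all.x = TRUE)

# TRUE only on days with at least one recorded fire
toy_merged$FireBool <- !is.na(toy_merged$FireCount)
toy_merged
```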

Logistic regression (generalized linear model) - Study how lightning influences the probability of fire occurrence.
Predict the occurrence of a fire for each day in the data set based on lightning count.
Note: This plot includes days with zero lightning strikes and predicts about a 3% chance of a lightning-ignited wildfire on days with zero lightning strikes. A lightning fire cannot occur without lightning, so this probability should be zero percent.
```{r}
#logistic regression (generalized linear model)
glmLightning <- glm(formula = FireBool ~ LightningCount, family = binomial, data = merge2.df)

#probabilities based on glm
pred_glmLightning <- predict(object = glmLightning,
                             type = "response",
                             se.fit = TRUE)
#creating confidence interval
lowerLightning <- pred_glmLightning$fit - (1.96*pred_glmLightning$se.fit)
upperLightning <- pred_glmLightning$fit + (1.96*pred_glmLightning$se.fit)

ggplot(mapping = aes(x = merge2.df$LightningCount, y = pred_glmLightning$fit)) +
  geom_ribbon(aes(ymin = lowerLightning, ymax = upperLightning), fill = "grey") +
  geom_line(color = "blue") +
  ylab("Predicted Probability") +
  xlab("Number of Lightning Strikes") +
  labs(
    title = "Predicted Probability of a Fire Occurrence Based on the Number of Lightning Strikes in a Day",
    subtitle = "With a 95% Confidence Interval",
    caption = "Period of Record: 1-1-1992 to 5-20-2013")
```
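The interval above is built as fit ± 1.96 × se on the probability scale, which can fall below 0 or above 1. A common alternative is to build the interval on the link (logit) scale and back-transform with `plogis()`. The sketch below uses simulated data in place of merge2.df, and all object names are hypothetical.

```r
# Sketch only: simulated data stand in for merge2.df; all names here are hypothetical.
set.seed(1)
toy <- data.frame(strikes = rpois(500, lambda = 20))
toy$fire <- rbinom(500, size = 1, prob = plogis(-4 + 0.05 * toy$strikes)) == 1

toy_fit <- glm(fire ~ strikes, family = binomial, data = toy)

# predict on the link (logit) scale, where the normal approximation is applied
toy_link <- predict(toy_fit, type = "link", se.fit = TRUE)

# back-transform the fit and the interval limits to probabilities
toy_pred <- data.frame(
  strikes = toy$strikes,
  fit   = plogis(toy_link$fit),
  lower = plogis(toy_link$fit - 1.96 * toy_link$se.fit),
  upper = plogis(toy_link$fit + 1.96 * toy_link$se.fit)
)
range(toy_pred$lower)
range(toy_pred$upper)
```

Because `plogis()` maps any real number into (0, 1), the back-transformed limits always stay within valid probabilities.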


Filter data for only dates in the fire season before running glm.

When filtering to only the fire season, there is more uncertainty, and the model is closer to linear.
```{r}
fireSeasondates2013.df <- merge2.df %>%
  filter(month(Date) %in% c(5, 6, 7))

#logistic regression (generalized linear model)
glmLightningFS <- glm(formula = FireBool ~ LightningCount, family = binomial, data = fireSeasondates2013.df)

#probabilities based on glm
pred_glmLightningFS <- predict(object = glmLightningFS,
                             type = "response",
                             se.fit = TRUE)
#creating confidence interval
lowerLightningFS <- pred_glmLightningFS$fit - (1.96*pred_glmLightningFS$se.fit)
upperLightningFS <- pred_glmLightningFS$fit + (1.96*pred_glmLightningFS$se.fit)

ggplot(mapping = aes(x = fireSeasondates2013.df$LightningCount, y = pred_glmLightningFS$fit)) +
  geom_ribbon(aes(ymin = lowerLightningFS, ymax = upperLightningFS), fill = "grey") +
  geom_line(color = "blue") +
  ylab("Predicted Probability") +
  xlab("Number of Lightning Strikes") +
  labs(
    title = "Predicted Probability of Fire Occurrence Based on the Number of Lightning Strikes in a Day\nFiltered By Fire Season Months",
    subtitle = "With 95% Confidence Interval",
    caption = "Fire Season Months (May - July) 1992 - 2013")
```


Logistic regression (generalized linear model) - Study how KBDI impacts lightning wildfire occurrence.

Update to include fires beyond 2012. The analysis is no longer bounded by the lightning data dates and can use the full fire data set.
The Ql data frame needs to match the new fire data frame dates. Filter TLH.df, which was initialized in 1940. This data frame can be referenced because the Q-values merge, so the values will be the same as when initializing one year prior to the start. This avoids rerunning the entire Ql code chunk.

```{r}
qlm2018fires <- TLH.df %>% 
  filter(year(Date) >= 1992) %>%
  filter(year(Date) <= 2018)
```

```{r}
countfiresmerge2018.df <- left_join(x = qlm2018fires, y = countfires.df, by = c("Date" = "Date")) %>%
  select(Date, FireCount, Ql)

countfiresmerge2018.df$FireBool <- !is.na(countfiresmerge2018.df$FireCount)
```


```{r}
glmQl <- glm(formula = FireBool ~ Ql, family = binomial, data = countfiresmerge2018.df)

pred_glmQl <- predict(object = glmQl,
                      type = "response",
                      se.fit = TRUE)

lowerQl <- pred_glmQl$fit - (1.96*pred_glmQl$se.fit)
upperQl <- pred_glmQl$fit + (1.96*pred_glmQl$se.fit)

ggplot(mapping = aes(x = countfiresmerge2018.df$Ql, y = pred_glmQl$fit)) +
  geom_ribbon(aes(ymin = lowerQl, ymax = upperQl), fill = "grey") +
  geom_line(color = "blue") +
  ylab("Predicted Probability") +
  xlab("Daily Ql") +
  labs(
    title = "Predicted Probability of Fire Occurrence Based on Daily KBDI Values",
    subtitle = "With 95% Confidence Interval",
    caption = "Period of Record: 1992 to 2018")
```

KBDI impacts on lightning wildfire occurrence during the fire season. KBDI has a much stronger impact during the fire season months.
```{r}
fireSeasondates2018.df <- countfiresmerge2018.df %>%
  filter(month(Date) %in% c(5, 6, 7))

glmQlFS <- glm(formula = FireBool ~ Ql, family = binomial, data = fireSeasondates2018.df)

pred_glmQlFS <- predict(object = glmQlFS,
                      type = "response",
                      se.fit = TRUE)

lowerQlFS <- pred_glmQlFS$fit - (1.96*pred_glmQlFS$se.fit)
upperQlFS <- pred_glmQlFS$fit + (1.96*pred_glmQlFS$se.fit)

ggplot(mapping = aes(x = fireSeasondates2018.df$Ql, y = pred_glmQlFS$fit)) +
  geom_ribbon(aes(ymin = lowerQlFS, ymax = upperQlFS), fill = "grey") +
  geom_line(color = "blue") +
  ylab("Predicted Probability") +
  xlab("Daily Ql") +
  labs(
    title = "Predicted Probability of Fire Occurrence Based on Daily KBDI Values\nFiltered By Fire Season Months",
    subtitle = "With 95% Confidence Interval",
    caption = "Fire Season Months (May - July) 1992 - 2018")
```


Run a logistic regression with both predictors. The number of lightning strikes and KBDI (soil moisture deficit) are both statistically significant.
```{r}
GLM <- glm(formula = FireBool ~ LightningCount + Ql, family = binomial, data = merge2.df)

pred_GLM <- predict(object = GLM,
                    type = "response",
                    se.fit = TRUE)

ggplot(mapping = aes(x = merge2.df$Ql, y = merge2.df$LightningCount, color = pred_GLM$fit)) +
  geom_point() +
  scale_color_gradient(low = "orange", high = "red4") +
  xlab("KBDI") +
  ylab("Number of Lightning Strikes") +
  labs(color = "Probability of Fire") +
  labs(title = "Predicted Probability of Fire Occurrence\nBased on Daily KBDI Values and Daily Lightning Strikes",
       caption = "Period of Record: 1-1-1992 to 5-20-2013")
  

#GLM
#summary(GLM)
#exp(GLM$coefficients)
```
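The commented lines above point to inspecting the model summary and the exponentiated coefficients. A sketch of what that might look like, using simulated data in place of merge2.df (all object names and simulated effect sizes are made up):

```r
# Sketch only: simulated data stand in for merge2.df; all names are hypothetical.
set.seed(2)
toy <- data.frame(strikes = rpois(1000, lambda = 20),
                  kbdi = runif(1000, min = 0, max = 800))
toy$fire <- rbinom(1000, size = 1,
                   prob = plogis(-5 + 0.04 * toy$strikes + 0.003 * toy$kbdi)) == 1

toy_glm <- glm(fire ~ strikes + kbdi, family = binomial, data = toy)

# coefficient table: estimate, standard error, z value, p-value
toy_coefs <- coef(summary(toy_glm))
toy_coefs

# odds ratios with Wald 95% intervals (multiplicative change in odds per unit)
toy_or <- exp(cbind(OR = coef(toy_glm), confint.default(toy_glm)))
toy_or
```

An odds ratio above 1 means the odds of a fire day increase with each additional unit of that predictor, holding the other predictor fixed.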

Filter for fire season
```{r}
GLMFS <- glm(formula = FireBool ~ LightningCount + Ql, family = binomial, data = fireSeasondates2013.df)

pred_GLMFS <- predict(object = GLMFS,
                    type = "response",
                    se.fit = TRUE)

ggplot(mapping = aes(x = fireSeasondates2013.df$Ql, y = fireSeasondates2013.df$LightningCount, color = pred_GLMFS$fit)) +
  geom_point() +
  scale_color_gradient(low = "orange", high = "red4") +
  xlab("KBDI") +
  ylab("Number of Lightning Strikes") +
  labs(color = "Probability of Fire") +
  labs(title = "Predicted Probability of Fire Occurrence\nBased on Daily KBDI Values and Daily Lightning Strikes",
       caption = "Fire Season Months (May - July) 1992 - 2013")
```

### Does the occurrence of a large fire create a time lag in the occurrence of another fire?

Make a correction to the anf_LF.sf data frame. This data frame is bounded by counties because counties were used to bound the lightning data. Correct this to use the ANF boundaries rather than the county boundaries.
```{r}
anf_LF_corrected.sf <- anf_LF.sf %>%
  st_transform(crs = st_crs(anfbounds.sf)) %>%
  st_intersection(anfbounds.sf)
```

```{r}
tmap_mode("view")

tm_shape(anfbounds.sf) +
  tm_borders()

tm_shape(anf_LF_corrected.sf) +
  tm_dots(col = "orange")
```

Explore relationship between the number of fires and fire size.
```{r}
seasonFires <- anf_LF_corrected.sf %>%
  filter(month(DISCOVERY_) == 05 | month(DISCOVERY_) == 06 | month(DISCOVERY_) == 07) %>%
  count(FIRE_YEAR) %>%
  rename(Year = FIRE_YEAR, nFIRES = n) %>%
  st_set_geometry(NULL)

#use the ANF-bounded fires so both summaries cover the same area
LargestFires <- anf_LF_corrected.sf %>%
  filter(month(DISCOVERY_) == 05 | month(DISCOVERY_) == 06 | month(DISCOVERY_) == 07) %>%
  group_by(FIRE_YEAR) %>%
  summarise(LargestFire = max(FIRE_SIZE)) %>%
  rename(Year = FIRE_YEAR) %>%
  st_set_geometry(NULL)

seasonSummary <- merge(x = seasonFires, y = LargestFires, by = c("Year"))

head(seasonSummary)

#cor(seasonSummary$nFIRES, seasonSummary$LargestFire)
```
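One way to look at the lag question directly is to compute the number of days from each fire to the next one and compare the gaps that follow large and small fires. This is only a sketch: the toy data and the 100-acre threshold are made up, and the object names are hypothetical.

```r
# Sketch only: toy data and the 100-acre threshold are made up for illustration.
toy_fires <- data.frame(
  date = as.Date(c("2000-05-01", "2000-05-03", "2000-05-20", "2000-06-01", "2000-06-02")),
  size = c(2, 500, 1, 150, 3)
)
toy_fires <- toy_fires[order(toy_fires$date), ]

# days from each fire to the next one (NA for the last fire)
toy_fires$days_to_next <- c(as.numeric(diff(toy_fires$date)), NA)

# flag large fires and compare the mean gap that follows each group
toy_fires$large <- toy_fires$size >= 100
gap_by_size <- tapply(toy_fires$days_to_next, toy_fires$large, mean, na.rm = TRUE)
gap_by_size
```

With the fire data in this file, the `date` and `size` columns would correspond to `DISCOVERY_` and `FIRE_SIZE`.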

### Spatial Distribution of Natural Wildfires
The CRS was suggested by `suggest_crs()`. Here the data are projected to EPSG 2779, NAD83(HARN) / Florida North.
```{r}
library(spatstat)
library(maptools)
#remotes::install_github("walkerke/crsuggest")
library(crsuggest)

suggest_crs(anfbounds.sf)

W <- anfbounds.sf %>% 
  st_transform(crs = 2779) %>%
  as_Spatial() %>% 
  as.owin()

#FireSeason Object
SeasonFires.ppp <- anf_LF.sf %>% 
  filter(month(DISCOVERY_) %in% c(5, 6, 7)) %>%
  st_geometry() %>%
  st_transform(crs = 2779) %>%
  as_Spatial() %>% 
  as.ppp()

SeasonFires.ppp <- SeasonFires.ppp[W] %>%
  rescale(s = 1000, 
          unitname = "km")
summary(SeasonFires.ppp)

#April Fire Object
AprilFires.ppp <- anf_LF.sf %>% 
  filter(month(DISCOVERY_) %in% c(4)) %>%
  st_geometry() %>%
  st_transform(crs = 2779) %>%
  as_Spatial() %>% 
  as.ppp()

AprilFires.ppp <- AprilFires.ppp[W] %>%
  rescale(s = 1000, 
          unitname = "km")
summary(AprilFires.ppp)

#May Fires Object
MayFires.ppp <- anf_LF.sf %>% 
  filter(month(DISCOVERY_) %in% c(5)) %>%
  st_geometry() %>%
  st_transform(crs = 2779) %>%
  as_Spatial() %>% 
  as.ppp()

MayFires.ppp <- MayFires.ppp[W] %>%
  rescale(s = 1000, 
          unitname = "km")
summary(MayFires.ppp)

#June Fires Object
JuneFires.ppp <- anf_LF.sf %>% 
  filter(month(DISCOVERY_) %in% c(6)) %>%
  st_geometry() %>%
  st_transform(crs = 2779) %>%
  as_Spatial() %>% 
  as.ppp()

JuneFires.ppp <- JuneFires.ppp[W] %>%
  rescale(s = 1000, 
          unitname = "km")
summary(JuneFires.ppp)

#July Fires object
JulyFires.ppp <- anf_LF.sf %>% 
  filter(month(DISCOVERY_) %in% c(7)) %>%
  st_geometry() %>%
  st_transform(crs = 2779) %>%
  as_Spatial() %>% 
  as.ppp()

JulyFires.ppp <- JulyFires.ppp[W] %>%
  rescale(s = 1000, 
          unitname = "km")
summary(JulyFires.ppp)
```

Fire Season Distribution
```{r}
par(mfrow=c(1,2))

SeasonFires.ppp %>%
  plot(main = "Season Fires")
SeasonFires.ppp %>%
  density() %>%
  plot(main = "Spatial Distribution - Season Fires")
```

Distribution across April, May, June, July
```{r}
par(mfrow=c(2,2))
AprilFires.ppp %>%
  density() %>%
  plot(main = "April")
MayFires.ppp %>%
  density() %>%
  plot(main = "May")
JuneFires.ppp %>%
  density() %>%
  plot(main = "June")
JulyFires.ppp %>%
  density() %>%
  plot(main = "July")
```

April Fire Distribution - Only 4 fires across all April months in the data set?
```{r}
par(mfrow=c(2,2))
AprilFires.ppp %>%
  plot(main = "April")
MayFires.ppp %>%
  plot(main = "May")
JuneFires.ppp %>%
  plot(main = "June")
JulyFires.ppp %>%
  plot(main = "July")
```

Explore how the distribution of wildfires may vary based on season
```{r}
FallFires.ppp <- anf_LF.sf %>% 
  filter(month(DISCOVERY_) %in% c(8, 9, 10)) %>%
  st_geometry() %>%
  st_transform(crs = 2779) %>%
  as_Spatial() %>% 
  as.ppp()

FallFires.ppp <- FallFires.ppp[W] %>%
  rescale(s = 1000, 
          unitname = "km")
summary(FallFires.ppp)

WinterFires.ppp <- anf_LF.sf %>% 
  filter(month(DISCOVERY_) %in% c(11, 12, 1)) %>%
  st_geometry() %>%
  st_transform(crs = 2779) %>%
  as_Spatial() %>% 
  as.ppp()

WinterFires.ppp <- WinterFires.ppp[W] %>%
  rescale(s = 1000, 
          unitname = "km")
summary(WinterFires.ppp)

SpringFires.ppp <- anf_LF.sf %>% 
  filter(month(DISCOVERY_) %in% c(2, 3, 4)) %>%
  st_geometry() %>%
  st_transform(crs = 2779) %>%
  as_Spatial() %>% 
  as.ppp()

SpringFires.ppp <- SpringFires.ppp[W] %>%
  rescale(s = 1000, 
          unitname = "km")
summary(SpringFires.ppp)
```
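The seasonal objects above repeat the same steps with different month sets. The month grouping could be kept in a single lookup, which makes it easier to check that every month belongs to exactly one group. This is a sketch: `season_of()` and the labels are hypothetical, while the month groups follow the ones used in this file.

```r
# Sketch only: season_of() and the labels are hypothetical; month groups follow this file.
season_of <- function(dates) {
  m <- as.integer(format(dates, "%m"))
  # position 1 = January, ..., position 12 = December
  lookup <- c("Winter", "Spring", "Spring", "Spring",
              "FireSeason", "FireSeason", "FireSeason",
              "Fall", "Fall", "Fall", "Winter", "Winter")
  lookup[m]
}

# Toy dates (made up)
toy_dates <- as.Date(c("1998-01-15", "1998-04-02", "1998-06-21", "1998-09-30", "1998-12-01"))
toy_seasons <- season_of(toy_dates)
table(toy_seasons)
```

A column built this way could then be used to split the fires by season before converting each group to a point pattern.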

```{r}
par(mfrow=c(2,2))
WinterFires.ppp %>%
  density() %>%
  plot(main = "November, December, January")
SpringFires.ppp %>%
  density() %>%
  plot(main = "February, March, April")
SeasonFires.ppp %>%
  density() %>%
  plot(main = "May, June, July - Fire Season")
FallFires.ppp %>%
  density() %>%
  plot(main = "August, September, October")
```

November, December, January - Only one lightning fire occurred during this group of months.
```{r}
par(mfrow=c(1,2))

WinterFires.ppp %>%
  plot(main = "November, December, January")
WinterFires.ppp %>%
  density() %>%
  plot(main = "Spatial Distribution - November, December, January")
```

Use this code to turn off combined plots
```{r}
dev.off() 
```

### Data Exploration: Precip, Temp & KBDI

create data frame with both fire and KBDI data
```{r}
KBDI_Fires.sf <- left_join(x = anf_LF.sf, y = TLH.df, by = c("DISCOVERY_" = "Date")) %>%
  dplyr::select(FOD_ID, FIRE_NAME, FIRE_YEAR, DISCOVERY_, FIRE_SIZE, FIRE_SIZE_CLASS, Ql, DroughtIndex)
```

Explore the percentage of wildfires that occur in each KBDI category 
```{r}
KBDI_per <- KBDI_Fires.sf %>%
  group_by(DroughtIndex) %>%
  summarise(count = n(),
            percent = count/nrow(KBDI_Fires.sf))

plot1 <- ggplot(data = KBDI_per,
       mapping = aes(y = percent, 
                     x = DroughtIndex,
                     fill = percent)) +
  geom_col() +
  coord_flip() +
  scale_fill_distiller(palette = "Oranges",
                       direction = 1,
                       guide = "none") +
  scale_y_continuous(labels = scales::percent) +
  xlab("") + ylab("") +
  labs(title = "Year Round") +
  #labs(title = "Over 70% of Wildfires Occur During Dry Conditions",
       #subtitle = "KDBI = 7-6: Drought, 4-5: Dry, 1-3: Moist, 0: Saturated",
       #caption = "Period of record: 1992-2018, Data source: Short, Karen (2021) & NWSFO Tallahassee") +
  theme_minimal() 

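# Note: despite the name, this holds fire season (May - July) fires, not April fires
KBDI_Aprilper <- KBDI_Fires.sf %>%
  filter(month(DISCOVERY_) %in% c(5, 6, 7)) %>%
  group_by(DroughtIndex) %>%
  summarise(count = n()) %>%
  # Share of fire season fires (not of all fires) in each KBDI category
  mutate(percent = count / sum(count))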

plot2 <- ggplot(data = KBDI_Aprilper,
       mapping = aes(y = percent, 
                     x = DroughtIndex,
                     fill = percent)) +
  geom_col() +
  coord_flip() +
  scale_fill_distiller(palette = "Oranges",
                       direction = 1,
                       guide = "none") +
  scale_y_continuous(labels = scales::percent) +
  xlab("") + ylab("") +
  labs(title = "Fire Season") +
  #labs(title = "60% of Natural Fires Occur During Dry Conditions in the Fire Season",
       #subtitle = "KDBI = 7-6: Drought, 4-5: Dry, 1-3: Moist, 0: Saturated",
       #caption = "Period of record: May - July 1992-2018, Data source: Short, Karen (2021) & NWSFO Tallahassee") +
  theme_minimal()

(plot1 + plot2) +
  plot_annotation(title = "Natural Fires Peak During Dry Conditions",
       subtitle = "KBDI = 7-6: Drought, 4-5: Dry, 1-3: Moist, 0: Saturated",
       caption = "Period of record: 1992-2018, Data source: Short, Karen (2021) & NWSFO Tallahassee")
```
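
The percentages in the two panels depend on the denominator used. A minimal sketch with made-up rows (the `toy` data frame and its values are hypothetical, not the fire record) contrasts the share of all fires with the share of fire season fires only:

```r
# Hypothetical fires: discovery month and KBDI category
toy <- data.frame(
  month        = c(5, 6, 7, 1, 2, 6, 5, 11),
  DroughtIndex = c(5, 6, 4, 2, 1, 5, 5, 3)
)

# Share of ALL fires in each category
all_share <- table(toy$DroughtIndex) / nrow(toy)

# Share of FIRE SEASON (May - July) fires in each category
season <- toy[toy$month %in% 5:7, ]
season_share <- table(season$DroughtIndex) / nrow(season)

all_share
season_share
```

Dividing by the number of rows in the filtered subset makes the fire season bars sum to 100%.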

Explore trends in fire size by month/year

Average fire size by month and year. The grouping columns are named inside `group_by()` because `summarise()` drops `DISCOVERY_`, which caused the "DISCOVERY_ not found" error in the plot.
```{r}
anf_LF.sf %>%
  st_drop_geometry() %>% # geometry is not needed for this summary
  group_by(Month = month(DISCOVERY_), Year = year(DISCOVERY_)) %>%
  summarise(Avg = mean(FIRE_SIZE)) %>%
  ggplot(aes(x = Year, y = Avg, color = Avg)) +
  geom_smooth(method = lm, se = FALSE, color = "gray70") +
  geom_point() +  
  scale_color_gradientn(colors = terrain.colors(5), guide = "none") +
  #scale_y_continuous(limits = c(0, 201)) +
  scale_x_continuous(limits = c(1992, 2018)) + #, breaks = c(1950, 1980, 2010)) +
  ylab("") + xlab("") +
  facet_wrap(~ Month, ncol = 12) +
  theme_dark() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 
  #labs(title = "Risk of wildfires is increasing in the Apalachicola National Forest.",
       #subtitle = "Monthly average soil moisture deficit (mm) by year with trend line (gray)", 
       #caption = "Period of record: 1992-2018, Data source: Short, Karen (2021)") 
```

Find monthly KBDI averages
```{r}
KBDI_Trends.df <- TLH.df %>%
  group_by(Year, month) %>%
  summarise(MonthlyKBDI = mean(Ql))
```

How do monthly KBDI trends vary Annually? Is there a drying trend with time?
April, May, July, August, September, and October all show significant drying trends from 1940 - 2020
```{r}
ggplot(data = KBDI_Trends.df,
       mapping = aes(x = Year, y = MonthlyKBDI)) +
  geom_point(aes(colour = MonthlyKBDI)) +
  scale_color_gradient(low = "orange", high = "red4") +
  geom_smooth(method = lm, se = FALSE, color = "gray70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  facet_wrap(~month, ncol = 12) +
  ylab("Monthly KBDI Average") +
  labs(title = "Monthly KBDI averages are increasing annually in the Apalachicola National Forest",
       caption = "Period of Record: Mar. 1940 - Apr. 2020, Source: NWSWFO Tallahassee")
```
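
The trend lines above come from `geom_smooth()`, which draws the fit but does not report whether the slope is distinguishable from zero. One way to check a claim of significance is to fit a separate linear model per month and read off the slope and its p-value. The sketch below uses simulated data (the `toy` data frame is a hypothetical stand-in for `KBDI_Trends.df`, with a trend built into May only):

```r
# Simulated stand-in for KBDI_Trends.df (hypothetical values, not station data)
set.seed(1)
toy <- expand.grid(Year = 1940:2020, month = month.abb)
toy$MonthlyKBDI <- 300 + rnorm(nrow(toy), sd = 50)

# Build a known upward trend into May only so the output is easy to check
is_may <- toy$month == "May"
toy$MonthlyKBDI[is_may] <- toy$MonthlyKBDI[is_may] + 2 * (toy$Year[is_may] - 1940)

# Fit MonthlyKBDI ~ Year separately for each month; keep slope and p-value
trend_by_month <- do.call(rbind, lapply(split(toy, toy$month), function(d) {
  coefs <- summary(lm(MonthlyKBDI ~ Year, data = d))$coefficients
  data.frame(month   = d$month[1],
             slope   = coefs["Year", "Estimate"],
             p_value = coefs["Year", "Pr(>|t|)"])
}))

trend_by_month
```

With twelve tests run at once, some small p-values are expected by chance, so the per-month results are best read together rather than one at a time.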

Explore rainfall trends

Monthly rain totals are significantly decreasing in May/July, but increasing in June
```{r}
TLH.df %>%
  group_by(Year, month) %>%
  summarise(MonthlyTotal = sum(Rainfall24mm)) %>%
  ggplot(aes(x = Year, y = MonthlyTotal)) +
  geom_point(aes(colour = MonthlyTotal)) +
  scale_color_gradient(low = "skyblue2", high = "royalblue4") +
  geom_smooth(method = lm, se = FALSE, color = "grey70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  facet_wrap(~month, ncol = 12) +
  ylab("Monthly Rainfall Total (mm)") +
  labs(title = "Monthly rainfall totals are significantly decreasing in May and July",
       subtitle = "The fire season runs May - July",
       caption = "Period of Record: Mar. 1940 - Apr. 2020, Source: NWSWFO Tallahassee")
```

Fire season rainfall totals have been decreasing over time.
```{r}
TLH.df %>%
  filter(month == "May"| month == "Jun"| month == "Jul") %>%
  group_by(Year) %>%
  summarise(FireSeasonRain = sum(Rainfall24mm)) %>%
  ggplot(aes(x = Year, y = FireSeasonRain)) +
  geom_point(aes(colour = FireSeasonRain)) +
  scale_color_gradient(low = "skyblue2", high = "royalblue4") +
  geom_smooth(method = lm, se = FALSE, color = "grey70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  ylab("Fire Season Rain Total (mm)") +
  labs(title = "Rainfall During the Fire Season is Decreasing Annually",
       subtitle = "At the Tallahassee International Airport",
       caption = "Period of Record: Mar. 1940 - Apr. 2020, Source: NWSWFO Tallahassee")
```


Explore Temperature trends

The monthly average of daily maximum temperature is warming for every month but January
```{r}
TLH.df %>%
  group_by(Year, month) %>%
  # MaxTemp is TMAX with the missing 2005-07-08 value filled in
  summarise(TempAvg = mean(MaxTemp)) %>%
  ggplot(aes(x = Year, y = TempAvg)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE, color = "grey70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  facet_wrap(~month, ncol = 12) +
  ylab("Average Monthly Temperature (Degrees F)") +
  labs(title = "The Average Monthly Temperature at the Tallahassee International Airport is Rising Annually",
       subtitle = "*Except for January",
       caption = "Period of Record: Mar. 1940 - Apr. 2020, Source: NWSWFO Tallahassee")
```

Further explore temperature, precip and KBDI trends over the fire season
```{r}
gg.SeasonTemp <- TLH.df %>%
  filter(month == "May" | month == "Jun" | month == "Jul") %>%
  group_by(Year, month) %>%
  summarise(TempAvg = mean(MaxTemp)) %>% # MaxTemp has the missing TMAX value filled in
  ggplot(aes(x = Year, y = TempAvg)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE, color = "grey70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  facet_wrap(~month, ncol = 3) +
  ylab("Average Monthly Temperature (Degrees F)") +
  labs(title = "Max Temperature Averages")

gg.SeasonRain <- TLH.df %>%
  filter(month == "May" | month == "Jun" | month == "Jul") %>%
  group_by(Year, month) %>%
  summarise(MonthlyRain = sum(Rainfall24mm)) %>%
  ggplot(aes(x = Year, y = MonthlyRain)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE, color = "grey70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  facet_wrap(~month, ncol = 3) +
  ylab("Total Monthly Rainfall (mm)") +
  labs(title = "Monthly Rainfall Totals")

gg.KBDIavg <- TLH.df %>%
  filter(month == "May" | month == "Jun" | month == "Jul") %>%
  group_by(Year, month) %>%
  summarise(MonthlyKBDI = mean(Ql)) %>%
  ggplot(aes(x = Year, y = MonthlyKBDI)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE, color = "grey70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  facet_wrap(~month, ncol = 3) +
  ylab("Monthly KBDI Averages") +
  labs(title = "Annual Trends in Monthly KBDI Averages")

gg.KBDIavg/(gg.SeasonTemp + gg.SeasonRain) +
  plot_annotation(caption = "Period of Record: Mar. 1940 - Apr. 2020, Source: NWSWFO Tallahassee")
```

Explore across the fire season without faceting by month. Data for May, June, July
```{r}
gg.SeasonTemp1 <- TLH.df %>%
  filter(month == "May" | month == "Jun" | month == "Jul") %>%
  group_by(Year, month) %>%
  summarise(TempAvg = mean(MaxTemp)) %>% # MaxTemp has the missing TMAX value filled in
  ggplot(aes(x = Year, y = TempAvg)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE, color = "grey70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  ylab("Average Monthly Temperature (Degrees F)") +
  labs(title = "Max Temperature Averages")

gg.SeasonRain1 <- TLH.df %>%
  filter(month == "May" | month == "Jun" | month == "Jul") %>%
  group_by(Year) %>%
  summarise(SeasonRain = sum(Rainfall24mm)) %>%
  ggplot(aes(x = Year, y = SeasonRain)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE, color = "grey70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  ylab("Season Rainfall Totals (mm)") +
  labs(title = "Season Rainfall Totals")

gg.KBDIavg1 <- TLH.df %>%
  filter(month == "May" | month == "Jun" | month == "Jul") %>%
  group_by(Year, month) %>%
  summarise(MonthlyKBDI = mean(Ql)) %>%
  ggplot(aes(x = Year, y = MonthlyKBDI)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE, color = "grey70") +
  scale_x_continuous(breaks = c(1945, 1977, 2010)) +
  ylab("Monthly KBDI Averages") +
  labs(title = "Annual Trends in Monthly KBDI Averages")

gg.KBDIavg1/(gg.SeasonTemp1 + gg.SeasonRain1) +
  plot_annotation(caption = "Period of Record: Mar. 1940 - Apr. 2020, Source: NWSWFO Tallahassee")
```

### Explore relationships between April KBDI observations, FireSeason KBDI observations, and the number of FireSeason Fires

There is a low correlation (0.399) between the average KBDI in April and the average KBDI of the following fire season
```{r}
TLH.df%>%
  filter(month == "Apr"| month == "May"| month == "Jun"|month == "Jul") %>%
  filter(Year >= 1991 & Year <= 2019) %>%
  group_by(Year) %>%
  summarise(AprilKBDIavg = mean(Ql[month == "Apr"]), SummerKBDIavg = mean(Ql[month == "May"| month == "Jun"| month == "Jul"])) %>%
  summarise(correlation = cor(AprilKBDIavg, SummerKBDIavg))
```

How would the correlation compare if just the last day of April's KBDI was compared to the summer average KBDI?
Comparing just the last day of April results in a higher correlation coefficient of 0.484.
```{r}
TLH.df %>% 
  filter(month == "Apr"| month == "May"| month == "Jun"|month == "Jul") %>%
  filter(Year >= 1991 & Year <= 2019) %>%
  group_by(Year) %>%
  summarise(AprilLastDay = Ql[month == "Apr" & day(Date) == 30], SummerKBDIavg = mean(Ql[month == "May"| month == "Jun"| month == "Jul"])) %>%
  summarise(correlation = cor(AprilLastDay, SummerKBDIavg))
```
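
With only 29 years in the 1991 - 2019 window, a correlation coefficient carries a wide uncertainty. A minimal sketch, using simulated values rather than the station record (`april_kbdi` and `summer_kbdi` are hypothetical), shows how `cor.test()` returns the same estimate as `cor()` together with a confidence interval and p-value:

```r
set.seed(42)
n_years <- 29  # 1991 - 2019, as in the chunks above

# Simulated stand-ins for the April and summer KBDI summaries
april_kbdi  <- rnorm(n_years, mean = 250, sd = 80)
summer_kbdi <- 0.4 * april_kbdi + rnorm(n_years, sd = 80)

ct <- cor.test(april_kbdi, summer_kbdi)

ct$estimate  # same value cor() would return
ct$conf.int  # 95% confidence interval for the correlation
ct$p.value   # test of the null hypothesis of zero correlation
```

Comparing the intervals for the April average and the last day of April would indicate whether 0.399 and 0.484 are meaningfully different.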

1991 - 2020 April Normal:
MeanMaxTemp: 80.2
MeanPrecip: 3.53 inches
MeanPrecip Jan-April 17.46 inches
https://climatecenter.fsu.edu/products-services/data/1991-2020-normals/tallahassee 

Notable years:
April of 2002 was significantly warm (4.4 degrees above normal) and dry (3.04 inches below normal)
April of 2011 was also significantly warm (4.43 degrees above normal) and dry (1.42 inches below normal)

What is the correlation when comparing precip and temp deficits to fire season KBDI? (-0.312) 
This is lower than using KBDI values.
```{r}
TLH.df %>%
  filter(month == "Apr" | month == "May" | month == "Jun" | month == "Jul") %>%
  filter(Year >= 1991 & Year < 2020) %>%
  group_by(Year) %>%
  summarise(AprTempAvg = mean(TMAX[month == "Apr"]), AprPrecip = sum(Rainfall24[month == "Apr"]), AprTempAnom = mean(TMAX[month == "Apr"])-80.2, AprPrecipAnom = sum(Rainfall24[month == "Apr"])-3.53, SeasonAvgKBDI = mean(Ql[month == "May" | month == "Jun" | month == "Jul"])) %>%
  summarise(PrecipAnomKBDI = cor(AprPrecipAnom, SeasonAvgKBDI), TempAnomKBDI = cor(AprTempAnom, SeasonAvgKBDI))
```

April Precip Anomaly is negatively correlated to May Average KBDI (-.669). The further below the average rainfall total, the higher likelihood of a high May KBDI average.
April Temp Anomaly is positively correlated to May Average KBDI (+0.193). This is a low correlation coefficient and departure from the average April temperature does not appear as significant as the departure from the average April rainfall.
```{r}
TLH.df %>%
  filter(month == "Apr" | month == "May") %>%
  filter(Year >= 1991 & Year < 2020) %>%
  group_by(Year) %>%
  summarise(AprTempAvg = mean(TMAX[month == "Apr"]), AprPrecip = sum(Rainfall24[month == "Apr"]), AprTempAnom = mean(TMAX[month == "Apr"])-80.2, AprPrecipAnom = sum(Rainfall24[month == "Apr"])-3.53, MayAvgKBDI = mean(Ql[month == "May"])) %>%
  summarise(PrecipAnomKBDI = cor(AprPrecipAnom, MayAvgKBDI), TempAnomKBDI = cor(AprTempAnom, MayAvgKBDI))
```

Does the trend hold if a longer rain deficit is explored? Explore what May KBDI values might be based on rainfall anomalies between January and April.
Over this longer period of rainfall deficit, the correlation coefficient is -0.558. This is weaker than the -0.669 found when using the April rainfall deficit alone.
```{r}
TLH.df %>%
  filter(month == "Jan"| month == "Feb" | month == "Mar"| month == "Apr" | month == "May") %>%
  filter(Year >= 1991 & Year <= 2019) %>%
  group_by(Year) %>%
  summarise(PrecipAnom = sum(Rainfall24[month == "Jan"| month == "Feb"| month == "Mar"| month == "Apr"])-17.46, MayKBDI = mean(Ql[month == "May"])) %>%
  summarise(correlation = cor(PrecipAnom,MayKBDI))
```

Explore KBDI by season
Create stat columns for KBDI for each season
```{r}
seasonKBDI <- TLH.df %>%
  filter(year(Date) >= 1992) %>%
  filter(year(Date) <= 2018) %>%
  filter(month(Date) == 05 | month(Date) == 06 | month(Date) == 07) %>%
  mutate(Season = year(Date)) %>%
  group_by(Season) %>%
  summarise(Season_Avg = mean(Ql), Season_Median = median(Ql), Season_Min = min(Ql), Season_Max = max(Ql))
```

Merge KBDI and Fire data sets. 
```{r}
seasonStats <- left_join(x = seasonFires, y = seasonKBDI, by = c("Year" = "Season")) %>%
  rename(Season = Year)
```

There are high correlation values between the number of fires that occur during the fire season and KBDI values during that season
```{r}
cor(seasonStats$nFIRES, seasonStats$Season_Avg)
cor(seasonStats$nFIRES, seasonStats$Season_Median)
cor(seasonStats$nFIRES, seasonStats$Season_Max)
```

When running the regression models, April KBDI values will be used to predict the season. Explore stats for April and compare to the number of fires during the season. Also include the KBDI on the last day of April for each season
```{r}
AprilKBDI <- TLH.df %>%
  filter(year(Date) >= 1992) %>%
  filter(year(Date) <= 2018) %>%
  filter(month(Date) == 04) %>%
  mutate(Season = year(Date)) %>%
  group_by(Season) %>%
  summarise(April_Avg = mean(Ql), April_Median = median(Ql), April_Min = min(Ql), April_Max = max(Ql), KBDI_LastDay = Ql[month(Date) == 04 & day(Date) == 30], KBDI_midApril = Ql[month(Date) == 04 & day(Date) == 15])
```

Merge fire season fire counts with the April KBDI summaries 
```{r}
AprilStats <- left_join(x = AprilKBDI, y = seasonFires, by = c("Season" = "Year")) %>%
  rename(SeasonFires = nFIRES) %>%
  replace_na(list(SeasonFires = 0))
```

When exploring the relationship between April KBDI values and the number of fires that occurred in the following season, a correlation coefficient of about 0.55 is found, with the best response to the KBDI median and KBDI mean.
```{r}
cor(AprilStats$SeasonFires, AprilStats$April_Avg)
cor(AprilStats$SeasonFires, AprilStats$April_Median)
cor(AprilStats$SeasonFires, AprilStats$April_Max)
cor(AprilStats$SeasonFires, AprilStats$KBDI_LastDay)
cor(AprilStats$SeasonFires, AprilStats$KBDI_midApril)
```

Predict each past season fire count using a glm and compare to what the count actually was. 

Questions to consider while working:
Can't use the binomial family with the AprilStats data set. Use Poisson because SeasonFires is count data.
Could the min KBDI also have a significant influence, especially if it resets to zero?
Should net rainfall be included in the model for each season?
Use the predict function to predict the number of fires and compare to the actual count.

A summary of April GLM shows that the Average April KBDI is statistically significant.
```{r}
AprilGLM <- glm(formula = SeasonFires ~ April_Avg, family = poisson, data = AprilStats)

pred_AprilGLM <- predict(object = AprilGLM,
                         type = "response",
                         se.fit = TRUE)

#summary(AprilGLM)

lowerAprilStats <- pred_AprilGLM$fit - (1.96*pred_AprilGLM$se.fit)
upperAprilStats <- pred_AprilGLM$fit + (1.96*pred_AprilGLM$se.fit)

ggplot(mapping = aes(x = AprilStats$April_Avg, y = pred_AprilGLM$fit)) +
  geom_ribbon(aes(ymin = lowerAprilStats, ymax = upperAprilStats), fill = "grey") +
  geom_line(color = "blue") +
  ylab("Predicted Number of Fires") +
  xlab("April Average KBDI") +
  labs(
    title = "Predicted Number of Fires for the Upcoming Fire Season\nBased on April Averaged KBDI",
    subtitle = "With 95% Confidence Interval",
    caption = "April 1992-2018")
```
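
The band above is built as fit ± 1.96 × se on the response scale, which can dip below zero for small predicted counts. A common alternative for a Poisson GLM is to build the interval on the link (log) scale and back-transform it. The sketch below is an illustration under assumptions: the `toy` data are simulated, and `toyGLM` is a hypothetical stand-in for `AprilGLM`.

```r
set.seed(7)

# Simulated April KBDI averages and fire counts (not the ANF record)
toy <- data.frame(April_Avg = runif(27, 50, 500))
toy$SeasonFires <- rpois(27, lambda = exp(0.5 + 0.004 * toy$April_Avg))

toyGLM <- glm(SeasonFires ~ April_Avg, family = poisson, data = toy)

# Predict on the link (log) scale, build the interval there, then back-transform
pred_link <- predict(toyGLM, type = "link", se.fit = TRUE)
toy$fit   <- exp(pred_link$fit)
toy$lower <- exp(pred_link$fit - 1.96 * pred_link$se.fit)
toy$upper <- exp(pred_link$fit + 1.96 * pred_link$se.fit)

head(toy)
```

Because `exp()` is always positive, the lower bound stays above zero, and the back-transformed fit equals the response-scale prediction.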

create a column of predicted number of fires for each season and compare
```{r}
AprilStats$PredFires1 <- pred_AprilGLM$fit
AprilStats$Compare1 <- AprilStats$SeasonFires - AprilStats$PredFires1
cor(AprilStats$SeasonFires, AprilStats$PredFires1)
```

What's the relationship between April KBDI and the accuracy of the predicted fire count? Use the absolute value to show how far the predicted value is from the actual value. This correlation returns 0.693. How is this interpreted, and how should the relationship be summarized?
```{r}
cor(AprilStats$April_Avg, abs(AprilStats$Compare1))
```
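
One possible reading of that correlation: in a Poisson model the variance equals the mean, so raw errors are expected to grow as the predicted count grows, and the predicted count grows with April KBDI. Pearson residuals divide each raw error by the square root of the fitted value, which removes that expected growth. The sketch below uses simulated data (`toy` and `toyGLM` are hypothetical) and is not a statement about the actual model:

```r
set.seed(11)

# Simulated April KBDI averages and fire counts
toy <- data.frame(April_Avg = runif(27, 50, 500))
toy$SeasonFires <- rpois(27, lambda = exp(0.5 + 0.004 * toy$April_Avg))
toyGLM <- glm(SeasonFires ~ April_Avg, family = poisson, data = toy)

raw_error <- toy$SeasonFires - fitted(toyGLM)
pearson   <- resid(toyGLM, type = "pearson")  # raw error / sqrt(fitted value)

# Compare how each kind of error relates to April KBDI
cor(toy$April_Avg, abs(raw_error))
cor(toy$April_Avg, abs(pearson))

# A ratio well above 1 would suggest more spread than a Poisson model assumes
dispersion <- sum(pearson^2) / df.residual(toyGLM)
dispersion
```

If the correlation largely disappears for the Pearson residuals, the 0.693 mostly reflects the mean-variance relationship rather than a loss of skill at high KBDI.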

What other factors could influence the number of fires:
April Max KBDI?
April Min KBDI?
April Net Rainfall?
April KBDI Range?
Number of days KBDI > 400?

We find that the April KBDI average is not as statistically significant as in our first model.
The correlation between actual and predicted fire counts is slightly stronger than for the model without max and min KBDI
```{r}
April2GLM <- glm(formula = SeasonFires ~ April_Avg + April_Min + April_Max, family = poisson, data = AprilStats)
summary(April2GLM)
pred_April2GLM <- predict(object = April2GLM,
                         type = "response",
                         se.fit = TRUE)

lowerAprilStats <- pred_April2GLM$fit - (1.96*pred_April2GLM$se.fit)
upperAprilStats <- pred_April2GLM$fit + (1.96*pred_April2GLM$se.fit)

ggplot(mapping = aes(x = AprilStats$April_Avg, y = pred_April2GLM$fit)) +
  geom_ribbon(aes(ymin = lowerAprilStats, ymax = upperAprilStats), fill = "grey") +
  geom_line(color = "blue") +
  ylab("Predicted Number of Fires") +
  xlab("April Average KBDI") +
  labs(
    title = "Predicted Number of Fires for the Upcoming Fire Season\nBased on April Averaged KBDI\nAccounting For April Min & Max KBDI Values",
    subtitle = "With 95% Confidence Interval",
    caption = "April 1992-2018")

#Correlation when including April max and min KBDI values
cor(AprilStats$SeasonFires, pred_April2GLM$fit)
#Correlation without including April max and min KBDI values
cor(AprilStats$SeasonFires, AprilStats$PredFires1)
```
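
A model with more predictors will usually correlate at least slightly better with the data it was fit to, so the correlation alone does not show whether the extra terms are worth keeping. Since the two models are nested, AIC and a likelihood ratio test offer a more direct comparison. This is a sketch on simulated data (`toy`, `m1`, and `m2` are hypothetical stand-ins for `AprilStats`, `AprilGLM`, and `April2GLM`):

```r
set.seed(3)

# Simulated April KBDI summaries and fire counts
toy <- data.frame(April_Avg = runif(27, 50, 500))
toy$April_Min <- toy$April_Avg - runif(27, 10, 50)
toy$April_Max <- toy$April_Avg + runif(27, 10, 50)
toy$SeasonFires <- rpois(27, lambda = exp(0.5 + 0.004 * toy$April_Avg))

m1 <- glm(SeasonFires ~ April_Avg, family = poisson, data = toy)
m2 <- glm(SeasonFires ~ April_Avg + April_Min + April_Max,
          family = poisson, data = toy)

# Lower AIC is preferred; AIC penalizes the two extra parameters
AIC(m1, m2)

# Likelihood ratio test of the nested models
lrt <- anova(m1, m2, test = "Chisq")
lrt
```

The average, minimum, and maximum of the same month are strongly related to each other, which can inflate standard errors and may explain why `April_Avg` looks less significant in the larger model.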

Create columns to further compare the actual number of fires that occurred in a season and the predicted number of fires.
```{r}
AprilStats$PredFires2 <- pred_April2GLM$fit
AprilStats$Compare2 <- AprilStats$SeasonFires - AprilStats$PredFires2
```

### Ideas/ adjustments from meeting on 8/23/2021

What other factors may influence the occurrence of fires during the fire season outside of soil dryness and El Nino?
Could the number of fires over the previous year influence the number of fires in the upcoming season? Explore all fires including those categorized outside of natural fires. How might the area burned over the last year be relevant?

Create fire data set for all fires in the ANF including human caused.
```{r}
allFires.sf <- st_read(dsn = "Data/updatedFireData/Data/FPA_FOD_20210617.gpkg", layer = "Fires") %>%
  filter(STATE == "FL") %>%
  st_transform(crs = st_crs(anfbounds.sf)) %>%
  st_intersection(anfbounds.sf) #%>%
  #select(FOD_ID, FIRE_NAME, FIRE_YEAR, DISCOVERY_DATE, NWCG_CAUSE_CLASSIFICATION, NWCG_GENERAL_CAUSE, FIRE_SIZE, FIRE_SIZE_CLASS, LATITUDE, LONGITUDE, FIPS_NAME, FIPS_CODE, Shape)
```

Select columns in separate line so it does not take as long to load.
```{r}
allFires.sf <- allFires.sf %>%
  select(FOD_ID, FIRE_NAME, FIRE_YEAR, DISCOVERY_DATE, NWCG_CAUSE_CLASSIFICATION, NWCG_GENERAL_CAUSE, FIRE_SIZE, FIRE_SIZE_CLASS, LATITUDE, LONGITUDE, FIPS_NAME, FIPS_CODE, Shape)
```

Change date column format
```{r}
allFires.sf$DISCOVERY_DATE <- as.Date(allFires.sf$DISCOVERY_DATE)

allFires.sf <- allFires.sf %>%
  rename(DISCOVERY_ = DISCOVERY_DATE)
```

Filter for lightning fires. Previous dataframes of lightning fires were filtered to anf counties and not the anf bounds.
```{r}
ANF_LFburned <- allFires.sf %>%
  filter(NWCG_CAUSE_CLASSIFICATION == "Natural")
```

group fires/ year and sum up the total acres burned during each month
```{r}
testgroup <- allFires.sf %>%
  group_by(FIRE_YEAR, month(DISCOVERY_)) %>%
  summarise(monthAcers = sum(FIRE_SIZE)) %>%
  rename(month = "month(DISCOVERY_)")
```

## Burned area code has been edited/ corrected, see SEDAGG rmd for corrections

Follow Guidance from Dr. Elsner's email.
Take this code chunk out?
```{r}
allFires.sf <- allFires.sf %>%
  mutate(InSeason = DISCOVERY_ >= as.Date(paste0(as.character(FIRE_YEAR), "-04", "-30")) & DISCOVERY_ <= as.Date(paste0(as.character(FIRE_YEAR+1), "-03", "-31"))) %>%
  #rearrange order of columns
  select(FOD_ID:DISCOVERY_, InSeason, NWCG_CAUSE_CLASSIFICATION:Shape)
```

Rearranging to get string of date range rather than true/false
```{r}
allFires.sf <- allFires.sf %>%
  mutate(burnedGroupStart = as.Date(paste0(as.character(FIRE_YEAR), "-04", "-1"))) %>%
  mutate(burnedGroupEnd = as.Date(paste0(as.character(FIRE_YEAR+1), "-03", "-31"))) %>%
  unite(col = "burnedGroupInterval", c("burnedGroupStart", "burnedGroupEnd"), sep = " -- ") %>%
  #rearrange order of columns
  select(FOD_ID:DISCOVERY_, burnedGroupInterval, NWCG_CAUSE_CLASSIFICATION:Shape)
```
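
Building the interval from `FIRE_YEAR` alone places a January - March fire in the period that starts the following April. A minimal sketch of one way to assign each fire to the April - March period it actually falls in is shown below. It uses base R and made-up dates; `burn_start_year` and `forecast_year` are hypothetical names, and this is not the corrected code referenced above.

```r
# Hypothetical discovery dates around the April 1 / March 31 boundaries
fire_dates <- as.Date(c("1992-02-15", "1992-04-01", "1992-12-31",
                        "1993-03-31", "1993-04-01"))

fire_year  <- as.integer(format(fire_dates, "%Y"))
fire_month <- as.integer(format(fire_dates, "%m"))

# A January - March fire belongs to the burn period that began the previous April
burn_start_year <- ifelse(fire_month < 4, fire_year - 1L, fire_year)

# The burn period ending in March is used to forecast that same year's fire season
forecast_year <- burn_start_year + 1L

data.frame(fire_dates, burn_start_year, forecast_year)
```

Grouping by `forecast_year` would then sum acres over a true April - March window.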

Group data by burned periods and sum the total acres burned over that period
Note this does not include fires in the first 3 months of the data set
```{r}
burnedAcres <- allFires.sf %>%
  group_by(burnedGroupInterval) %>%
  summarise(acresBurned = sum(FIRE_SIZE)) %>%
  #create column to define the year the data will be used to forecast for
  mutate(forecastYear = substr(burnedGroupInterval, 15, 18)) %>%
  #reorder columns
  select(forecastYear, burnedGroupInterval:acresBurned) %>%
  #convert forecastYear to an int/ double so it will merge
  transform(forecastYear = as.numeric(forecastYear))
```

Create a count of natural wildfires occurring in the anf during the fire season. This will be merged with the burned acres data frame.
```{r}
ANF_LFburned_count <- ANF_LFburned %>%
  filter(month(DISCOVERY_) == 05 | month(DISCOVERY_) == 06 | month(DISCOVERY_) == 07) %>%
  count(FIRE_YEAR) %>%
  rename(Year = FIRE_YEAR, nFIRES = n) %>%
  st_set_geometry(NULL)
```

merge burned data frame with the number of lightning wildfires that occurred in the Apalachicola National Forest for each year.
```{r}
countfiresandburned.df <- left_join(x = burnedAcres, y = ANF_LFburned_count, by = c("forecastYear" = "Year"))

countfiresandburned.df$nFIRES <- replace_na(countfiresandburned.df$nFIRES, 0)
```

Explore how the number of acres burned over the past year (April - March) may influence the number of lightning wildfires that occur.
Follow linear regression model from modeling April KBDI

The model finds that the number of acres burned over the past year is not statistically significant
There is minimal evidence of a relationship between the number of acres burned and the number of natural fires that occur during the fire season.
p-value = 0.0604
correlation coefficient = 0.0966; we would expect a negative slope and a negative correlation.
```{r}
burnedGLM <- glm(formula = nFIRES ~ acresBurned, family = poisson, data = countfiresandburned.df)

pred_burnedGLM <- predict(object = burnedGLM,
                         type = "response",
                         se.fit = TRUE)

summary(burnedGLM)

cor(countfiresandburned.df$acresBurned, countfiresandburned.df$nFIRES)
```

Rather than comparing the area burned to the number of fires that occurred, compare the residuals of the two models.
Join the data sets so the data can be visualized more easily
```{r}
tocompareModels <- left_join(countfiresandburned.df, AprilStats, by = c("forecastYear" = "Season")) %>%
  filter(forecastYear <= 2018) %>%
  select(forecastYear:nFIRES, April_Min:KBDI_midApril)
```


First, build a model to represent the predicted number of fires based on the KBDI on the 15th of April. The previous models run to this point were focused on average KBDI. We do not want to explore KBDI as an average and instead will represent one day of KBDI.

```{r}
April15GLM <- glm(formula = nFIRES ~ KBDI_midApril, family = poisson, data = tocompareModels)

pred_April15GLM <- predict(object = April15GLM,
                         type = "response",
                         se.fit = TRUE)

#summary(April15GLM)

lowerAprilStats <- pred_April15GLM$fit - (1.96*pred_April15GLM$se.fit)
upperAprilStats <- pred_April15GLM$fit + (1.96*pred_April15GLM$se.fit)

ggplot(mapping = aes(x = tocompareModels$KBDI_midApril, y = pred_April15GLM$fit)) +
  geom_ribbon(aes(ymin = lowerAprilStats, ymax = upperAprilStats), fill = "grey") +
  geom_line(color = "blue") +
  ylab("Predicted Number of Fires") +
  xlab("April 15th KBDI") +
  labs(
    title = "Predicted Number of Fires for the Upcoming Fire Season\nBased on April 15th KBDI",
    subtitle = "With 95% Confidence Interval",
    caption = "April 1992-2018")
```

Correlate the residuals of this April KBDI model with the number of acres burned in the year prior.
When correlating the number of acres burned to the KBDI model, we find no relationship. Correlation coefficient = -0.0197
```{r}
cor(tocompareModels$acresBurned,resid(April15GLM))
```

Explore area as a percentage of the forest that was burned.
```{r}
st_area(anfbounds.sf) #2564052729 m^2
drop_units(st_area(anfbounds.sf))/1000000 #2564.053 kilometers^2
drop_units(st_area(anfbounds.sf))/4047 #approximately 633,568.7 acres
```

Create a column within the tocompareModels data frame to represent the percentage of the forest burned prior to the fire season
```{r}
tocompareModels <- tocompareModels %>%
  mutate(percentBurned = (acresBurned/(drop_units(st_area(anfbounds.sf))/4047))*100)
```

create model based on the percent of forest that was burned.
We find the percent of forest burned is not statistically significant (p-value = 0.146)
The correlation coefficient suggests no relationship (-0.0773)
```{r}
percentBurnedGLM <- glm(formula = nFIRES ~ percentBurned, family = poisson, data = tocompareModels)

pred_percentBurnedGLM <- predict(object = percentBurnedGLM,
                         type = "response",
                         se.fit = TRUE)

summary(percentBurnedGLM)

cor(tocompareModels$percentBurned, tocompareModels$nFIRES)
```

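Correlate the residuals of the April KBDI model with the percentage of the forest burned in the year prior
This again suggests no relationship (-0.0197). The value matches the acres burned result because percentBurned is a linear rescaling of acresBurned.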
```{r}
cor(tocompareModels$percentBurned,resid(April15GLM))
```
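
Pearson correlation does not change when a variable is multiplied by a positive constant, so converting acres to a percentage of the forest cannot change the result. A short check with simulated values (`acres` and `resids` are hypothetical; the forest area is the approximate figure computed earlier):

```r
set.seed(5)

acres  <- runif(26, 500, 20000)  # simulated acres burned per period
resids <- rnorm(26)              # simulated model residuals

forest_acres <- 633568.7         # approximate forest area in acres
percent <- acres / forest_acres * 100

cor(acres, resids)
cor(percent, resids)
```

Both calls return the same value, so the percentage column is useful for interpretation but adds no new information to the correlation.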


Build a model to predict the number of fires based on burned area
```{r}
burnedareaGLM <- glm(formula = nFIRES ~ acresBurned, family = poisson, data = tocompareModels)

pred_burnedareaGLM <- predict(object = burnedareaGLM,
                              type = "response",
                              se.fit = TRUE)

# Use the predicted fit from predict() for both the band and the line
lowerBurnedArea <- pred_burnedareaGLM$fit - (1.96*pred_burnedareaGLM$se.fit)
upperBurnedArea <- pred_burnedareaGLM$fit + (1.96*pred_burnedareaGLM$se.fit)

ggplot(mapping = aes(x = tocompareModels$acresBurned, y = pred_burnedareaGLM$fit)) +
  geom_ribbon(aes(ymin = lowerBurnedArea, ymax = upperBurnedArea), fill = "grey") +
  geom_line(color = "blue") +
  ylab("Predicted Number of Fires") +
  xlab("Burned Acres") +
  labs(
    title = "Predicted Number of Fires for the Upcoming Fire Season\nBased on the Area Burned Over the Past Year",
    subtitle = "With 95% Confidence Interval",
    caption = "1993-2018")

#summary(burnedareaGLM)
```

Explore if there is a correlation between the model residuals from the two models.
```{r}
cor(resid(April15GLM),resid(burnedareaGLM))
#cor(se.fit(pred_April15GLM),se.fit(pred_burnedareaGLM))
```

Create a model for both KBDI and Acres burned
In model summary, acresBurned is not reflected as statistically significant. correlation of residuals is 0.839. Should acresBurned be included in the model?
```{r}
draftModelGLM <- glm(formula = nFIRES ~ KBDI_midApril + acresBurned, family = poisson, data = tocompareModels)

summary(draftModelGLM)

pred_draftModelGLM <- predict(object = draftModelGLM,
                              type = "response",
                              se.fit = TRUE)
```

Compare residuals from each model
```{r}
compareResid <- tocompareModels %>%
  select(forecastYear, nFIRES) %>%
  st_set_geometry(NULL) %>%
  mutate(KBDIresid = resid(April15GLM)) %>%
  mutate(KBDIpred = pred_April15GLM$fit) %>%
  mutate(BurnedResid = resid(burnedareaGLM)) %>%
  mutate(Burnedpred = pred_burnedareaGLM$fit) %>%
  mutate(combinedResid = resid(draftModelGLM)) %>%
  mutate(combinepred = pred_draftModelGLM$fit)
```

Residual Correlation
cor of acres burned and combined model = 0.835
cor of KBDI and combined model = 0.998
```{r}
cor(resid(burnedareaGLM),resid(draftModelGLM))
cor(resid(April15GLM),resid(draftModelGLM))
```


Explore whether the number of total fires (not just natural) in the months prior to the lightning season is related to the number during the season. Question: If there are a lot of fires prior to the season, does that imply that there is abundant fuel for the season?

find number of fires between Jan and March.
```{r}
earlyFires <- allFires.sf %>%
  group_by(FIRE_YEAR, month(DISCOVERY_)) %>%
  count(FIRE_YEAR) %>%
  rename(month = 'month(DISCOVERY_)') %>%
  group_by(FIRE_YEAR) %>%
  summarise(earlyFires = sum(n[month == "1" | month == "2" | month == "3"])) 
```

merge the fire count data set with "seasonFires"
```{r}
comparePre.df <- left_join(x = earlyFires, y = seasonFires, by = c("FIRE_YEAR" = "Year")) %>%
  rename(seasonFires = nFIRES, Year = FIRE_YEAR, earlyFiresShape = Shape) %>%
  select(Year, earlyFires, seasonFires)

comparePre.df$seasonFires <- replace_na(comparePre.df$seasonFires, 0)
```

There is not enough evidence to suggest a relationship between the number of fires that occurred in the months prior to the fire season and the fires that occurred during the fire season. Correlation Coefficient = -0.280
```{r}
cor(comparePre.df$earlyFires, comparePre.df$seasonFires)
```


--Scratch code

Build one April-to-March burn period per year as a lubridate interval, without a loop.
```{r}
# One interval per year: April 1 of each year through March 31 of the next
burnYears <- 1991:2019
burnStarts <- ymd(paste(burnYears, "04-01", sep = "-"))
burnedGroups <- interval(start = burnStarts, end = burnStarts + years(1) - days(1))

burnPeriods <- tibble(burnedGroups)
```


### El Nino
link with El Nino explanation to reference
https://www.climate.gov/news-features/understanding-climate/el-ni%C3%B1o-and-la-ni%C3%B1a-frequently-asked-questions

Import El Nino region monthly data
https://www.cgd.ucar.edu/cas/catalog/climind/
```{r}
elNinoImport.df <- read.csv('https://www.cpc.ncep.noaa.gov/data/indices/ersst4.nino.mth.81-10.ascii',
                 header = TRUE, sep = "")

elNinoMarch <- elNinoImport.df %>%
  filter(MON == "3")
```

```{r}
elNinoMarch <- left_join(x = elNinoMarch, y = seasonFires, by = c("YR" = "Year"))

elNinoMarch <- elNinoMarch %>%
  filter(YR >= 1992 & YR <= 2018) 
elNinoMarch$nFIRES <- replace_na(elNinoMarch$nFIRES, 0)

cor(elNinoMarch$ANOM.3, elNinoMarch$nFIRES) #region3.4 = -0.148
cor(elNinoMarch$ANOM, elNinoMarch$nFIRES) #region1.2 = -0.0176
cor(elNinoMarch$ANOM.1, elNinoMarch$nFIRES) #region3 = -0.053
cor(elNinoMarch$ANOM.2, elNinoMarch$nFIRES) #region4 = -0.284; has the strongest correlation, but does not show a relationship
```


3 month grouped el nino data: https://catalog.data.gov/dataset/climate-prediction-center-cpcoceanic-nino-index
```{r}
groupedElNino.df <- read.csv('https://www.cpc.ncep.noaa.gov/data/indices/oni.ascii.txt',
                             header = TRUE, sep = "")

groupedJFM <- groupedElNino.df %>%
  filter(SEAS == "JFM")
```

compare number of fires in a season with the JFM anomalies
```{r}
elNinoFireCount <- left_join(x = groupedJFM, y = seasonFires, by = c("YR" = "Year"))

elNinoFireCount <- elNinoFireCount %>%
  filter(YR >= 1992 & YR <= 2018) 
elNinoFireCount$nFIRES <- replace_na(elNinoFireCount$nFIRES, 0)

cor(elNinoFireCount$ANOM, elNinoFireCount$nFIRES) #-0.112: No relationship
```

Trend gets stronger closer to the fire season:

- SEAS == DJF, cor = -0.090: No relationship
- SEAS == JFM, cor = -0.112: No relationship
- SEAS == FMA, cor = -0.157: No relationship
- SEAS == MAM, cor = -0.235: No relationship
- SEAS == AMJ, cor = -0.336: No relationship
- SEAS == MJJ, cor = -0.428: No relationship
- SEAS == JJA, cor = -0.464: No relationship
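
The season-by-season values above come from editing the filter and rerunning the chunk. A sketch of computing them in one pass follows. It assumes an ONI-style data frame with `SEAS`, `YR`, and `ANOM` columns and a fire-count data frame with `Year` and `nFIRES`. The helper name `corBySeason` and the toy data are hypothetical.

```{r}
# Hypothetical helper: correlation between the seasonal anomaly and the
# fire count, computed separately for each three-month season code
corBySeason <- function(nino, fires, seasons) {
  sapply(seasons, function(s) {
    oneSeason <- nino[nino$SEAS == s, ]
    # Join on year; all.x = TRUE keeps years with no recorded fires
    joined <- merge(oneSeason, fires, by.x = "YR", by.y = "Year", all.x = TRUE)
    joined$nFIRES[is.na(joined$nFIRES)] <- 0
    cor(joined$ANOM, joined$nFIRES)
  })
}

# Toy data, for illustration only
toyONI <- data.frame(
  SEAS = rep(c("DJF", "JFM"), each = 5),
  YR   = rep(2001:2005, times = 2),
  ANOM = c(-0.7, 1.1, 0.4, -0.2, 0.6, -0.5, 0.9, 0.3, 0.1, 0.4)
)
toySeasonFires <- data.frame(Year = c(2001, 2002, 2004, 2005),
                             nFIRES = c(9, 2, 6, 4))

toySeasonCors <- corBySeason(toyONI, toySeasonFires, c("DJF", "JFM"))
toySeasonCors
```

With the real data, `groupedElNino.df` would need to be restricted to the 1992 to 2018 window before calling the helper, to match the chunks above.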


NEXT STEPS
Run correlations between El Nino values and April KBDI. This project is a two-step approach: 1) How does climate influence drought? 2) How does drought influence wildfires?

Create a data set that holds both monthly El Nino data and April KBDI data.
AprilKBDI = KBDI values in April
elNinoMarch = El Nino anomalies for the month of March for every Nino region, plus the number of fires in the season
```{r}
elNinoDrought <- left_join(x = AprilKBDI, y = elNinoMarch, by = c("Season" = "YR")) %>%
  filter(Season <= 2017)
```

Explore drought/ el Nino relationships
```{r}
cor(elNinoDrought$KBDI_midApril, elNinoDrought$ANOM.3) #region 3.4, cor = -.265
cor(elNinoDrought$KBDI_midApril, elNinoDrought$ANOM.2) #region 4, cor = -.324
cor(elNinoDrought$KBDI_midApril, elNinoDrought$ANOM.1) #region 3, cor = -.124
cor(elNinoDrought$KBDI_midApril, elNinoDrought$ANOM) #region 1.2, cor = 0.069
```

Explore relationships with 3-month means
AprilKBDI = KBDI values in April
groupedJFM = grouped El Nino means for January, February, and March
```{r}
groupedElNinoDrought <- left_join(x = AprilKBDI, y = groupedJFM, by = c("Season" = "YR"))
```

Explore Feb, Mar, April grouping
```{r}
groupedFMA <- groupedElNino.df %>%
  filter(SEAS == "FMA")

FMAelNinoDrought <- left_join(x = AprilKBDI, y = groupedFMA, by = c("Season" = "YR"))
```


```{r}
cor(groupedElNinoDrought$KBDI_midApril, groupedElNinoDrought$ANOM) #cor = -.224
cor(FMAelNinoDrought$KBDI_midApril, FMAelNinoDrought$ANOM) #cor = -.226
```

Harrison (2004) suggests that the relationship between El Nino and drought may lag by nearly a year.

Use bind_cols to offset the data by one year.
First, set up two data frames.
AprilKBDI is filtered through 2017 in the chunk below, because elNinoImport.df does not have all months of 2017 and the two data frames are offset by one year.
elNinoImport.df is filtered from 1991 to 2016 and to a single month (MON == 8, August, in the chunk below; change MON to get the other monthly offsets).
```{r}
bind1 <- AprilKBDI %>%
  filter(Season <= 2017)
  

bind2 <- elNinoImport.df %>%
  filter(YR >= 1991 & YR <= 2016 & MON == 8) %>%
  rename(NinoYear = "YR")

#bind columns
offsetYear <- bind_cols(bind1, bind2)

#run correlation when el nino anomaly is lagged by one year from aprilKBDI
cor(offsetYear$KBDI_midApril, offsetYear$ANOM.3)
```
For region 3.4:

- April offset = -.007: no relationship
- May offset = -.083
- June offset = -.223
- July offset = -.340
- Aug offset = -.356
- Sep offset = -.332
- Oct offset = -0.295
- Nov offset = -.301
- Dec offset = -.298
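
`bind_cols()` pairs rows purely by position, so the one-year offset only holds if both data frames are sorted by year and have the same number of rows. A sketch of the same lag written as a keyed join follows. The toy stand-ins for `AprilKBDI` and `elNinoImport.df` and their values are made up for illustration.

```{r}
library(dplyr)

# Toy stand-ins, for illustration only
toyAprilKBDI <- data.frame(Season = 1992:1996,
                           KBDI_midApril = c(310, 120, 450, 280, 390))
toyMonthlyNino <- data.frame(
  YR     = rep(1991:1995, each = 2),
  MON    = rep(c(7L, 8L), times = 5),
  ANOM.3 = c(0.6, 0.7, 0.2, 0.1, 0.3, 0.2, 0.4, 0.5, -0.1, -0.3)
)

# Keep one month, then shift the year forward by one so each anomaly
# lines up with the following April
toyLagged <- toyMonthlyNino %>%
  filter(MON == 8) %>%
  mutate(Season = YR + 1L) %>%
  rename(NinoYear = YR)

# Join on the shifted year instead of relying on row order
toyOffset <- left_join(toyAprilKBDI, toyLagged, by = "Season")

cor(toyOffset$KBDI_midApril, toyOffset$ANOM.3)
```

A keyed join also makes a missing year visible as an `NA` row, where a positional bind would pair the wrong years without any warning.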


Explore months beyond April
Use each month's max KBDI and correlate it with that month's SST anomaly
```{r}
monthlyMaxKBDI <- TLH.df %>%
  group_by(Year, month) %>%
  summarise(maxKBDI = max(Ql))
```

Clean up data to perform bind_cols
Filter both to the start of the fire data set (Year = 1992).
monthlyMaxKBDI must be limited to the extent of the El Nino data, so end both data sets in 2016.
```{r}
monthlyMaxKBDI <- monthlyMaxKBDI %>%
  filter(Year >= 1992 & Year <= 2016)

elNinoImport_filter <- elNinoImport.df %>%
  filter(YR >= 1992 & YR <= 2016)

#bind datasets
MaxKBDIElNino <- bind_cols(monthlyMaxKBDI, elNinoImport_filter)
```

Explore the correlation between monthly max KBDI and the monthly El Nino anomaly
No suggestion of a relationship between a month's max KBDI and the monthly SST anomaly
```{r}
cor(MaxKBDIElNino$maxKBDI,MaxKBDIElNino$ANOM.3) #region 3.4 cor = -.213
cor(MaxKBDIElNino$maxKBDI,MaxKBDIElNino$ANOM.2) #region 4 cor = -.236
cor(MaxKBDIElNino$maxKBDI,MaxKBDIElNino$ANOM.1) #region 3 cor = -.164
cor(MaxKBDIElNino$maxKBDI,MaxKBDIElNino$ANOM) #region 1.2 cor = -.075
```
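
The four region correlations can also be computed in one call by looping over the anomaly columns. This is a sketch on a made-up stand-in for `MaxKBDIElNino`. The mapping of `ANOM`, `ANOM.1`, `ANOM.2`, and `ANOM.3` to regions follows the comments in the chunk above, and the values are invented.

```{r}
# Toy stand-in for MaxKBDIElNino, for illustration only
toyKBDINino <- data.frame(
  maxKBDI = c(420, 150, 510, 300, 260, 480),
  ANOM    = c(0.2, 0.5, -0.3, 0.1, 0.4, -0.2),   # region 1.2
  ANOM.1  = c(0.1, 0.6, -0.4, 0.0, 0.3, -0.1),   # region 3
  ANOM.2  = c(-0.1, 0.7, -0.5, 0.2, 0.2, -0.3),  # region 4
  ANOM.3  = c(0.0, 0.8, -0.6, 0.1, 0.3, -0.4)    # region 3.4
)

# Named vector: region label -> anomaly column
regionCols <- c(region1.2 = "ANOM", region3 = "ANOM.1",
                region4 = "ANOM.2", region3.4 = "ANOM.3")

# One correlation per region, returned as a named numeric vector
regionCors <- sapply(regionCols,
                     function(col) cor(toyKBDINino$maxKBDI, toyKBDINino[[col]]))
regionCors
```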

It is suggested that ENSO peaks in the late winter/early spring months (Jan - Mar).
Filter the data set to these months and explore correlations
```{r}
jfm_KBDIelNino <- MaxKBDIElNino %>%
  filter(month == "Jan" | month == "Feb" | month == "Mar")
```

Correlations suggest a stronger relationship between a month's max KBDI and its monthly SST anomaly during the winter months.
```{r}
cor(jfm_KBDIelNino$maxKBDI, jfm_KBDIelNino$ANOM.3) #region 3.4 cor = -.484
cor(jfm_KBDIelNino$maxKBDI, jfm_KBDIelNino$ANOM.2) #region 4 cor = -0.523
cor(jfm_KBDIelNino$maxKBDI, jfm_KBDIelNino$ANOM.1) #region 3 cor = -.463
cor(jfm_KBDIelNino$maxKBDI, jfm_KBDIelNino$ANOM) #region 1.2 cor = -.382
```


Code copy/pasted from ANF_Fires.rmd for later use
Import El Nino Data
```{r}
SOI.df <- read.csv(file = "https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/data.csv",
                   skip = 1, header = TRUE) %>%
  mutate(Date = parse_date_time(as.character(Date), "ym"),
         Year = year(Date),
         Month = month(Date),
         month = month(Date, label = TRUE, abbr = TRUE),
         SOI = Value) %>%
  dplyr::select(Year, Month, month, SOI)
```


Based on this analysis, there is no indication of a relationship between El Nino and April KBDI values in the Apalachicola National Forest.
Explore the relationship between April KBDI and the North Atlantic Oscillation (NAO)


### Explore Prediction of Lightning
What is the correlation between ENSO and the count of lightning strikes during the fire season?

Manipulate "countlightning.df" to group counts by year and month
```{r}
# Total lightning strikes per year and month
groupedlightning <- countlightning.df %>%
  group_by(Year = year(SEQDAY), Month = month(SEQDAY)) %>%
  summarize(LightningStrikes = sum(LightningCount))

# Keep May through July, through 2012, then total the strikes per year
fireseasonLightning <- groupedlightning %>%
  filter(Month %in% 5:7, Year <= 2012) %>%
  group_by(FireSeason = Year) %>%
  summarize(LightningStrikes = sum(LightningStrikes))
```

Filter "groupedElNino.df" for comparison with the created "fireseasonLightning" data frame
If making a fire prediction in April, the most recent El Nino values available will be the JFM grouping
```{r}
NinoPreseason <- groupedElNino.df %>%
  filter(SEAS == "JFM") %>%
  #filter(SEAS == "FMA") %>%
  #filter(SEAS == "MAM") %>%
  filter(YR >= 1986 & YR <= 2012)
```

correlation between fire season lightning strikes and El Nino Anomaly = -0.110
The correlation does not change much if it is centered on FMA (-0.118) or MAM (-0.114)
```{r}
cor(fireseasonLightning$LightningStrikes, NinoPreseason$ANOM)
```


https://psl.noaa.gov/mjo/mjoindex/ 
Explore this link for mjo data.


### Explore updated fire data set compared to the old data set

The filters have changed in the new data set: fires are no longer defined by cause = lightning, and Natural is introduced as a new classification. Are lightning fires the equivalent of natural fires?

Per the metadata (copied and pasted):
NWCG_CAUSE_CLASSIFICATION = Broad classification of the reason the fire occurred (Human, Natural, Missing data/not specified/undetermined).

NWCG_GENERAL_CAUSE = Event or circumstance that started a fire or set the stage for its occurrence (Arson/incendiarism, Debris and open burning, Equipment and vehicle use, Firearms and explosives use, Fireworks, Misuse of fire by a minor, Natural, Power generation/transmission/distribution, Railroad operations and maintenance, Recreation and ceremony, Smoking, Other causes, Missing data/not specified/undetermined).


Create test simple features for old data
```{r}
if(!"FL_Fires" %in% list.files()){
  download.file("http://myweb.fsu.edu/jelsner/temp/data/FL_Fires.zip",
                "FL_Fires.zip")
  unzip("FL_Fires.zip")
}
FL_FiresTest1.sf <- st_read(dsn = "FL_Fires") %>%
  st_transform(crs = 3086)

#filtered fires within ANF bounds
#anf_fires.sf <- st_join(FL_Fires.sf, anfbounds.sf, join = st_within) %>%
  #filter(NFSLANDU_2 == 'Apalachicola National Forest')

#filtered for county bounds of ANF. This is to match the bounds of the lightning dataset
anf_firesTest1.sf <- st_join(FL_FiresTest1.sf, ANFcountyBounds.sf, join = st_within) %>%
  filter(CNTY_FIPS != "NA") %>%
  select(FOD_I, FIRE_N, FIRE_Y, DISCOVERY_, STAT_CAU_1, FIRE_SIZE, FIRE_SIZE_, LATIT, LONGI, NAME, FIPS, geometry)

anf_LFTest1.sf <- anf_firesTest1.sf %>%
  filter(STAT_CAU_1 == "Lightning") %>%
  #drop geometry so data sets can merge
  st_drop_geometry()
```

Create test simple features for new data
```{r}
updatedFiresTest2.sf <- st_read(dsn = "C:/Users/zlaw9/OneDrive/GITHUB/KDBI code/updatedFires",
                           layer = "FLFiresUpdated") %>%
  st_transform(crs = 3086)

#filtered for county bounds of ANF. This is to match the bounds of the lightning dataset
anf_firesTest2.sf <- st_join(updatedFiresTest2.sf, ANFcountyBounds.sf, join = st_within) %>%
  filter(CNTY_FIPS != "NA") %>%
  select(FOD_ID, FIRE_NAME, FIRE_YEAR, DISCOVERY_, NWCG_CAUSE, NWCG_GENER, FIRE_SIZE, FIRE_SIZE_, LATITUDE, LONGITUDE, COUNTY, FIPS_CODE, geometry)

anf_LFTest2.sf <- anf_firesTest2.sf %>%
  filter(NWCG_GENER == "Natural")
```

Join the two data sets to do a Boolean comparison. The Boolean column is TRUE for every fire that appears in both data sets (fires with no 2015 record are NA). Therefore we can conclude that fires filtered as Natural are the equivalent of fires filtered as Lightning in the 2015 data set.
```{r}
TestFires.sf <- left_join(x = anf_LFTest2.sf, y = anf_LFTest1.sf, by = c("FOD_ID" = "FOD_I")) %>%
  select(FOD_ID, FIRE_NAME, FIRE_N, FIRE_YEAR, FIRE_Y, NWCG_GENER, STAT_CAU_1) %>%
  rename(NAME2018 = FIRE_NAME, NAME2015 = FIRE_N, YEAR2018 = FIRE_YEAR, YEAR2015 = FIRE_Y, CAUSE2018 = NWCG_GENER, CAUSE2015 = STAT_CAU_1)

#create Boolean argument to test if Lightning fires are equivalent to Natural fires
TestFires.sf$BOOL <- TestFires.sf$CAUSE2018 == "Natural" & TestFires.sf$CAUSE2015 == "Lightning"
```

Count number of new data points added in the 2018 set. There are 30 new data points.
```{r}
sum(is.na(TestFires.sf$YEAR2015))
```
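
Because `left_join()` fills unmatched rows with `NA`, and `TRUE & NA` evaluates to `NA`, the comparison column has to be summarised with missing values in mind. The sketch below uses made-up stand-ins for the two fire data sets. It tabulates the column including `NA` and lists the fires that appear only in the newer set.

```{r}
library(dplyr)

# Toy stand-ins, for illustration only: the 2018-style set has one fire
# (FOD_ID 4) with no match in the 2015-style set
toyNew <- data.frame(FOD_ID = 1:4, CAUSE2018 = "Natural")
toyOld <- data.frame(FOD_I = 1:3, CAUSE2015 = "Lightning")

toyCompare <- left_join(toyNew, toyOld, by = c("FOD_ID" = "FOD_I")) %>%
  mutate(BOOL = CAUSE2018 == "Natural" & CAUSE2015 == "Lightning")

# TRUE for matched fires, NA where the older set has no record
table(toyCompare$BOOL, useNA = "ifany")

# Fires present only in the newer set
toyOnlyNew <- anti_join(toyNew, toyOld, by = c("FOD_ID" = "FOD_I"))
toyOnlyNew
```

Since the older set is filtered to Lightning before the join, a Natural fire whose 2015 cause was something else would also show up as `NA` and not as `FALSE`. Joining against the unfiltered 2015 data would be one way to tell those cases apart from fires that are new in 2018.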

### Code to run anf_fires
When referencing the anf_fires file, the data sources are in different drives. Run these code chunks to find the data.
```{r}
# Download only if the shapefile read below is not already present
if(!file.exists("S_USA.NFSLandUnit.shp")){
  download.file("https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.NFSLandUnit.zip",
                "S_USA.NFSLandUnit.zip")
  unzip("S_USA.NFSLandUnit.zip")
}

ANF_Boundary.sf <- st_read(dsn = "S_USA.NFSLandUnit.shp") %>%
  filter(NFSLANDU_2 == "Apalachicola National Forest")
```

```{r}
Fires.sf <- st_read(dsn = "Data/updatedFireData/Data/FPA_FOD_20210617.gpkg", layer = "Fires") %>%
  filter(STATE == "FL") %>%
  st_transform(crs = st_crs(ANF_Boundary.sf)) %>%
  st_intersection(ANF_Boundary.sf)
```