Dan BastonISciences, LLC 
Joshua BrinksISciences, LLC 

Summary

  • Zonal extraxtion processing is most commonly thought of using continuous raster data.
  • Several insightful analysis can include categorical raster data, or using continous and categorical raster layers together.
  • This vignette builds upon the exacextractr coastal populations tutorial by incoporating land cover analysis.

Introduction

This vignette is the second in a two-part series demonstrating the use of the extactextractr package. In the first vignette, we demonstrated several considerations for performing zonal extractions on meandering coastal regions such as São Miguel. In this vignette, we illustrate the the use of exactextractr with categorical land cover data.

We will use the following packages for this vignette.

library(exactextractr)
library(dplyr)
library(sf)
library(raster)

Loading the sample data

First, we load the CORINE land cover data and the concelho boundaries used in the first vignette. A sample of the CORINE 2018 raster dataset is included in exactextractr.

clc <- raster::raster(system.file('sao_miguel/clc2018_v2020_20u1.tif',
package = 'exactextractr'))
concelhos <- sf::st_read(system.file('sao_miguel/concelhos.gpkg',
package = 'exactextractr'),
quiet = TRUE)

The land cover class descriptions are provided in a separate DBF file. We read this in to a data frame, then use levels() to associate the class descriptions with the raster.

clc_classes <- foreign::read.dbf(system.file('sao_miguel/clc2018_v2020_20u1.tif.vat.dbf',
package = 'exactextractr'),
as.is = TRUE) %>%
dplyr::select(value = Value,
landcov = LABEL3)

levels(clc) <- list(data.frame(ID = clc_classes$value,
landcov = clc_classes$landcov))

This association provides us with a way to look up the description for a given ID. Alternatively, we can relate the values using merge or one of dplyr’s joins.

raster::factorValues(clc, c(2, 18, 24))
landcov
Discontinuous urban fabric
Pastures
Coniferous forest

Summarizing Land Cover Classifications

One of the first questions we might have is which land cover classification is predominant in each concelho. We can do this with exactextractr’s mode summary operation. The minority and variety operations are also applicable to categorical data and provide the least-common classification and number of distinct classifications, respectively.

landcov_mode <- exactextractr::exact_extract(clc, concelhos, 'mode', 
append_cols = 'name', progress = FALSE) %>%
dplyr::inner_join(clc_classes, by=c(mode = 'value'))
name landcov
Lagoa Land principally occupied by agriculture, with significant areas of natural vegetation
Nordeste Coniferous forest
Ponta Delgada Pastures
Povoação Broad-leaved forest
Ribeira Grande Land principally occupied by agriculture, with significant areas of natural vegetation
Vila Franca do Campo Land principally occupied by agriculture, with significant areas of natural vegetation

Summary Functions

While mode provides an easy way to see the most common land cover category, we need to write a custom summary function to see the frequency of different land cover types in an area.

Summary functions are called once per feature from the input sf object. They can return either:

  • A scalar, in which case the return value of exact_extract will be a vector whose entries correspond with the rows of the input sf object, or

  • A data frame, in which case exact_extract will return a row-wise combination of the data frames for each feature. If the data frame returned by the summary function will have than a single row, it is useful for some identifying information to be included in the returned data frame.

If we are going to perform typical data frame operations on the raster values and coverage fractions, it can be more convenient for the summary function to accept a single data frame argument instead of separate arguments for the cell values and coverage fractions. This behavior can be enabled with the summarize_df argument.

Using this method, we can calculate the fraction of each concelho that is covered by each land cover category:

landcov_fracs <- exactextractr::exact_extract(clc, concelhos, function(df) {
df %>%
dplyr::mutate(frac_total = coverage_fraction / sum(coverage_fraction)) %>%
dplyr::group_by(name, value) %>%
dplyr::summarize(freq = sum(frac_total))
}, summarize_df = TRUE, include_cols = 'name', progress = FALSE)

Here we use the include_cols argument to include the name column from concelhos in the data frame passed to the summary function. Although the value of name will be the same for all rows in df, we include name in the group_by expression so that it is not removed by summarize. Other available arguments include include_xy to get the cell center coordinates, include_area to get the cell area, and include_cell to get the cell index used by the raster package.

This creates a data frame with the frequency of land cover type in each concelho:

head(landcov_fracs)
name value freq
Lagoa 2 0.0629700
Lagoa 3 0.0135845
Lagoa 7 0.0065947
Lagoa 9 0.0063711
Lagoa 12 0.1697042
Lagoa 18 0.1543908

You can select the top 3 cover types in each region, and join the data frame with clc_classes to create a descriptive summary table for each concelho using kableExtra:

fracs_summary <-
landcov_fracs %>%
dplyr::inner_join(clc_classes, by = 'value') %>%
dplyr::group_by(name) %>%
dplyr::arrange(dplyr::desc(freq)) %>%
dplyr::slice_head(n = 3) %>%
dplyr::mutate(freq = sprintf('%0.1f%%', 100 * freq))

kableExtra::kbl(fracs_summary[, -1],
col.names = c("Cover", "Frequency", "Cover Description"),
align = c('c',"c", "l"),
full_width = F,
position = "center") %>%
kableExtra::pack_rows(index = table(fracs_summary$name))
Cover Frequency Cover Description
Lagoa
21 26.8% Land principally occupied by agriculture, with significant areas of natural vegetation
12 17.0% Non-irrigated arable land
20 15.6% Complex cultivation patterns
Nordeste
24 24.3% Coniferous forest
18 22.5% Pastures
21 14.2% Land principally occupied by agriculture, with significant areas of natural vegetation
Ponta Delgada
18 29.9% Pastures
21 26.5% Land principally occupied by agriculture, with significant areas of natural vegetation
12 11.3% Non-irrigated arable land
Povoação
23 24.2% Broad-leaved forest
18 20.1% Pastures
21 16.7% Land principally occupied by agriculture, with significant areas of natural vegetation
Ribeira Grande
21 25.1% Land principally occupied by agriculture, with significant areas of natural vegetation
18 22.7% Pastures
12 16.4% Non-irrigated arable land
Vila Franca do Campo
21 36.0% Land principally occupied by agriculture, with significant areas of natural vegetation
18 15.5% Pastures
27 15.0% Moors and heathland

Similarly, we can find the top land covers by area:

landcov_areas <- exactextractr::exact_extract(clc, concelhos, function(df) {
df %>%
dplyr::group_by(name, value) %>%
dplyr::summarize(area_km2 = round(sum(coverage_area) / 1e6))
}, summarize_df = TRUE, coverage_area = TRUE, include_cols = 'name', progress = FALSE)
Area (km2) Cover Description
Lagoa
12 Land principally occupied by agriculture, with significant areas of natural vegetation
8 Non-irrigated arable land
7 Pastures
Nordeste
25 Coniferous forest
23 Pastures
14 Land principally occupied by agriculture, with significant areas of natural vegetation
Ponta Delgada
70 Pastures
62 Land principally occupied by agriculture, with significant areas of natural vegetation
26 Non-irrigated arable land
Povoação
26 Broad-leaved forest
21 Pastures
18 Land principally occupied by agriculture, with significant areas of natural vegetation
Ribeira Grande
45 Land principally occupied by agriculture, with significant areas of natural vegetation
41 Pastures
30 Non-irrigated arable land
Vila Franca do Campo
28 Land principally occupied by agriculture, with significant areas of natural vegetation
12 Pastures
12 Moors and heathland

Summarizing Population Land Cover

Taking this a step further, one may be interested in the correlation between population and land cover class in a given concelho; i.e. is the population primarily urban or rural? As described in the previous vignette, using the population density raster is more accurate when pixels are partially covered by the boundary layer.

pop_density <- raster::raster(system.file('sao_miguel/gpw_v411_2020_density_2020.tif',
package = 'exactextractr'))

This analysis is only possible because the sample CORINE dataset packaged with exactextractr has been reprojected from its native Lambert Equal Area projection into geographic coordinates used for the Gridded Population of the World; the projections must match. If you are working with raster data in different projections, you must transform them to a common projection using a tool like raster::projectRaster, the gdalwarp command-line utility, or a projection tool bundled with a standalone GIS product like QGIS.

We can use a similar call to exact_extract to incorporate the population data into our analysis. We set weights = pop_density and, since we are using the summarize_df option, a column called weight will be added to the data frame passed to the summary function. We also set coverage_area = TRUE so that we can multiply the density by the covered pixel area to get a population count.

landcov_pop_areas <- exactextractr::exact_extract(clc, concelhos, function(df) {
df %>%
dplyr::group_by(name, value) %>%
dplyr::summarize(pop = sum(coverage_area * weight / 1e6)) %>%
dplyr::mutate(pop_frac = pop / sum(pop))
}, weights = pop_density, coverage_area = TRUE,
summarize_df = TRUE, include_cols = 'name')

Looking at the highest-population land cover type in each concelho, we can can see that the western/central concelhos of Lagoa, Ponta Delgada, Ribeira Grande, and Vila Franca do Campo have a more urban population than Nordeste or Povoação to the east.

Concelho Cover Description Population Fraction
Lagoa Discontinuous urban fabric 6548 0.415
Nordeste Complex cultivation patterns 1280 0.282
Ponta Delgada Discontinuous urban fabric 27224 0.381
Povoação Complex cultivation patterns 1568 0.261
Ribeira Grande Discontinuous urban fabric 13497 0.374
Vila Franca do Campo Discontinuous urban fabric 4442 0.377

Add new comment

Plain text

  • Allowed HTML tags: <a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd>