Summary
- The untools package provides functions to easily acquire and visualize UN refugee and asylum seeker time series data.
- You can communicate directly with the UNHCR API for up-to-date data.
- untools also provides convenience functions for fixing country name typos, removing stateless entries, and introducing common country codes.
Introduction
Note: This vignette was updated in August 2020 to reflect changes to the UNHCR Data API, the available data, and the subsequent streamlining of the untools package.
The Office of the United Nations High Commissioner for Refugees (UNHCR) provides several data sets describing annual movements of populations of concern. These include asylum seekers, asylum application data, asylum application decisions, refugees, internally displaced persons (UNHCR and IDMC tracked), returned refugees, returned internally displaced persons, stateless persons, Palestinian refugees, Venezuelans displaced abroad, and other populations of concern. The UNHCR Refugee Data Finder web portal serves as a central hub for several data sets summarizing these populations by year, month, gender, age, origin, and destination. The web portal is a fine exploratory tool, but it can be cumbersome for research purposes. This vignette summarizes and explores the most commonly used UNHCR population dataset, which tracks refugees and asylum seekers. The dataset consists of annual dyadic flows for all populations of concern between countries of origin (citizenship) and countries of destination (asylum/residency). Although the earliest year on record is 1951, exercise caution when performing analysis and causal inference for years prior to 1990.
Getting Started
Installing untools
The Populations of Concern dataset can be acquired directly using the getUNref() function from the untools package. You can install the current release of untools from GitLab with the devtools package.
devtools::install_gitlab("/dante-sttr/untools", dependencies = TRUE)
Acquiring the Data
Load both untools and data.table for some light data wrangling. Then use getUNref() to download the most recent data from the UNHCR API.
library('data.table')
library('untools')
unhcr.ts <- untools::getUNref()
In addition to the getUNref() function we demonstrate in this vignette, the dataset is available for download directly from the UNHCR Refugee Data Finder web portal. The Time Series dataset is one of the cleanest data sets provided by the UNHCR; however, it might be challenging for a beginning programmer. The untools package is designed to provide simplified tools for data acquisition, processing, and visualization for popular United Nations data sets.
year | coo_id | coo_name | coo | coo_iso | coa_id | coa_name | coa | coa_iso | refugees | asylum_seekers | returned_refugees | idps | returned_idps | stateless | ooc | vda |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1951 | 262 | Unknown | UKN | NULL | 11 | Australia | AUL | AUS | 180000 | 0 | 0 | 0 | 0 | 0 | 0 | NA |
1951 | 262 | Unknown | UKN | NULL | 12 | Austria | AUS | AUT | 282000 | 0 | 0 | 0 | 0 | 0 | 0 | NA |
1951 | 262 | Unknown | UKN | NULL | 17 | Belgium | BEL | BEL | 55000 | 0 | 0 | 0 | 0 | 0 | 0 | NA |
1951 | 262 | Unknown | UKN | NULL | 33 | Canada | CAN | CAN | 168511 | 0 | 0 | 0 | 0 | 0 | 0 | NA |
1951 | 262 | Unknown | UKN | NULL | 50 | Denmark | DEN | DNK | 2000 | 0 | 0 | 0 | 0 | 0 | 0 | NA |
This is a fairly simple dataset, consisting of the year of observation (year), country of origin (coo, coo_name), destination/asylum country (coa, coa_name), and fields for the different population types (refugees, asylum_seekers, returned_refugees, idps, returned_idps, stateless, ooc, and vda). Previous iterations of this dataset included numerous special characters, unusual formatting, and other issues not conducive to programmatic research and analysis; however, the July 2020 revision of the UNHCR datasets has largely removed these concerns. Please refer to the getUNref help file for more information detailing the different populations of concern and additional fields not addressed in this vignette.
names(unhcr.ts)
#> [1] "year" "coo_id" "coo_name" "coo" "coo_iso"
#> [6] "coa_id" "coa_name" "coa" "coa_iso" "refugees"
#> [11] "asylum_seekers" "returned_refugees" "idps" "returned_idps" "stateless"
#> [16] "ooc" "vda"
untools Functions for UNHCR Data
The prepUNref Function
The untools package provides several functions for processing and visualizing UNHCR data. The prepUNref() function helps process raw UNHCR time series data by converting it to wide or long form, selecting years of specific interest, selecting populations of interest, and summing across groups. Using prepUNref() with no additional parameters will subset the data to only refugees and asylum_seekers, and convert the data from wide to long form, which is more conducive to visualization and analysis.
unhcr.ts.dante <- prepUNref(unhcr.ts)
year | coo_id | coo_name | coo | coo_iso | coa_id | coa_name | coa | coa_iso | type | persons |
---|---|---|---|---|---|---|---|---|---|---|
1951 | 262 | Unknown | UKN | NULL | 11 | Australia | AUL | AUS | refugees | 180000 |
1951 | 262 | Unknown | UKN | NULL | 12 | Austria | AUS | AUT | refugees | 282000 |
1951 | 262 | Unknown | UKN | NULL | 17 | Belgium | BEL | BEL | refugees | 55000 |
1951 | 262 | Unknown | UKN | NULL | 33 | Canada | CAN | CAN | refugees | 168511 |
1951 | 262 | Unknown | UKN | NULL | 50 | Denmark | DEN | DNK | refugees | 2000 |
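The wide-to-long conversion described above can be sketched with data.table::melt(). This is only an illustration of the reshaping idea, not the internals of prepUNref(), which are not shown in this vignette. The two rows are copied from the raw table earlier in this section, with most columns omitted for brevity.

```r
library(data.table)

# Two rows taken from the raw table shown earlier (subset of columns only).
raw <- data.table(
  year           = c(1951L, 1951L),
  coo            = c("UKN", "UKN"),
  coa            = c("AUL", "AUS"),
  refugees       = c(180000, 282000),
  asylum_seekers = c(0, 0)
)

# Stack the population columns into a type/persons pair (long form).
long <- melt(
  raw,
  id.vars       = c("year", "coo", "coa"),
  measure.vars  = c("refugees", "asylum_seekers"),
  variable.name = "type",
  value.name    = "persons"
)

print(long)
```

Each population column becomes a set of rows labelled by type, which is the shape most plotting and modeling functions expect.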
There are several records with Unknown as the country of origin. While these are not trivial, for this exploration we will focus on known dyadic flows between countries.
unhcr.ts.dante <- unhcr.ts.dante[!(coo == 'UKN')]
year | coo_id | coo_name | coo | coo_iso | coa_id | coa_name | coa | coa_iso | type | persons |
---|---|---|---|---|---|---|---|---|---|---|
1960 | 6 | Angola | ANG | AGO | 41 | Dem. Rep. of the Congo | COD | COD | refugees | 150000 |
1961 | 161 | Rwanda | RWA | RWA | 16 | Burundi | BDI | BDI | refugees | 30000 |
1961 | 6 | Angola | ANG | AGO | 41 | Dem. Rep. of the Congo | COD | COD | refugees | 150000 |
1961 | 161 | Rwanda | RWA | RWA | 41 | Dem. Rep. of the Congo | COD | COD | refugees | 53000 |
1961 | 161 | Rwanda | RWA | RWA | 186 | United Rep. of Tanzania | TAN | TZA | refugees | 12000 |
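Before discarding the Unknown origins it can be useful to check how many records are affected. The sketch below uses two rows copied from the tables above, and the names recs, known, and dropped are illustrative only.

```r
library(data.table)

# One Unknown-origin row and one known dyad, both taken from the tables above.
recs <- data.table(
  year    = c(1951L, 1960L),
  coo     = c("UKN", "ANG"),
  coa     = c("AUL", "COD"),
  persons = c(180000, 150000)
)

# Keep only records with a known country of origin.
known <- recs[coo != "UKN"]

# Count how many records the filter removed.
dropped <- nrow(recs) - nrow(known)
print(dropped)
```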
By default, prepUNref() selects all years and all affected populations, but the user can specify populations and years of interest by using the groups and range options. For example, specifying groups = c('refugees') and range = c(2000,2017) will only process refugees between 2000 and 2017.
unhcr.ts.dante <- prepUNref(unhcr.ts, groups = c('refugees'), range = c(2000, 2017))
Lastly, prepUNref() provides 2 additional logical switches: wide and sum_groups. By default, prepUNref() returns long data frames. This is most convenient for plotting and modeling; however, it is sometimes interesting to explore data in wide form, especially for time series data sets. The sum_groups option aggregates the totals across all specified groups. Let’s use these 2 switches to look at the sum of Syrian refugee and asylum-seeker outflows to Germany between 2014 and 2017 using wide = TRUE.
unhcr.ts.dante <- prepUNref(unhcr.ts, groups = c('refugees', 'asylum_seekers'), range = c(2014, 2017), sum_groups = TRUE, wide = TRUE)
coo_id | coo_name | coo | coo_iso | coa_id | coa_name | coa | coa_iso | 2014 | 2015 | 2016 | 2017 |
---|---|---|---|---|---|---|---|---|---|---|---|
185 | Syrian Arab Rep. | SYR | SYR | 72 | Germany | GFR | DEU | 70585 | 197186 | 475649 | 567507 |
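The table above shows a single dyad, while the call itself does not filter by country. One way to isolate a dyad and reproduce the "sum across groups, one column per year" layout is sketched below with data.table::dcast(). This is an assumption about how the same result could be reached by hand, not a description of prepUNref(). The counts are invented placeholders, not UNHCR figures.

```r
library(data.table)

# Placeholder long-form records; the person counts are invented for illustration.
long <- data.table(
  year    = rep(2014:2015, each = 2),
  coo_iso = "SYR",
  coa_iso = "DEU",
  type    = rep(c("refugees", "asylum_seekers"), times = 2),
  persons = c(10, 5, 20, 15)
)

# Keep a single origin-destination pair.
dyad <- long[coo_iso == "SYR" & coa_iso == "DEU"]

# Spread years into columns, summing across population types.
wide <- dcast(
  dyad,
  coo_iso + coa_iso ~ year,
  value.var     = "persons",
  fun.aggregate = sum
)

print(wide)
```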
Static Grouped Flows
With more than 100,000 unique country-country-year records, outflows, inflows, and varying populations of interest, visualizing the UNHCR data can be overwhelming. The untools package provides multiple default plotting functions for objects produced by the prepUNref() function. An easy launching point for investigating flows between countries is a static barplot of dyadic flows in or out of a target country. Using plot() on an object produced by prepUNref() with sum_groups = TRUE will produce a barplot for the target country and the top 8 destination or origin countries. The user specifies the country of interest, a year of interest, and whether they want to view inflows (mode = 'in') or outflows (mode = 'out'). Let’s start by viewing asylum seeking inflows to the United States in 2013.
unhcr.ts.dante <- untools::prepUNref(unhcr.ts, groups = c('asylum_seekers', 'refugees'), sum_groups = TRUE, range = c(2012, 2017))
usa.in <- plot(unhcr.ts.dante, country = 'USA', mode = 'in', yr = c(2013, 2013))
Somewhat surprisingly, China tops the list, while Central America rounds out the rest of the top 5. By default, plot() will list up to 8 countries and will use the maximum year in the prepared dataset if no other year(s) are specified. We can view asylum seeking outflows from the Philippines in 2019 with a simple call.
unhcr.ts.dante <- untools::prepUNref(unhcr.ts, groups = c('asylum_seekers'), range = c(2012, 2019))
phl.out <- plot(unhcr.ts.dante, country = 'PHL', mode = 'out')
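For readers who want the numbers behind such a barplot, the ranking step can be approximated by hand. The helper top_origins() below is hypothetical and is not part of untools. The country codes and counts are invented, and the sketch only covers inflows for a single year.

```r
library(data.table)

# Invented summed flows into a hypothetical destination "AAA".
flows <- data.table(
  year    = 2013L,
  coo_iso = c("BBB", "CCC", "DDD", "EEE"),
  coa_iso = "AAA",
  persons = c(40, 100, 10, 70)
)

# Hypothetical helper: rank origins by persons for one destination and year,
# then keep the n largest.
top_origins <- function(dt, country, yr, n = 8) {
  sel <- dt[coa_iso == country & year == yr]
  head(sel[order(-persons)], n)
}

top3 <- top_origins(flows, country = "AAA", yr = 2013L, n = 3)
print(top3)
```

Swapping coa_iso for coo_iso in the filter would give the corresponding outflow ranking.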
Stacked Static Flows by Population Type
Up until this point we’ve visualized cumulative migrant flows across all groups or a singular group, but it may be of interest to examine relative proportions of asylum seekers, refugees, and stateless persons. The untools package provides default plotting functions to visualize stacked bar charts of migrant inflows or outflows by groups. Let’s re-examine inflows of migrants to the USA in 2017, but this time include breakdowns by type. To maintain affected population breakdowns, specify sum_groups = FALSE.
unhcr.stacked <- untools::prepUNref(unhcr.ts, groups = c('asylum_seekers', 'refugees', 'stateless'), range = c(2000, 2017), sum_groups = FALSE)
usa.stacked.in <- plot(unhcr.stacked, country = 'USA', mode = 'in')
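The relative proportions that a stacked chart conveys can also be computed directly from long-form data. The following is a minimal sketch with invented country codes and counts. It assumes a long table with type and persons columns like the one prepUNref() returns when sum_groups = FALSE.

```r
library(data.table)

# Invented long-form inflows broken down by population type.
long <- data.table(
  coo_iso = c("BBB", "BBB", "CCC", "CCC"),
  type    = c("refugees", "asylum_seekers", "refugees", "asylum_seekers"),
  persons = c(30, 10, 20, 60)
)

# Share of each population type within its origin country.
long[, share := persons / sum(persons), by = coo_iso]

print(long)
```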
Plotting Time Series
Although static plots of migrant flows are interesting, it’s often more illuminating to examine time series data for migrant inflows and outflows. The untools package also provides default plotting functions to visualize time series migrant flows for data frames produced with prepUNref() using sum_groups = TRUE. The default plotting function will produce a plot for all years present in the raw data using the 5 countries with the highest totals in the maximum year of the dataset. Let’s view annual cumulative refugee and asylum seeking inflows to the USA from 2000 to 2017.
unhcr.ts.dante <- prepUNref(unhcr.ts, groups = c('asylum_seekers', 'refugees'), range = c(2000, 2017), sum_groups = TRUE)
usa.ts.in <- plot(unhcr.ts.dante, country = 'USA', mode = 'in')
Lastly, similar to the static default plotting functions, we can specify mode = 'out' to view outflows from a given country.
phl.ts.out <- plot(unhcr.ts.dante, country = 'PHL', mode = 'out')
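The country selection rule described in this section (highest totals in the latest year, then all years for those countries) can be sketched by hand as follows. The data are invented, n is set to 2 rather than 5 to keep the example small, and the code is an approximation of the rule as stated here rather than the package's own implementation.

```r
library(data.table)

# Invented summed inflows for three origins over two years.
flows <- data.table(
  year    = rep(2016:2017, times = 3),
  coo_iso = rep(c("BBB", "CCC", "DDD"), each = 2),
  persons = c(5, 50, 90, 20, 10, 30)
)

# Pick the n origins with the highest totals in the latest year...
n <- 2
latest <- flows[year == max(year)][order(-persons)]
keep <- head(latest$coo_iso, n)

# ...then keep every year for those origins, ordered for plotting.
series <- flows[coo_iso %in% keep][order(coo_iso, year)]

print(series)
```

Note that CCC has the largest value in 2016 but is excluded, because the ranking uses only the latest year.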