Joshua Brinks, ISciences, LLC
Photo by Susn Dybvik on Pexels

Summary

  • Dyadic data is commonplace in political science and geography, but there are limited options for visualization.
  • Although chord diagrams are a great option for visualization, they can be complicated to create.
  • We walk through creating a chord diagram with R’s circlize package.

Introduction

Dyadic data can be found in every corner of the environment-security research sector. This is no surprise, because in the eco-security sector we are often simply analyzing interactions between countries or other political units. Whether you’re working with the IMF’s Direction of Trade,[1] UCDP Conflict,[2] Militarized Interstate Disputes,[3] ICEWS[4] and GDELT[5] event data, UN Migrant Stocks,[6] UNHCR Populations of Concern, U.S. Agricultural Trade,[7] or countless other datasets spanning the geographical, political, and environmental sciences, you are probably looking at dyadic data.

Despite the prevalence of dyadic data, there are limited options for compelling visualizations. Tables are a natural fit, but they’re not fun, and your eyes will start to glaze over as the number of dyadic pairs increases. Barplots and line graphs can highlight specific pairs or subsets of dyadic pairs, but visualizing a multitude of interactions calls for something more. Two of the most popular options are Sankey and chord diagrams.

Sankey diagrams visualize flows between entities and can depict flows through time. These can be created in R using the networkD3 and plotly packages. Chord diagrams use a circular structure to visualize dyadic flows; however, they do not handle time series data unless you animate the plot. The most popular package for creating chord diagrams in R is circlize.[8]

Sankey diagram depicting dispersal of funds from the CARES Act.

circlize is a well-developed package originally designed for bioinformatics. Chord diagrams are only a small part of the package’s overall capabilities, and there is an accompanying bookdown reference manual detailing all of its potential. Although there is plenty of documentation for circlize, it can be cumbersome to create figures exactly as you imagine them. This is because circlize relies on base R graphics (screams)! If, like me, your visualization workflows have been engulfed by the ggplot2 universe over the past decade, it can be really tedious jumping back into base R plotting. Chapters 14 and 15 of the circlize manual review the chord diagram functionality, but I found that it still took a bit of internet searching and trial and error to get what I initially envisioned.

My intent with this guide is to walk through each step of creating a chord diagram using circlize.

Chord diagram made by circlize. Courtesy of www.r-graph-gallery.com

The Data

For this example I chose the United Nations High Commissioner for Refugees (UNHCR) Populations of Concern dataset. I wanted to focus on visualizing, not processing, the data, and in comparison to some other dyadic datasets, the UNHCR refugee and asylum data does not require as much pre-processing.

Acquiring the Data

You can download the data from the UNHCR Refugee Data Finder or by using DANTE’s untools package. For this example, I’m going to access the data programmatically with untools and perform some light processing using data.table.

You can install the released version of untools from GitLab with:

devtools::install_gitlab("dante-sttr/untools", dependencies=TRUE)

You can download and load the refugee data into your global environment with the following call.

unref<-untools::getUNref()

Subsetting

This is a large dataset with several fields. UNHCR does not provide a lot of documentation, but you can review the variable descriptions in the untools::getUNref() help documentation. For simplicity, we will only visualize refugees, so we can start by subsetting the data to the refugee totals, the year of observation, and some dyad identifiers. The sender country is coo_name (country of origin) and the receiving country is coa_name (country of asylum). The _iso suffixes refer to the respective ISO3 character codes; these help us join the data to any additional information we might want to include from external datasets.

library(data.table)
keeps<-c("coo_name", "coo_iso", "coa_name", "coa_iso", "year", "refugees")
unref<-unref[ ,..keeps]

The time series also goes back to 1950. We’ll make it more manageable (and trustworthy) by restricting it to 2000-2019 (2019 being the last year of available data at the time of writing).

unref<-unref[year %in% seq(2000, 2019)]

This still leaves more than 84,000 records, but more importantly it contains 7,761 unique sender and receiver country combinations, which is far too many for a legible figure. We can identify the most active countries by first removing records where the country of origin and the country of asylum are the same (coo_iso == coa_iso) and then creating a summary table to extract the top country codes.

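# drop records where the origin and asylum country are the same
unref<-unref[coo_iso!=coa_iso]
# total refugees sent by each origin and received by each asylum country
coo.summary<-unref[, .(sent=sum(refugees, na.rm=TRUE)), by=coo_iso]
coa.summary<-unref[, .(received=sum(refugees, na.rm=TRUE)), by=coa_iso]
# inner merge; the shared key column in the result is named coa_iso
unref.summary<-merge(coa.summary, coo.summary,
 by.x="coa_iso", by.y ="coo_iso")
unref.summary[, total:=sent+received]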
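To make the summary logic concrete, here is a minimal base R sketch of the same steps on a tiny, made-up edge list (the codes AAA, BBB, and CCC and all counts are invented for illustration). It drops same-country records, totals outflows and inflows per country with aggregate(), and merges the two tables. Like the data.table version, the default inner merge keeps only countries that appear as both a sender and a receiver.

```r
# Toy dyadic records: hypothetical codes and counts, for illustration only
toy <- data.frame(
  coo_iso  = c("AAA", "AAA", "BBB", "CCC", "CCC"),
  coa_iso  = c("BBB", "CCC", "AAA", "CCC", "AAA"),
  refugees = c(100, 50, 20, 999, 5),
  stringsAsFactors = FALSE
)

# Drop records where the origin and asylum country are the same
toy <- toy[toy$coo_iso != toy$coa_iso, ]

# Total refugees sent by each origin and received by each asylum country
sent     <- aggregate(refugees ~ coo_iso, data = toy, FUN = sum)
received <- aggregate(refugees ~ coa_iso, data = toy, FUN = sum)
names(sent)     <- c("iso", "sent")
names(received) <- c("iso", "received")

# Inner merge (the default), then the combined total per country
toy.summary <- merge(received, sent, by = "iso")
toy.summary$total <- toy.summary$sent + toy.summary$received
toy.summary
```

Here AAA sends 150 and receives 25, and the CCC-to-CCC record (999) never enters the totals.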

With most summaries I would extract the top 20-30 countries by cumulative flows, but massive outflows from a handful of countries make the refugee counts heavily skewed. To get a better mix of senders and receivers, we’ll generate a unique list of countries comprising the top 15 sending and the top 15 receiving countries.

top.senders<-unref.summary[order(-sent), coa_iso][1:15]
top.receivers<-unref.summary[order(-received), coa_iso][1:15]
top.countries<-unique(c(top.senders,top.receivers))
unref<-unref[coo_iso %in% top.countries & coa_iso %in% top.countries]
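The top-N step can be checked the same way. This base R sketch (hypothetical codes and counts, with n = 2 instead of 15 to keep it small) takes the union of the top senders and top receivers, then keeps only dyads whose origin and destination are both in that set. head() is used instead of [1:n] so that a short table returns fewer codes rather than NAs.

```r
# Hypothetical per-country totals
summ <- data.frame(
  iso      = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  sent     = c(500, 10, 300, 5, 1),
  received = c(2, 400, 3, 250, 1),
  stringsAsFactors = FALSE
)
n <- 2  # the tutorial uses 15

top.senders   <- head(summ$iso[order(-summ$sent)], n)
top.receivers <- head(summ$iso[order(-summ$received)], n)
top.countries <- unique(c(top.senders, top.receivers))

# Hypothetical edge list
edges <- data.frame(
  from  = c("AAA", "AAA", "CCC", "EEE"),
  to    = c("BBB", "EEE", "DDD", "AAA"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
# Keep a dyad only when BOTH ends are among the top countries
edges <- edges[edges$from %in% top.countries & edges$to %in% top.countries, ]
edges
```

AAA and CCC are the top senders and BBB and DDD the top receivers, so the two dyads touching EEE are dropped.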

Now we remove observations of “Unknown” origin and erroneous observations with 0 refugee counts, then sum the annual totals for each unique dyad.

unref<-unref[refugees>0 & trimws(coo_name)!="Unknown"] # the raw label has a trailing whitespace
unref<-unref[, .(refugees=sum(refugees)), by=.(coo_iso,coa_iso)]
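The cleaning and collapsing steps look like this in base R on a few hypothetical multi-year records. trimws() makes the "Unknown" filter robust to stray whitespace, so the comparison does not depend on the exact trailing space in the raw label.

```r
# Hypothetical multi-year records with the quirks described above
recs <- data.frame(
  coo_name = c("Unknown ", "Alpha", "Alpha", "Alpha", "Beta"),
  coo_iso  = c("UNK", "AAA", "AAA", "AAA", "BBB"),
  coa_iso  = c("BBB", "BBB", "BBB", "BBB", "AAA"),
  year     = c(2018, 2018, 2019, 2019, 2019),
  refugees = c(7, 10, 15, 0, 4),
  stringsAsFactors = FALSE
)

# Drop zero counts and "Unknown" origins, ignoring surrounding whitespace
recs <- recs[recs$refugees > 0 & trimws(recs$coo_name) != "Unknown", ]

# Collapse the years: one row per directed dyad
dyads <- aggregate(refugees ~ coo_iso + coa_iso, data = recs, FUN = sum)
dyads
```

The two non-zero AAA-to-BBB years sum to 25, and the "Unknown " record disappears even though its label carries a trailing space.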

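circlize reads a data.frame edge list by position: the first column is the origin, the second the destination, and the third the value. Renaming the columns c("from", "to", "value") isn’t strictly required, but it makes that structure explicit and keeps the rest of the code readable.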

names(unref)<-c("from", "to", "value")

Secondary Grouping

At this point the data is ready for circlize. However, when possible, I like to add an outer ring to the chord diagram with a higher-level grouping variable; in this instance, the country’s global region. You can quickly acquire global regions from sources like raster::ccodes(), countrycode::countrycode(), and WDI::WDI_data. raster and countrycode provide United Nations regional designations, which are a good choice for many applications. However, WDI includes the Middle East and North Africa (MENA) region, an important distinction in environment-security analysis, so we’ll proceed with the WDI global regions.

We’ll pull in the embedded country code data.frame and merge it with the refugee data.

library(WDI)

regions<-WDI::WDI_data[["country"]]
regions<-data.table::as.data.table(regions)

Thankfully, the WDI data already has ISO3 country codes so we don’t have to perform further processing in order to merge with the refugee data. Secondary grouping designations only have to be merged to the “from” (country of origin) to create the desired effect.

unref<-merge(unref,
 regions[, .(iso3c,region)],
 by.x = "from",
 by.y = "iso3c",
 all.x = TRUE)
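Because the region is merged on the “from” column only, it is worth confirming that every sector circlize will draw actually has a region before building the grouping vector. A quick base R check on a hypothetical edge list (how circlize handles a sector with no group label is not something this sketch verifies; it only surfaces the gaps):

```r
# Hypothetical merged edge list: "CCC" only ever appears as a receiver,
# and "DDD" found no match in the region table
edges <- data.frame(
  from   = c("AAA", "BBB", "DDD"),
  to     = c("BBB", "CCC", "AAA"),
  value  = c(5, 3, 2),
  region = c("Region 1", "Region 2", NA),
  stringsAsFactors = FALSE
)

# Sectors that never appear in `from` get no region from a from-only merge
receiver.only <- setdiff(unique(edges$to), unique(edges$from))
# Senders whose ISO3 code had no match in the region table
no.region <- unique(edges$from[is.na(edges$region)])

receiver.only  # "CCC"
no.region      # "DDD"
```

In the tutorial’s subset both checks should come back empty; if not, add the missing countries to the region lookup before continuing.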

Sometimes you need to shorten region names or move a country into another region to make the visualization work best. We’ll check the breakdown of observations by region and by country of origin.

table(unref$region)
## 
##        East Asia & Pacific      Europe & Central Asia  Latin America & Caribbean Middle East & North Africa 
##                         11                         14                          2                         60 
##              North America                 South Asia         Sub-Saharan Africa 
##                          2                         26                        165 
table(unref$from)
## 
## AFG BDI CAF CHN COD COL DEU ERI ETH IRN IRQ JOR KEN LBN MMR PAK SDN SOM SSD SYR TCD TUR TZA UGA USA VNM 
##  15  16   9   5  17   2   1  17  18  13  16   7   7   6   3  11  17  20  15  18  10  13   7  12   2   3 

We might have to merge Latin America & Caribbean and North America into “The Americas”, but we’ll wait and see how it looks first.

Building the Chord Plot

Chord diagrams are built with the circlize::chordDiagram() function. circlize has become more user-friendly with recent updates. Prior versions required the input data to be an adjacency matrix. If you’re a software engineer this may sound great, but most modern researchers prefer the data.frame. Thankfully, circlize now accepts edge lists in a data.frame where the first 3 columns represent from, to, and value (we’ve already set this up). That said, to set up custom palettes, secondary grouping variables, and country highlighting, you need to feed circlize::chordDiagram() named character vectors.

We’ll start small and build up to the desired plot. It’s good practice to begin every circlize plot with circlize::circos.clear(), which resets any circos parameters left over from a previous plot.

library(circlize)
circlize::circos.clear()
circlize::chordDiagram(
 # your edge list
 unref
)

This is quick and informative and gives a better sense of the dataset, but it leaves a lot to be desired for a publication or a stakeholder meeting. We’ll start by setting the order. Each sector (a country, in this example) is plotted in order of appearance in our edge list, starting at “3 o’clock”.
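My understanding, which is worth confirming against the circlize documentation for your version, is that without an explicit order the sectors follow their first appearance in the origin column and then in the destination column. The base R sketch below reproduces that ordering for a made-up edge list; it does not call circlize. If you prefer to be explicit, chordDiagram() also has an order argument that accepts a vector of sector names like this one.

```r
# Hypothetical edge list (values invented)
edges <- data.frame(
  from  = c("SYR", "AFG", "SYR"),
  to    = c("TUR", "PAK", "DEU"),
  value = c(30, 20, 10),
  stringsAsFactors = FALSE
)

# First appearance across the origin column, then the destination column
sector.order <- union(edges$from, edges$to)
sector.order  # "SYR" "AFG" "TUR" "PAK" "DEU"
```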

Region Groupings

For this example we’ll sort by region so the sectors line up with the outer ring of global regions. You could take this a step further by specifying a desired factor order and then sorting, but we’ll stick with a simple alphabetized region order.

unref<-unref[order(region)]

Now we can create a named character vector to specify the global regions. The region info is in our edge list, but circlize::chordDiagram() needs it separately. The easiest way to create named character vectors is with structure().

# unique data.table of countries and regions
group.labs<-unique(unref[,.(from,region)])
# collapse it into a named vector
group.labs<-structure(as.character(group.labs$region),
 names = as.character(group.labs$from))
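structure() is one option; stats::setNames() builds an identical named character vector, and some readers may find it more familiar. A quick base R check with hypothetical country/region pairs:

```r
# Hypothetical country/region pairs
pairs <- data.frame(
  from   = c("CHN", "DEU", "SYR"),
  region = c("East Asia & Pacific", "Europe & Central Asia",
             "Middle East & North Africa"),
  stringsAsFactors = FALSE
)

# structure() approach used above
v1 <- structure(as.character(pairs$region), names = as.character(pairs$from))
# setNames() approach: same values, same names
v2 <- setNames(as.character(pairs$region), as.character(pairs$from))

identical(v1, v2)  # TRUE
v1[["SYR"]]        # "Middle East & North Africa"
```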

circlize::circos.clear()
circlize::chordDiagram(
 # your edge list
 unref,
 # named grouping vector
 group = group.labs
)

The figure is now properly sorted by region, and circlize automatically introduces a gap between the secondary groupings. We’ll adjust the gap spacing later, but it’s clear that we should adjust the groupings a bit and possibly remove a few entries. Myanmar (MMR) and Colombia (COL) have very small refugee flows despite being among the top 15 senders; it’s reasonable to remove them. We also might want to rename or aggregate East Asia and North America, but we’ll wait until we introduce the outer ring with the labels.

Drop MMR and COL, and re-calculate the named region vector.

unref<-unref[!(from %in% c("MMR", "COL")) & !(to %in% c("MMR", "COL"))]
 
# unique data.table of countries and regions
group.labs<-unique(unref[,.(from,region)])
# collapse it into a named vector
group.labs<-structure(as.character(group.labs$region),
 names = as.character(group.labs$from))

This is also a good time to set up a palette. There are 2 options:

  1. A different color for each sector (country).

  2. Regional/grouping based colors applied to every country within each region.

With a more manageable number of sectors (fewer than 15) I would probably generate a country-specific palette, but with a large number of sectors it can be distracting. For this example, we’ll establish a region-based palette. You can do this with a palette and a clever one-liner that uses rep() with the number of countries in each region, but I will use the data.frame of countries and regions, merge it with the palette, and then collapse it into a named vector.

palette <- c("#f1bd57", # yellow
 "#23dbe1", # teal
 "#fc7f20", # orange
 "#b66fbb", # purple
 "#68ed99", # green
 "#0b3d91") # dark blue

# get the regions
region.labs<-data.frame(country=names(group.labs), 
 region=group.labs, 
 row.names=NULL)
# region colors
country.cols<-data.frame(region = unique(group.labs),
 colors = palette)
# merge with the country codes
country.cols<-merge(region.labs, country.cols, by = "region", all.x = TRUE)
country.cols<-structure(country.cols$colors,
 names = country.cols$country)
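For reference, here is a sketch of the one-liner alternative, using a hypothetical, already region-sorted group vector and a shortened palette. match() gives every country the palette position of its region; the rep() variant gives the same result only because the vector is sorted by region. The stopifnot() guard catches a palette that is shorter than the number of regions, which the merge approach above would also benefit from.

```r
palette <- c("#f1bd57", "#23dbe1", "#fc7f20")

# Hypothetical named region vector, sorted by region as in the tutorial
group.labs <- c(CHN = "East Asia", VNM = "East Asia",
                DEU = "Europe",    TUR = "Europe",
                SYR = "MENA")

# Fail early if there are more regions than colors
stopifnot(length(palette) >= length(unique(group.labs)))

# match(): each country gets the color at its region's position
country.cols <- setNames(palette[match(group.labs, unique(group.labs))],
                         names(group.labs))

# rep(): repeat each color once per country in its region (needs sorted input)
region.sizes <- as.vector(table(factor(group.labs, levels = unique(group.labs))))
country.cols.rep <- setNames(rep(palette[seq_along(region.sizes)], times = region.sizes),
                             names(group.labs))

identical(country.cols, country.cols.rep)  # TRUE
```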

Basic Customizations

Next, we’ll make a few more aesthetic improvements using some additional arguments and check the plot again.

  • grid.col sets the sector/country colors we just established in the previous section.

  • link.zindex sets the order in which flows are drawn. I usually draw the largest flows last so they are “on top” and easier to read, but use whatever order makes sense to you. Setting the argument to rank(unref$value) places the largest flows on top.

  • small.gap determines the spacing between countries. I bump this up a bit when some of the country flows are small so the labels aren’t crammed together.

  • big.gap determines the spacing between regional groupings. This should be altered for visual effect and to create more spacing (without looking ridiculous) for regions with longer names.

  • grid.border sets the grid border color. Set to NULL to make the border color the same as the grid colors. I find this to be a more modern look.

  • transparency controls the transparency of the flows, from 0 (opaque) to 1 (fully transparent). I usually introduce a moderate level, but most of the refugee flows are within, not between, regions, so the flows rarely overlap; here we use a nearly opaque 0.05.

  • directional sets the nature of the relationship. In this case we have direction-specific dyadic data; i.e., refugees from country A to country B are distinct from refugees from country B to country A. Specify 1 when the first column is the source (like ours), -1 when the second column is the source, and 0 if the data lacks directionality (such as cumulative trade).

  • direction.type allows you to create a differential starting/end point for incoming vs. outgoing refugee flows, and transform the flows into subtle arrows. For a global setting you can specify c("diffHeight", "arrows"). To control each individual country create a named vector for the input.

  • link.arr.type sets the terminal end of each flow to either a "triangle" (default) or "big.arrow".
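For contrast with directional = 1, here is a hedged base R sketch of preparing undirected data for directional = 0: flows in both directions between the same pair are collapsed into one total. The data are hypothetical, and pmin()/pmax() on the codes simply build an order-independent pair key.

```r
# Hypothetical directed flows
edges <- data.frame(
  from  = c("AAA", "BBB", "AAA"),
  to    = c("BBB", "AAA", "CCC"),
  value = c(10, 4, 7),
  stringsAsFactors = FALSE
)

# Order-independent key: (AAA, BBB) and (BBB, AAA) both become a = AAA, b = BBB
pairs <- data.frame(a = pmin(edges$from, edges$to),
                    b = pmax(edges$from, edges$to),
                    value = edges$value,
                    stringsAsFactors = FALSE)

# One total per unordered pair
undirected <- aggregate(value ~ a + b, data = pairs, FUN = sum)
undirected
```

AAA to BBB (10) and BBB to AAA (4) collapse into a single AAA-BBB total of 14. A three-column result like this could then be passed to circlize::chordDiagram() with directional = 0.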

Let’s view some of these changes.

circlize::circos.clear()
circlize::chordDiagram(
 # your edge list
 unref,
 # named grouping vector
 group = group.labs,
 # named vector of country/sector colors
 grid.col = country.cols,
 # plot largest flows last
 link.zindex = rank(unref$value),
 # link alpha
 transparency = 0.05,
 # space between countries/sectors
 small.gap = 2,
 # space between regions/groups
 big.gap = 10,
 # remove border
 grid.border = NULL,
 # assign direction: from > to
 directional = 1,
 # set nature of outgoing link style
 direction.type = c("diffHeight", "arrows"),
 # set link end to a large arrow
 link.arr.type = "big.arrow"
)

This is starting to look pretty good. You’ll notice that by specifying direction-specific flows, we’ve also created a proportional outflow track below the outgoing links of each sector. This track shows what proportion of each country’s outflows went to the sector/region of the color shown. For example, the teal block underneath Syria’s (SYR) outflows shows that more than half of its refugees went to either Turkey or Germany. You may find this illuminating; I normally remove this strip by setting link.target.prop = FALSE.

Customizing the Sector Track

This is a sufficient figure, but I usually feel that the axis ticks clutter the figure without conveying much meaningful information. I prefer to use chord diagrams for the raw visual experience of seeing the relative flows, and less so to derive estimates of the flows themselves. This is compounded by the range of the values you’re plotting. In this case, the refugee flows have some extreme outliers (SYR, AFG) that induce scientific notation, so we would have to transform the scale.

We can remove the axis ticks by disaggregating and pre-allocating the 2 tracks we want to plot (countries and regions). This allows greater customization, but is more complicated and might require tedious adjustments to the label positioning and sizes until the desired effect is achieved.

Each circlize “track” is composed of 3 sub-tracks:

  1. The “grid”: main color/block for each country/sector

  2. The “name”: sub-track surrounding the “grid” with the sector/country labels

  3. The “axis”: the tick marks for flow values.

To enable customized tracks we indicate in the circlize::chordDiagram() call which sub-track(s) we want to create and then pre-allocate the tracks for the countries and regions.

circlize::circos.clear()
circlize::chordDiagram(
 # your edge list
 unref,
 # named grouping vector
 group = group.labs,
 # named vector of country/sector colors
 grid.col = country.cols,
 # plot largest flows last
 link.zindex = rank(unref$value),
 # link alpha
 transparency = 0.05,
 # space between countries/sectors
 small.gap = 2,
 # space between regions/groups
 big.gap = 10,
 # remove border
 grid.border = NULL,
 # assign direction: from > to
 directional = 1,
 # set nature of outgoing link style
 direction.type = c("diffHeight", "arrows"),
 # set link end to a large arrow
 link.arr.type = "big.arrow",
 # remove the proportional bar under sender outgoing indicating who's receiving
 link.target.prop = FALSE,
 # each track can have sub-tracks for the "grid" (main color/block for
 # country), the "name" (another sub track just outside grid with
 # sector/country names), and the "axis" (tick marks). I suppress tick
 # marks and names - we make custom sector/country labels in next step
 annotationTrack = "grid",
 # lastly, pre allocate the tracks and their heights. First element is track 1
 # (the regions), and track 2 are the countries
 preAllocateTracks = list(list(track.height = 0.075),
 list(track.height = 0.01))
)

We added 3 new arguments to the bottom of the circlize::chordDiagram() call.

  • link.target.prop = FALSE from the previous section to remove the proportional outflow track.

  • annotationTrack = "grid" to only draw the main blocks for each country with no axis or labels.

  • preAllocateTracks creates the 2 tracks we want; one for the countries, and one for the region labels. The regional track is invisible because we are not plotting any “values” for the region, only individual countries. This argument accepts different input classes. In this example I specified a list where each element is another list of track options. I only indicated the heights of the country and region tracks. The elements are in order from outer to inner track. This can be confusing; ultimately you play with the heights to get the balance in size you’re looking for.

When you customize the tracks this way, you need additional function calls after circlize::chordDiagram() to create the country and region labels.

Building the Custom Country Track

The country sector labels are generated with the circlize::circos.track() function after the initial plot is created. Within circlize::circos.track() you supply a panel function (panel.fun = function(x, y)), which circlize runs for each sector in the track to place the country labels. Inside that panel function, you call circlize::circos.text() to draw and style the labels.

We’ll run the completed code chunk and then briefly walk through the options.

circlize::circos.clear()
circlize::chordDiagram(
 # your edge list
 unref,
 # named grouping vector
 group = group.labs,
 # named vector of country/sector colors
 grid.col = country.cols,
 # plot largest flows last
 link.zindex = rank(unref$value),
 # link alpha
 transparency = 0.05,
 # space between countries/sectors
 small.gap = 2,
 # space between regions/groups
 big.gap = 10,
 # remove border
 grid.border = NULL,
 # assign direction: from > to
 directional = 1,
 # set nature of outgoing link style
 direction.type = c("diffHeight", "arrows"),
 # set link end to a large arrow
 link.arr.type = "big.arrow",
 # remove the proportional bar under sender outgoing indicating who's receiving
 link.target.prop = FALSE,
 # each track can have sub-tracks for the "grid" (main color/block for
 # country), the "name" (another sub track just outside grid with
 # sector/country names), and the "axis" (tick marks). I suppress tick
 # marks and names - we make custom sector/country labels in next step
 annotationTrack = "grid",
 # lastly, pre allocate the tracks and their heights. First element is track 1
 # (the regions), and track 2 are the countries
 preAllocateTracks = list(list(track.height = 0.075),
 list(track.height = 0.01))
)

# now set up country track names
# the final plot aesthetics are dependent upon the combination of text size,
# text positioning, link border thickness settings, and the graphical device
# output size. This all renders nicely output to pdf ~10x10in using the RStudio
# viewer pdf output
circlize::circos.track(
 # specify inner/sector/country track
 track.index = 2,
 panel.fun = function(x, y) {
 # get x location
 xlim = circlize::get.cell.meta.data("xlim")
 # get y location
 ylim = circlize::get.cell.meta.data("ylim")
 # pull sector info
 sector.index = circlize::get.cell.meta.data("sector.index")
 # set up the text
 circlize::circos.text(
 # x location for placement
 mean(xlim),
 # y location for placement
 mean(ylim),
 # text color
 col = "#323232",
 # font face (1 = plain, 2 = bold)
 font = 1,
 # text size
 cex = 0.8,
 # the actual text (sector name)
 labels = sector.index,
 # text orientation
 facing = "outside",
 # beautify, only works with facing != NULL
 niceFacing = TRUE,
 # text position adjustment
 adj = c(0.5, 0.5)
 )
 },
 # remove background border
 bg.border = NA
)

  • circlize::circos.track() is the higher level call. Note that you do not call circos.clear() between circlize::chordDiagram() and circlize::circos.track()

    • track.index specifies which track we are setting up. In this case it’s 2 (country track).

    • panel.fun is the function that draws the labels; circlize calls it once for each sector (cell) in the track. Many of the arguments inside the panel function use circlize::get.cell.meta.data() to harvest information about the current cell from the base plot built by the original circlize::chordDiagram() call.

      • xlim = circlize::get.cell.meta.data("xlim") extracts the range of x values for the country sector from the base plot.

      • ylim = circlize::get.cell.meta.data("ylim") extracts the range of y values for the country sector from the base plot.

      • sector.index = circlize::get.cell.meta.data("sector.index") extracts the sector information stored in the base plot.

      • circlize::circos.text() places the text and sets styling options.

        • The first argument is the x location. We use mean(xlim), the midpoint of the sector’s x range extracted above.

        • The second argument is the same, but for ylim.

        • col is the text color.

        • font is the numeric value for font type. Here it is normal (1).

        • cex is the text size (a scaling factor relative to the default size).

        • The next argument is the actual label. This is stored in sector.index captured in the panel function call above.

        • facing determines the text orientation style as you work around the circle. I always use "outside".

        • niceFacing adjusts the orientation so labels stay human-readable (not upside down) as you move around the circle.

        • adj allows additional text position adjustments with a numeric vector of length 2. I find it’s best if you don’t have to mess with this, but as you’ll see when we get to the end, you might be forced to adjust it.

As you can see, there are still some issues with label size and spacing, but it’s better to make adjustments after the outer regional track is plotted.

Building the Custom Region Track

Now we can create the regional labels. Because the regional track does not contain any flows or other values to plot, we can use circlize::highlight.sector() to plot them around the circle. We can accomplish this by looping through the unique regions and building the highlight track. Again, I’ll generate the plot and walk through the options.

circlize::circos.clear()
circlize::chordDiagram(
 # your edge list
 unref,
 # named grouping vector
 group = group.labs,
 # named vector of country/sector colors
 grid.col = country.cols,
 # plot largest flows last
 link.zindex = rank(unref$value),
 # link alpha
 transparency = 0.05,
 # space between countries/sectors
 small.gap = 2,
 # space between regions/groups
 big.gap = 10,
 # remove border
 grid.border = NULL,
 # assign direction: from > to
 directional = 1,
 # set nature of outgoing link style
 direction.type = c("diffHeight", "arrows"),
 # set link end to a large arrow
 link.arr.type = "big.arrow",
 # remove the proportional bar under sender outgoing indicating who's receiving
 link.target.prop = FALSE,
 # each track can have sub-tracks for the "grid" (main color/block for
 # country), the "name" (another sub track just outside grid with
 # sector/country names), and the "axis" (tick marks). I suppress tick
 # marks and names - we make custom sector/country labels in next step
 annotationTrack = "grid",
 # lastly, pre allocate the tracks and their heights. First element is track 1
 # (the regions), and track 2 are the countries
 preAllocateTracks = list(list(track.height = 0.075),
 list(track.height = 0.01))
)

# now set up country track names
# the final plot aesthetics are dependent upon the combination of text size,
# text positioning, link border thickness settings, and the graphical device
# output size. This all renders nicely output to pdf ~10x10in using the RStudio
# viewer pdf output
circlize::circos.track(
 # specify inner/sector/country track
 track.index = 2,
 panel.fun = function(x, y) {
 # get x location
 xlim = circlize::get.cell.meta.data("xlim")
 # get y location
 ylim = circlize::get.cell.meta.data("ylim")
 # pull sector info
 sector.index = circlize::get.cell.meta.data("sector.index")
 # set up the text
 circlize::circos.text(
 # x location for placement
 mean(xlim),
 # y location for placement
 mean(ylim),
 # text color
 col = "#323232",
 # font face (1 = plain, 2 = bold)
 font = 1,
 # text size
 cex = 0.8,
 # the actual text (sector name)
 labels = sector.index,
 # text orientation
 facing = "outside",
 # beautify, only works with facing != NULL
 niceFacing = TRUE,
 # text position adjustment
 adj = c(0.5, 0.5)
 )
 },
 # remove background border
 bg.border = NA
)

# Loop through the unique region values in the named vector passed to `group`
# in circlize::chordDiagram() and draw the outer region track for each one
for(region in unique(group.labs)) {
 # extract the sectors/countries belonging to each group in the named vector
 countries = names(group.labs[group.labs == region])

 # setup the highlight in the group/region track
 circlize::highlight.sector(
 # the countries/sectors belonging to the current region
 countries,
 # specify the track we're working with (1=regions, 2=countries)
 track.index = 1,
 # color of the region track
 col = "#939798",
 # text for the region track
 text = region,
 # size of the text
 cex = 1.2,
 # font of the text
 font = 1,
 # color of the text
 text.col = "white",
 # adjust vertical placement of the text
 text.vjust = 0.55,
 # text orientation
 facing = "bending.outside",
 # make it pretty as you move around the circle
 niceFacing = TRUE
 )}

  • for(region in unique(group.labs)) we’re looping through the regions established in the group.labs character vector from above.

    • Within each region loop we grab the names of the countries belonging to that region with countries = names(group.labs[group.labs == region]).

    • circlize::highlight.sector() is called for each region we loop through.

      • The first argument (countries) specifies the countries we want to highlight in the current loop.

      • track.index tells circlize which track we’re operating in (1, the region track).

      • col the hex value for the color we’re using to highlight.

      • text the text for the region label.

      • cex the region text size.

      • font the region text font.

      • text.col the region text color.

      • text.vjust vertical text placement adjustment. As before, best if you can avoid changing this.

      • facing the style for the text. Almost always "bending.outside".

      • niceFacing again, to make the text pretty.
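The loop only needs, for each region, the countries that belong to it. As a sanity check before drawing, split() returns the same membership lists in one call. A base R sketch with a hypothetical group vector, including an empty-string region label:

```r
# Hypothetical named grouping vector (names = countries, values = regions)
group.labs <- c(CHN = "Asia", PAK = "Asia", DEU = "Europe", TUR = "Europe", USA = "")

# What the highlight.sector() loop extracts, one region at a time
loop.result <- list()
for (region in unique(group.labs)) {
  loop.result[[length(loop.result) + 1]] <- names(group.labs[group.labs == region])
}

# The same membership lists in one call, in the same region order as the loop
members <- split(names(group.labs), factor(group.labs, levels = unique(group.labs)))

members$Europe                           # "DEU" "TUR"
identical(unname(members), loop.result)  # TRUE
```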

Finalizing the Text

This worked and all the elements are in place; however, there are some text length and label sizing issues. This is the tedious part of working with circlize. Because it uses base R graphics, you have to adjust the plotting device dimensions, the sector/region text size, the sector/region gap spacing, and the text/length of the region labels themselves until you get something that looks nice in the context where you need the figure.

This looks terrible inside an R Markdown plotting window, but if you open it with RStudio’s Plots > Export > Save as PDF… option and set the plotting dimensions to 10 x 10 inches, the text looks normal on export. Ideally, you would play around with the export size in RStudio and then re-import the finished plot as a standalone vector image, but that’s not always an option, as with this tutorial. We’ll walk through getting this to look nice inside this vignette with these steps:

  1. Address the USA sector. The USA flows are very low, and you could justify removing it from the plot. That said, the USA is commonly a point of interest in environment-security analysis. It can’t logically be merged with other regions because there are no other North, Central, or South American countries in our subset. To keep the USA, we’ll set its regional grouping label to an empty string; there’s no other real option.

  2. Vietnam (VNM) and China (CHN) need to be merged with South Asia into “Asia”.

  3. Relabel the region for Turkey (TUR) and Germany (DEU) as “Europe”.

  4. Rename “Middle East & North Africa” to “MENA”.

  5. Increase the country spacing with small.gap = 5 to give countries with smaller flows more room for their labels.

  6. Set the region label text size to cex = 0.8 and the sector/country labels to cex = 0.6.

  7. Increase the regional background track height from 0.075 to 0.085 with preAllocateTracks = list(list(track.height = 0.085), list(track.height = 0.01)).
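If you’d rather script the 10 x 10 inch export than use the RStudio menu, base R’s pdf() graphics device does the same job. The helper below, save_plot_pdf(), is a hypothetical convenience wrapper of my own naming, not part of circlize; it opens the device at the requested size, runs a drawing function, and closes the device even if drawing fails. Because the output size is fixed in code, the text sizes and gaps you tune in the steps above stay reproducible.

```r
# Hypothetical helper (not part of circlize): draw straight to a PDF file
save_plot_pdf <- function(draw, file, width = 10, height = 10) {
  grDevices::pdf(file, width = width, height = height)  # size in inches
  on.exit(grDevices::dev.off(), add = TRUE)             # close even if draw() errors
  draw()
  invisible(file)
}

# Stand-in figure; for the chord diagram, `draw` would wrap the circos.clear(),
# chordDiagram(), circos.track(), and highlight.sector() calls from this guide
out <- save_plot_pdf(function() plot(1:10, main = "placeholder"),
                     file.path(tempdir(), "chord_diagram.pdf"))
file.exists(out)
```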

First, we’ll make the regional name adjustments.

group.labs<-unique(unref[,.(from,region)])

group.labs[from=="USA", region:=""]
group.labs[from %in% c("VNM", "CHN") | region=="South Asia", region:="Asia"]
group.labs[from %in% c("TUR", "DEU"), region:="Europe"]
group.labs[region=="Middle East & North Africa", region:="MENA"]

# collapse it into a named vector
group.labs<-structure(as.character(group.labs$region),
 names = as.character(group.labs$from))
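If you end up with many renames, keeping them in a single lookup vector can be easier to maintain than one assignment per rule. This base R sketch uses hypothetical country/region pairs and works at the region level, so it matches the country-level rules above only when every country in a renamed region should move (true for the illustrative rows here, but check your own subset).

```r
# Hypothetical (country, region) pairs before relabelling
labs <- data.frame(
  from   = c("USA", "CHN", "PAK", "TUR", "SYR"),
  region = c("North America", "East Asia & Pacific", "South Asia",
             "Europe & Central Asia", "Middle East & North Africa"),
  stringsAsFactors = FALSE
)

# Hypothetical rename table: names are old labels, values are new labels
region.map <- c("Middle East & North Africa" = "MENA",
                "South Asia"                 = "Asia",
                "East Asia & Pacific"        = "Asia",
                "Europe & Central Asia"      = "Europe",
                "North America"              = "")

# Only relabel regions that appear in the lookup; leave others untouched
hit <- labs$region %in% names(region.map)
labs$region[hit] <- unname(region.map[labs$region[hit]])

# Collapse into the named vector circlize::chordDiagram() expects for `group`
group.labs <- setNames(labs$region, labs$from)
group.labs
```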

Finally, adjust the label sizing and spacing.

circlize::circos.clear()
circlize::chordDiagram(
 # your edge list
 unref,
 # named grouping vector
 group = group.labs,
 # named vector of country/sector colors
 grid.col = country.cols,
 # plot largest flows last
 link.zindex = rank(unref$value),
 # link alpha
 transparency = 0.05,
 # space between countries/sectors
 small.gap = 5,
 # space between regions/groups
 big.gap = 10,
 # remove border
 grid.border = NULL,
 # assign direction: from > to
 directional = 1,
 # set nature of outgoing link style
 direction.type = c("diffHeight", "arrows"),
 # set link end to a large arrow
 link.arr.type = "big.arrow",
 # remove the proportional bar under sender outgoing indicating who's receiving
 link.target.prop = FALSE,
 # each track can have sub-tracks for the "grid" (main color/block for
 # country), the "name" (another sub track just outside grid with
 # sector/country names), and the "axis" (tick marks). I suppress tick
 # marks and names - we make custom sector/country labels in next step
 annotationTrack = "grid",
 # lastly, pre allocate the tracks and their heights. First element is track 1
 # (the regions), and track 2 are the countries
 preAllocateTracks = list(list(track.height = 0.085),
 list(track.height = 0.01))
)

# now set up country track names
# the final plot aesthetics are dependent upon the combination of text size,
# text positioning, link border thickness settings, and the graphical device
 # output size. This renders nicely when exported to PDF at ~10x10 in from
 # the RStudio viewer
circlize::circos.track(
 # specify inner/sector/country track
 track.index = 2,
 panel.fun = function(x, y) {
 # get the x range of the current cell
 xlim = circlize::get.cell.meta.data("xlim")
 # get the y range of the current cell
 ylim = circlize::get.cell.meta.data("ylim")
 # pull sector info
 sector.index = circlize::get.cell.meta.data("sector.index")
 # set up the text
 circlize::circos.text(
 # x location for placement
 mean(xlim),
 # y location for placement
 mean(ylim),
 # text color
 col = "#323232",
 # font face (1 = plain, 2 = bold)
 font = 1,
 # text size
 cex = 0.6,
 # the actual text (sector name)
 sector.index,
 # text orientation
 facing = "outside",
 # flip labels on the lower half of the circle so they read upright
 niceFacing = TRUE,
 # text position adjustment
 adj = c(0.5, 0.5)
 )
 },
 # remove background border
 bg.border = NA
)

 # Create the outer region track by looping through the unique region values
 # in the named vector passed to group in circlize::chordDiagram()
for(region in unique(group.labs)) {
 # extract the sectors/countries belonging to each group in the named vector
 countries = names(group.labs[group.labs == region])

 # setup the highlight in the group/region track
 circlize::highlight.sector(
 # the countries/sectors belonging to the current region
 countries,
 # specify the track we're working with (1=regions, 2=countries)
 track.index = 1,
 # color of the region track
 col = "#939798",
 # text for the region track
 text = region,
 # size of the text
 cex = 0.8,
 # font of the text
 font = 1,
 # color of the text
 text.col = "white",
 # adjust vertical placement of the text
 text.vjust = 0.55,
 # text orientation (curved along the track)
 facing = "bending.outside",
 # make it pretty as you move around the circle
 niceFacing = TRUE
 )}
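The loop relies on a single base R idiom: group.labs == region selects the entries for one region, and names() returns the countries in it. This is a quick check on a hypothetical five-country vector (group_vec and members are illustrative names) that prints what each iteration sees. It also shows that split() returns the same memberships as a list keyed by region, which can be handy for inspecting the groups before plotting.

```r
# Hypothetical named vector in the same shape as group.labs
group_vec <- c(SYR = "MENA", IRQ = "MENA", AFG = "South Asia",
               PAK = "South Asia", USA = "")

# what the highlight.sector() loop computes on each iteration
for (region in unique(group_vec)) {
  countries <- names(group_vec[group_vec == region])
  cat(sprintf("%-12s -> %s\n", shQuote(region), paste(countries, collapse = ", ")))
}

# the same memberships in one call, as a list keyed by region
members <- split(names(group_vec), group_vec)
str(members)
```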

This is much better, but not perfect. It’s difficult to create a perfect example within a code chunk figure in an R Markdown document.

If this were for a professional document, you might consider additional adjustments to:

  • preAllocateTracks heights.

  • circlize::circos.text(adj = c(0.5, 0.5)) argument.

  • circlize::highlight.sector(text.vjust = 0.55) argument.

  • Re-sizing the plotting window and exporting the graphic.
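For the last point, one way to make exports repeatable is to wrap the drawing code in a function and send it straight to a PDF device at the ~10x10 in size mentioned in the code comments. This is a sketch under assumptions: save_pdf() is a hypothetical helper, not part of circlize, and the three-link edge list is made up. on.exit() closes the device even if a plotting call fails, and the circlize portion only runs when the package is installed.

```r
# Hypothetical helper: open a PDF device, run the drawing code, always close it
save_pdf <- function(draw_fn, file, width = 10, height = 10) {
  grDevices::pdf(file, width = width, height = height)
  # close the PDF device on exit, even if draw_fn() throws an error
  on.exit(grDevices::dev.off(), add = TRUE)
  draw_fn()
  invisible(file)
}

# base R stand-in so the helper can be tried without circlize
demo_file <- save_pdf(function() plot(1:10), file.path(tempdir(), "demo.pdf"))

# the same helper around a chord diagram, using a made-up edge list
if (requireNamespace("circlize", quietly = TRUE)) {
  toy <- data.frame(
    from  = c("SYR", "SYR", "AFG"),
    to    = c("TUR", "DEU", "PAK"),
    value = c(10, 4, 8),
    stringsAsFactors = FALSE
  )
  chord_file <- save_pdf(function() {
    circlize::circos.clear()
    circlize::chordDiagram(toy, directional = 1)
    circlize::circos.clear()
  }, file.path(tempdir(), "chord_demo.pdf"))
}
```

In practice, the function passed to save_pdf() would hold the full chordDiagram(), circos.track(), and highlight.sector() calls from above, so the output size stays fixed no matter how the RStudio plot pane is sized.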

The Title

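Lastly, we’ll create a left-adjusted title and subtitle. Making a modern-looking title is not very intuitive in base R; the simplest route is mtext() with outer = TRUE, which writes relative to the outer margin of the device. The chunk below repeats the full diagram code from above, then adds the title.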

circlize::circos.clear()
circlize::chordDiagram(
 # your edge list
 unref,
 # named grouping vector
 group = group.labs,
 # named vector of country/sector colors
 grid.col = country.cols,
 # plot largest flows last
 link.zindex = rank(unref$value),
 # link alpha
 transparency = 0.05,
 # space between countries/sectors
 small.gap = 5,
 # space between regions/groups
 big.gap = 10,
 # draw grid borders in the grid color (no visible outline)
 grid.border = NULL,
 # assign direction: from > to
 directional = 1,
 # show direction with different link-end heights plus arrows
 direction.type = c("diffHeight", "arrows"),
 # set link end to a large arrow
 link.arr.type = "big.arrow",
 # hide the bars on each sender showing the proportion going to each receiver
 link.target.prop = FALSE,
 # each track can have sub-tracks for the "grid" (main color/block for
 # country), the "name" (another sub track just outside grid with
 # sector/country names), and the "axis" (tick marks). I suppress tick
 # marks and names; we make custom sector/country labels in the next step
 annotationTrack = "grid",
 # lastly, pre-allocate the tracks and their heights. The first element is
 # track 1 (the regions), and the second is track 2 (the countries)
 preAllocateTracks = list(list(track.height = 0.085),
 list(track.height = 0.01))
)

# now set up country track names
# the final plot aesthetics are dependent upon the combination of text size,
# text positioning, link border thickness settings, and the graphical device
 # output size. This renders nicely when exported to PDF at ~10x10 in from
 # the RStudio viewer
circlize::circos.track(
 # specify inner/sector/country track
 track.index = 2,
 panel.fun = function(x, y) {
 # get the x range of the current cell
 xlim = circlize::get.cell.meta.data("xlim")
 # get the y range of the current cell
 ylim = circlize::get.cell.meta.data("ylim")
 # pull sector info
 sector.index = circlize::get.cell.meta.data("sector.index")
 # set up the text
 circlize::circos.text(
 # x location for placement
 mean(xlim),
 # y location for placement
 mean(ylim),
 # text color
 col = "#323232",
 # font face (1 = plain, 2 = bold)
 font = 1,
 # text size
 cex = 0.6,
 # the actual text (sector name)
 sector.index,
 # text orientation
 facing = "outside",
 # flip labels on the lower half of the circle so they read upright
 niceFacing = TRUE,
 # text position adjustment
 adj = c(0.5, 0.5)
 )
 },
 # remove background border
 bg.border = NA
)

 # Create the outer region track by looping through the unique region values
 # in the named vector passed to group in circlize::chordDiagram()
for(region in unique(group.labs)) {
 # extract the sectors/countries belonging to each group in the named vector
 countries = names(group.labs[group.labs == region])

 # setup the highlight in the group/region track
 circlize::highlight.sector(
 # the countries/sectors belonging to the current region
 countries,
 # specify the track we're working with (1=regions, 2=countries)
 track.index = 1,
 # color of the region track
 col = "#939798",
 # text for the region track
 text = region,
 # size of the text
 cex = 0.8,
 # font of the text
 font = 1,
 # color of the text
 text.col = "white",
 # adjust vertical placement of the text
 text.vjust = 0.55,
 # text orientation (curved along the track)
 facing = "bending.outside",
 # make it pretty as you move around the circle
 niceFacing = TRUE
 )}

mytitle = "Cumulative Dyadic Refugee Flows"
mysubtitle = "For 2000-2019 UNHCR Population of Concern Refugee Data"

mtext(side=3, line=-1, adj=0, cex=1, font = 2, mytitle, outer = TRUE)
mtext(side=3, line=-1.75, adj=0, cex=0.8, mysubtitle, outer = TRUE)
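The negative line values above work because the default outer margins are zero, so the text is drawn just inside the edge of the device, where it may crowd the top of the circle on smaller outputs. An alternative, sketched here on a stand-in base R plot, is to reserve space first with par(oma = ...) and place the text at positive line offsets. add_left_title() is a hypothetical helper, and the margin and line values are assumptions to tune; in the real figure the par() call would come before circlize::circos.clear() and circlize::chordDiagram().

```r
# Hypothetical helper (not part of base R or circlize):
# bold left-adjusted title plus a smaller subtitle in the top outer margin
add_left_title <- function(title, subtitle) {
  mtext(title, side = 3, line = 1.5, adj = 0, cex = 1, font = 2, outer = TRUE)
  mtext(subtitle, side = 3, line = 0.5, adj = 0, cex = 0.8, outer = TRUE)
  invisible(NULL)
}

title_file <- file.path(tempdir(), "title_demo.pdf")
grDevices::pdf(title_file, width = 10, height = 10)
# reserve about three text lines of outer margin at the top (bottom, left, top, right)
par(oma = c(0, 0, 3, 0))
plot(1:10)  # stand-in for the chord diagram
add_left_title("Cumulative Dyadic Refugee Flows",
               "For 2000-2019 UNHCR Population of Concern Refugee Data")
invisible(grDevices::dev.off())
```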


This concludes the vignette. Be sure to check out the circlize reference manual for even more potential customizations, and the rest of our DANTE Vignettes for other environment-security focused data processing and visualization tutorials.

References

1. International Monetary Fund. Direction of Trade Statistics (DOTS). http://data.imf.org/?sk=9D6028D4-F14A-464C-A2F2-59B2CD424B85 (2018).
2. Sundberg, R., Eck, K. & Kreutz, J. Introducing the UCDP Non-State Conflict Dataset. Journal of Peace Research 49, 351–362 (2012).
3. Maoz, Z., Johnson, P. L., Kaplan, J., Ogunkoya, F. & Shreve, A. P. The Dyadic Militarized Interstate Disputes (MIDs) Dataset Version 3.0: Logic, Characteristics, and Comparisons to Alternative Datasets. Journal of Conflict Resolution 0022002718784158 (2018) doi: 10.1177/0022002718784158.
4. Boschee, E. et al. ICEWS Coded Event Data. (2018) doi: 10.7910/DVN/28075.
5. Leetaru, K. & Schrodt, P. A. GDELT: Global data on events, location, and tone. ISA Annual Convention (2013).
6. The United Nations. Trends in International Migrant Stock: The 2015 Revision. http://www.un.org/en/development/desa/population/migration/data/estimates2/docs/MigrationStockDocumentation_2015.pdf (2015).
7. U.S. Department of Agriculture. USDA ERS - U.S. Agricultural Trade Data Update. https://www.ers.usda.gov/data-products/foreign-agricultural-trade-of-the-united-states-fatus/us-agricultural-trade-data-update/ (2021).
8. Gu, Z., Gu, L., Eils, R., Schlesner, M. & Brors, B. circlize Implements and Enhances Circular Visualization in R. Bioinformatics 30, 2811–2812 (2014).
 
