R GIS

Shape files

Shape files use a format designed for Arc Info (ESRI) whose sole purpose is to store and manage geographic data. They hold coordinates, just like R data.frames, but more information as well. Coordinates are organized into polygons (or lines), and additional data about those features can be included. We start with a sample shape database called TM_WORLD_BORDERS-0.3. There are four different files which you need, all with the same name but four different extensions (you don't need the readme file). R will treat them as one. Download all four into the same folder; you can save them anywhere, then use setwd to the folder. My copy of TM_WORLD_BORDERS-0.3 is in ~/data/gis/World.

You need sp and maptools packages to read these shape files. Then the function readShapePoly (package maptools) simply reads the shape file, using the main name. Save it to an R object, here named WorldGIS.

library(sp)
library(maptools)
worldGIS=readShapePoly('TM_WORLD_BORDERS-0.3')
class(worldGIS)

The resulting object, worldGIS, is a spatial data type called SpatialPolygonsDataFrame. This is a very complicated data type, difficult to understand in detail. It has map coordinates for 246 countries, and each country has multiple polygons. These spatial objects are advanced classes in R, built with slots and lists.

There are five slots in worldGIS. The bbox is small but useful -- the range of x and y coordinates. The data slot is a simple data.frame of 246 countries, with names and other information. Notice that slots within a class are accessed with the symbol @.

slotNames(worldGIS)
[1] "data"        "polygons"    "plotOrder"   "bbox"        "proj4string"
head(worldGIS@data)
  FIPS ISO2 ISO3 UN                NAME   AREA  POP2005 REGION SUBREGION     LON     LAT
0   AC   AG  ATG 28 Antigua and Barbuda     44    83039     19        29 -61.783  17.078
1   AG   DZ  DZA 12             Algeria 238174 32854159      2        15   2.632  28.163
2   AJ   AZ  AZE 31          Azerbaijan   8260  8352021    142       145  47.395  40.430
3   AL   AL  ALB  8             Albania   2740  3153731    150        39  20.068  41.143
4   AM   AM  ARM 51             Armenia   2820  3017661    142       145  44.563  40.534
5   AO   AO  AGO 24              Angola 124670 16095214      2        17  17.544 -12.296
worldGIS@bbox
   min      max
x -180 180.0000
y  -90  83.6236

The slot called polygons is a list, and it has a length. There are 246, one for each country. Each belongs to another class of spatial object called Polygons.

length(worldGIS@polygons)
[1] 246
class(worldGIS@polygons[[1]])
[1] "Polygons"
attr(,"package")
[1] "sp"
str(worldGIS@polygons[[1]])
Formal class 'Polygons' [package "sp"] with 5 slots
  ..@ Polygons :List of 2
  .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
  .. .. .. ..@ labpt  : num [1:2] -61.8 17.1
  .. .. .. ..@ area   : num 0.0291
  .. .. .. ..@ hole   : logi FALSE
  .. .. .. ..@ ringDir: int 1
  .. .. .. ..@ coords : num [1:23, 1:2] -61.7 -61.7 -61.8 -61.9 -61.9 ...
  .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
  .. .. .. ..@ labpt  : num [1:2] -61.8 17.6
  .. .. .. ..@ area   : num 0.0171
  .. .. .. ..@ hole   : logi FALSE
  .. .. .. ..@ ringDir: int 1
  .. .. .. ..@ coords : num [1:25, 1:2] -61.7 -61.7 -61.7 -61.7 -61.8 ...
  ..@ plotOrder: int [1:2] 1 2
  ..@ labpt    : num [1:2] -61.8 17.1
  ..@ ID       : chr "0"
  ..@ area     : num 0.0462
slotNames(worldGIS@polygons[[1]])
[1] "Polygons"  "plotOrder" "labpt"     "ID"        "area"     

One of those Polygons (capital-P) is yet another list. This first, for country 1, is a list of two. So the map of Antigua and Barbuda has two objects called Polygon, and each of those really is a polygon, with coordinates defining the boundary.

length(worldGIS@polygons[[1]]@Polygons)
[1] 2
str(worldGIS@polygons[[1]]@Polygons)
List of 2
 $ :Formal class 'Polygon' [package "sp"] with 5 slots
  .. ..@ labpt  : num [1:2] -61.8 17.1
  .. ..@ area   : num 0.0291
  .. ..@ hole   : logi FALSE
  .. ..@ ringDir: int 1
  .. ..@ coords : num [1:23, 1:2] -61.7 -61.7 -61.8 -61.9 -61.9 ...
 $ :Formal class 'Polygon' [package "sp"] with 5 slots
  .. ..@ labpt  : num [1:2] -61.8 17.6
  .. ..@ area   : num 0.0171
  .. ..@ hole   : logi FALSE
  .. ..@ ringDir: int 1
  .. ..@ coords : num [1:25, 1:2] -61.7 -61.7 -61.7 -61.7 -61.8 ...
head(worldGIS@polygons[[1]]@Polygons[[1]]@coords)
          [,1]     [,2]
[1,] -61.68667 17.02444
[2,] -61.73806 16.98972
[3,] -61.82917 16.99694
[4,] -61.87611 17.01694
[5,] -61.88056 17.01972
[6,] -61.88361 17.02361

Determine that Panama is row 165 in worldGIS@data. Panama has 17 Polygons. The last, number 17, is the biggest.

which(worldGIS@data$NAME=='Panama')
[1] 165
length(worldGIS@polygons[[165]]@Polygons)
[1] 17
dim(worldGIS@polygons[[165]]@Polygons[[17]]@coords)
[1] 716   2

Keeping track of the detailed structure of the SpatialPolygonsDataFrame is tedious, but fortunately the basic mapping tools do not require knowing the details. Simply use plot and subset to come up with basic graphs.

sp::plot(worldGIS)
plot of chunk subsetpoly
sp::plot(subset(worldGIS,NAME=='Panama'))
plot of chunk subsetpoly
CA=subset(worldGIS,NAME=='Panama' | NAME=='Costa Rica' | NAME=='Nicaragua' | NAME=='Honduras' | NAME=='Guatemala')
sp::plot(CA)
plot of chunk subsetpoly
bbox(CA)
         min       max
x -92.246780 -77.19833
y   7.206111  17.82111
CA@bbox
         min       max
x -92.246780 -77.19833
y   7.206111  17.82111
CA@data
    FIPS ISO2 ISO3  UN       NAME  AREA  POP2005 REGION SUBREGION     LON    LAT
38    CS   CR  CRI 188 Costa Rica  5106  4327228     19        13 -83.946  9.971
74    GT   GT  GTM 320  Guatemala 10843 12709564     19        13 -90.398 15.256
78    HO   HN  HND 340   Honduras 11189   683411     19        13 -86.863 14.819
158   NU   NI  NIC 558  Nicaragua 12140  5462539     19        13 -85.034 12.840
164   PM   PA  PAN 591     Panama  7443  3231502     19        13 -80.920  8.384

One way to understand the SpatialPolgonsDataFrame is to build one a step at a time from the bottom. There are functions in the sp package for creating each of the four spatial objects needed. Here I start with four sample data.frames available at the Sample Data page. There were saved as an rdata object, and I use the R function load for getting it. There are NOT ascii text, so read.delim will not work.

attach('practiceChicagoMap.rdata')
ls(2)
head(ChiForest[[1]])
str(ChiForest[[1]])

Each of the four objects is a simple data.frame of coordinates, just like the ones we used for simple maps at the start of the workshop. There are two columns of x-y coordinates, and nothing else. The function Polygon converts them into a special GIS Class called Polygon.

The Polygon class has slots to store coordinates, the bbox, and the area, which it just calculated.
for1poly=Polygon(ChiForest[[1]])
for2poly=Polygon(ChiForest[[2]])
park1poly=Polygon(ChiPark[[1]])
park2poly=Polygon(ChiPark[[2]])
class(for1poly)
[1] "Polygon"
attr(,"package")
[1] "sp"
slotNames(for1poly)
[1] "labpt"   "area"    "hole"    "ringDir" "coords" 
for1poly@area
[1] 0.0001777001
head(for1poly@coords)
          X        Y
1 -87.84664 41.97128
2 -87.84664 41.97123
3 -87.84665 41.97028
4 -87.84665 41.96951
5 -87.84666 41.96887
6 -87.84667 41.96825

The two polygons called forpoly are part of the Chicago forest map, so it now makes sense to combine them into a bigger object. The Class called Polygons does this.

forpolys=Polygons(list(for1poly,for2poly),ID='Forest')
parkpolys=Polygons(list(park1poly,park2poly),ID='Park')
class(forpolys)
[1] "Polygons"
attr(,"package")
[1] "sp"
length(forpolys@Polygons)
[1] 2
slotNames(forpolys@Polygons[[1]])
[1] "labpt"   "area"    "hole"    "ringDir" "coords" 

This is probably familiar now, since it is what the WorldGIS database built by readShapePoly looks like. There are two components in this mini-database, each with two polygons. The next step is to join those two countries into a large object called a SpatialPolygon Class. Finally this produces an object which can be mapped with the plot function.

forpark=SpatialPolygons(list(forpolys,parkpolys))
class(forpark)
[1] "SpatialPolygons"
attr(,"package")
[1] "sp"
slotNames(forpark)
[1] "polygons"    "plotOrder"   "bbox"        "proj4string"
sp::plot(forpark)
plot of chunk buildpoly3

But it still lacking the data slot which the WorldGIS has. This comes in the next, and final, step, creating the SpatialPolygonDataFrame object. The data.frame must be constructed, and can include any columns. But there must be two rows, and those rows must have row.names matching the IDs assigned to the Polygons Classes. This is the same class as WorldGIS, and has all five slots.

datafile=data.frame(row.names=c('Forest','Park'),size=c(2,2))
forparkdf=SpatialPolygonsDataFrame(forpark,datafile)
class(forparkdf)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
slotNames(forparkdf)
[1] "data"        "polygons"    "plotOrder"   "bbox"        "proj4string"

Converting Chicago GIS coordinates to Lat-Long

The Chicago GIS packages have a different projection (about which I know little). Converting to lat-long requires an inconvenient project name which must be copied exactly.

park=readShapePoly('Parks_Aug2012')
IllProjStr="+proj=tmerc +lat_0=36.66666666666666 +lon_0=-88.33333333333333 +k=0.9999749999999999 +x_0=300000.0000000001 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs"
IllProj=CRS(IllProjStr)
park@proj4string=IllProj
parkLL=spTransform(park,CRS('+proj=longlat'))
sp::plot(parkLL)
plot of chunk proj
parkLL@bbox
        min       max
x -87.83901 -87.52444
y  41.65142  42.02299