Shape files use a format designed for Arc Info (ESRI) whose sole purpose is to store and manage geographic data. They hold coordinates, just like R data.frames, but more information as well. Coordinates are organized into polygons (or lines), and additional data about those features can be included. We start with a sample shape database called TM_WORLD_BORDERS-0.3. There are four different files which you need, all with the same name but four different extensions (you don't need the readme file). R will treat them as one. Download all four into the same folder; you can save them anywhere, then use setwd to the folder. My copy of TM_WORLD_BORDERS-0.3 is in ~/data/gis/World.
You need sp and maptools packages to read these shape files. Then the function readShapePoly (package maptools) simply reads the shape file, using the main name. Save it to an R object, here named WorldGIS.
library(sp) library(maptools) worldGIS=readShapePoly('TM_WORLD_BORDERS-0.3') class(worldGIS)
The resulting object, worldGIS, is a spatial data type called SpatialPolygonsDataFrame. This is a very complicated data type, difficult to understand in detail. It has map coordinates for 246 countries, and each country has multiple polygons. These spatial objects are advanced classes in R, built with slots and lists.
There are five slots in worldGIS. The bbox is small but useful -- the range of x and y coordinates. The data slot is a simple data.frame of 246 countries, with names and other information. Notice that slots within a class are accessed with the symbol @.
slotNames(worldGIS)
[1] "data" "polygons" "plotOrder" "bbox" "proj4string"
head(worldGIS@data)
FIPS ISO2 ISO3 UN NAME AREA POP2005 REGION SUBREGION LON LAT 0 AC AG ATG 28 Antigua and Barbuda 44 83039 19 29 -61.783 17.078 1 AG DZ DZA 12 Algeria 238174 32854159 2 15 2.632 28.163 2 AJ AZ AZE 31 Azerbaijan 8260 8352021 142 145 47.395 40.430 3 AL AL ALB 8 Albania 2740 3153731 150 39 20.068 41.143 4 AM AM ARM 51 Armenia 2820 3017661 142 145 44.563 40.534 5 AO AO AGO 24 Angola 124670 16095214 2 17 17.544 -12.296
worldGIS@bbox
min max x -180 180.0000 y -90 83.6236
The slot called polygons is a list, and it has a length. There are 246, one for each country. Each belongs to another class of spatial object called Polygons.
length(worldGIS@polygons)
[1] 246
class(worldGIS@polygons[[1]])
[1] "Polygons" attr(,"package") [1] "sp"
str(worldGIS@polygons[[1]])
Formal class 'Polygons' [package "sp"] with 5 slots ..@ Polygons :List of 2 .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots .. .. .. ..@ labpt : num [1:2] -61.8 17.1 .. .. .. ..@ area : num 0.0291 .. .. .. ..@ hole : logi FALSE .. .. .. ..@ ringDir: int 1 .. .. .. ..@ coords : num [1:23, 1:2] -61.7 -61.7 -61.8 -61.9 -61.9 ... .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots .. .. .. ..@ labpt : num [1:2] -61.8 17.6 .. .. .. ..@ area : num 0.0171 .. .. .. ..@ hole : logi FALSE .. .. .. ..@ ringDir: int 1 .. .. .. ..@ coords : num [1:25, 1:2] -61.7 -61.7 -61.7 -61.7 -61.8 ... ..@ plotOrder: int [1:2] 1 2 ..@ labpt : num [1:2] -61.8 17.1 ..@ ID : chr "0" ..@ area : num 0.0462
slotNames(worldGIS@polygons[[1]])
[1] "Polygons" "plotOrder" "labpt" "ID" "area"
One of those Polygons (capital-P) is yet another list. This first, for country 1, is a list of two. So the map of Antigua and Barbuda has two objects called Polygon, and each of those really is a polygon, with coordinates defining the boundary.
length(worldGIS@polygons[[1]]@Polygons)
[1] 2
str(worldGIS@polygons[[1]]@Polygons)
List of 2 $ :Formal class 'Polygon' [package "sp"] with 5 slots .. ..@ labpt : num [1:2] -61.8 17.1 .. ..@ area : num 0.0291 .. ..@ hole : logi FALSE .. ..@ ringDir: int 1 .. ..@ coords : num [1:23, 1:2] -61.7 -61.7 -61.8 -61.9 -61.9 ... $ :Formal class 'Polygon' [package "sp"] with 5 slots .. ..@ labpt : num [1:2] -61.8 17.6 .. ..@ area : num 0.0171 .. ..@ hole : logi FALSE .. ..@ ringDir: int 1 .. ..@ coords : num [1:25, 1:2] -61.7 -61.7 -61.7 -61.7 -61.8 ...
head(worldGIS@polygons[[1]]@Polygons[[1]]@coords)
[,1] [,2] [1,] -61.68667 17.02444 [2,] -61.73806 16.98972 [3,] -61.82917 16.99694 [4,] -61.87611 17.01694 [5,] -61.88056 17.01972 [6,] -61.88361 17.02361
Determine that Panama is row 165 in worldGIS@data. Panama has 17 Polygons. The last, number 17, is the biggest.
which(worldGIS@data$NAME=='Panama')
[1] 165
length(worldGIS@polygons[[165]]@Polygons)
[1] 17
dim(worldGIS@polygons[[165]]@Polygons[[17]]@coords)
[1] 716 2
Keeping track of the detailed structure of the SpatialPolygonsDataFrame is tedious, but fortunately the basic mapping tools do not require knowing the details. Simply use plot and subset to come up with basic graphs.
sp::plot(worldGIS)
sp::plot(subset(worldGIS,NAME=='Panama'))
CA=subset(worldGIS,NAME=='Panama' | NAME=='Costa Rica' | NAME=='Nicaragua' | NAME=='Honduras' | NAME=='Guatemala') sp::plot(CA)
bbox(CA)
min max x -92.246780 -77.19833 y 7.206111 17.82111
CA@bbox
min max x -92.246780 -77.19833 y 7.206111 17.82111
CA@data
FIPS ISO2 ISO3 UN NAME AREA POP2005 REGION SUBREGION LON LAT 38 CS CR CRI 188 Costa Rica 5106 4327228 19 13 -83.946 9.971 74 GT GT GTM 320 Guatemala 10843 12709564 19 13 -90.398 15.256 78 HO HN HND 340 Honduras 11189 683411 19 13 -86.863 14.819 158 NU NI NIC 558 Nicaragua 12140 5462539 19 13 -85.034 12.840 164 PM PA PAN 591 Panama 7443 3231502 19 13 -80.920 8.384
One way to understand the SpatialPolgonsDataFrame is to build one a step at a time from the bottom. There are functions in the sp package for creating each of the four spatial objects needed. Here I start with four sample data.frames available at the Sample Data page. There were saved as an rdata object, and I use the R function load for getting it. There are NOT ascii text, so read.delim will not work.
attach('practiceChicagoMap.rdata')
ls(2) head(ChiForest[[1]]) str(ChiForest[[1]])
Each of the four objects is a simple data.frame of coordinates, just like the ones we used for simple maps at the start of the workshop. There are two columns of x-y coordinates, and nothing else. The function Polygon converts them into a special GIS Class called Polygon.
The Polygon class has slots to store coordinates, the bbox, and the area, which it just calculated.for1poly=Polygon(ChiForest[[1]]) for2poly=Polygon(ChiForest[[2]]) park1poly=Polygon(ChiPark[[1]]) park2poly=Polygon(ChiPark[[2]]) class(for1poly)
[1] "Polygon" attr(,"package") [1] "sp"
slotNames(for1poly)
[1] "labpt" "area" "hole" "ringDir" "coords"
for1poly@area
[1] 0.0001777001
head(for1poly@coords)
X Y 1 -87.84664 41.97128 2 -87.84664 41.97123 3 -87.84665 41.97028 4 -87.84665 41.96951 5 -87.84666 41.96887 6 -87.84667 41.96825
The two polygons called forpoly are part of the Chicago forest map, so it now makes sense to combine them into a bigger object. The Class called Polygons does this.
forpolys=Polygons(list(for1poly,for2poly),ID='Forest') parkpolys=Polygons(list(park1poly,park2poly),ID='Park') class(forpolys)
[1] "Polygons" attr(,"package") [1] "sp"
length(forpolys@Polygons)
[1] 2
slotNames(forpolys@Polygons[[1]])
[1] "labpt" "area" "hole" "ringDir" "coords"
This is probably familiar now, since it is what the WorldGIS database built by readShapePoly looks like. There are two components in this mini-database, each with two polygons. The next step is to join those two countries into a large object called a SpatialPolygon Class. Finally this produces an object which can be mapped with the plot function.
forpark=SpatialPolygons(list(forpolys,parkpolys)) class(forpark)
[1] "SpatialPolygons" attr(,"package") [1] "sp"
slotNames(forpark)
[1] "polygons" "plotOrder" "bbox" "proj4string"
sp::plot(forpark)
But it still lacking the data slot which the WorldGIS has. This comes in the next, and final, step, creating the SpatialPolygonDataFrame object. The data.frame must be constructed, and can include any columns. But there must be two rows, and those rows must have row.names matching the IDs assigned to the Polygons Classes. This is the same class as WorldGIS, and has all five slots.
datafile=data.frame(row.names=c('Forest','Park'),size=c(2,2)) forparkdf=SpatialPolygonsDataFrame(forpark,datafile) class(forparkdf)
[1] "SpatialPolygonsDataFrame" attr(,"package") [1] "sp"
slotNames(forparkdf)
[1] "data" "polygons" "plotOrder" "bbox" "proj4string"
The Chicago GIS packages have a different projection (about which I know little). Converting to lat-long requires an inconvenient project name which must be copied exactly.
park=readShapePoly('Parks_Aug2012')
IllProjStr="+proj=tmerc +lat_0=36.66666666666666 +lon_0=-88.33333333333333 +k=0.9999749999999999 +x_0=300000.0000000001 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs" IllProj=CRS(IllProjStr) park@proj4string=IllProj parkLL=spTransform(park,CRS('+proj=longlat')) sp::plot(parkLL)
parkLL@bbox
min max x -87.83901 -87.52444 y 41.65142 42.02299