Computational Ecology: 2010-08

2010-08-31

Introduction to GDAL

The Geospatial Data Abstraction Library (GDAL, www.gdal.org) is an open source library for translating geospatial data between different formats. It is the primary intermediary through which most open-source data analysis and GIS software interacts with ArcGIS datasets. For example, one can translate ArcGIS binary grids into GRASS rasters or import them as a SpatialGridDataFrame object in R. GDAL proper supports raster data types, but it has been effectively merged with another library, OGR, that supports vector data conversion.

There are some file formats that are not directly translatable by GDAL, notably ESRI proprietary Smart Data Compression (SDC) files. The GDAL website provides a list of supported vector formats and raster formats. ESRI binary grids, coverages, and personal geodatabases can be read but not written.

Since GDAL is a library, it is meant for developers who write code that references the library, rather than for end users who actually translate datasets. The library API is in C, but there are bindings for R, Perl, Python, VB6, Ruby, Java and C#. The GDAL package does come with a few command-line utilities for working with geospatial data, the GDAL Utilities. Among them, the tools that I commonly use are:

gdalinfo: Raster information
gdal_translate: Convert rasters between formats
gdal_merge.py: Merge tiled rasters into one
gdal2tiles.py: Makes a Google Maps- or Google Earth-compatible set of rasters
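
As a rough illustration of how the first two utilities might be called, here is a minimal sketch that drives them from R with system2. It assumes the GDAL utilities are installed and on the PATH; the file names elevation.asc and elevation.tif are hypothetical placeholders, not files from this post.

```r
# Hypothetical file names, used only for illustration
src <- "elevation.asc"
dst <- "elevation.tif"

# Arguments for gdal_translate: -of selects the output format driver
translate.args <- c("-of", "GTiff", src, dst)

# Only call the utilities if they are installed and the input file exists
if (nzchar(Sys.which("gdalinfo")) && file.exists(src)) {
  system2("gdalinfo", src)                   # print raster metadata
  system2("gdal_translate", translate.args)  # convert to GeoTIFF
}
```

The guard around the calls means the snippet does nothing on a machine without GDAL or without the input file, rather than failing.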

Binary distributions of GDAL are available for most platforms and can be installed without using the command line.

In the next post, we'll look at the R bindings for GDAL in the rgdal package built by Tim Keitt at the University of Texas, and I will show some examples of working with raster datasets in R.

Labels: GIS

2010-08-11

Converting R contingency tables to data frames

A contingency table presents the joint frequency distribution of two or more categorical variables. Each entry in a contingency table is a count of the number of times a particular combination of factor levels occurs in the dataset. For example, consider a list of plant species where each species is assigned a relative seed size (small, medium, or large) and a growth form (tree, shrub, or herb).

seed.sizes <- c("small", "medium", "large")
growth.forms <- c("tree", "shrub", "herb")
species.traits <- data.frame(
  seed.size = seed.sizes[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3)],
  growth.form = growth.forms[c(3, 3, 2, 2, 1, 2, 2, 3, 1, 1, 1, 1)]
)


seed.size	growth.form
small	herb
small	herb
small	shrub
small	shrub
small	tree
medium	shrub
medium	shrub
medium	herb
medium	tree
large	tree
large	tree
large	tree

A contingency table will tell us how many times each combination of seed.size and growth.form occurs.

tbl <- table(species.traits)


seed.size	herb	shrub	tree
large	0	0	3
medium	1	2	1
small	2	2	1
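
Note that table() sorts the levels alphabetically (large, medium, small) rather than by size. If a meaningful order is wanted, one option is to convert the columns to factors with explicit levels before tabulating. The following is a minimal sketch of that idea; the name ordered.tbl is introduced here only for illustration.

```r
seed.sizes <- c("small", "medium", "large")
growth.forms <- c("tree", "shrub", "herb")
species.traits <- data.frame(
  seed.size = seed.sizes[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3)],
  growth.form = growth.forms[c(3, 3, 2, 2, 1, 2, 2, 3, 1, 1, 1, 1)]
)

# Fix the level order explicitly instead of relying on alphabetical sorting
species.traits$seed.size <- factor(species.traits$seed.size,
                                   levels = seed.sizes)
species.traits$growth.form <- factor(species.traits$growth.form,
                                     levels = growth.forms)

# Rows now follow small, medium, large; columns follow tree, shrub, herb
ordered.tbl <- table(species.traits)
```

The counts are unchanged; only the order of the rows and columns differs.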

The resulting contingency table is an object of class table. These objects do not behave quite like data frames. In fact, trying to convert one to a data frame gives a non-intuitive result.

as.data.frame(tbl)


seed.size	growth.form	Freq
large	herb	0
medium	herb	1
small	herb	2
large	shrub	0
medium	shrub	2
small	shrub	2
large	tree	3
medium	tree	1
small	tree	1

Coercion of the table into a data frame puts each factor of the contingency table into its own column along with the frequency, rather than keeping the same structure as the original table object. If we want to turn the table into a data frame while keeping the original structure, we can use as.data.frame.matrix. This function is not well documented in R, and this is probably the only situation in which it would be used. But it works.

as.data.frame.matrix(tbl)


	herb	shrub	tree
large	0	0	3
medium	1	2	1
small	2	2	1
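
To check that the two coercions carry the same information, here is a small self-contained sketch that builds both forms and then rebuilds the table from the long form with xtabs. The names wide, long, and rebuilt are introduced here only for illustration.

```r
seed.sizes <- c("small", "medium", "large")
growth.forms <- c("tree", "shrub", "herb")
species.traits <- data.frame(
  seed.size = seed.sizes[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3)],
  growth.form = growth.forms[c(3, 3, 2, 2, 1, 2, 2, 3, 1, 1, 1, 1)]
)
tbl <- table(species.traits)

# Wide form: one row per seed size, one column per growth form
wide <- as.data.frame.matrix(tbl)

# Long form: one row per combination, with its count in Freq
long <- as.data.frame(tbl)

# xtabs rebuilds a contingency table from the long-format counts
rebuilt <- xtabs(Freq ~ seed.size + growth.form, data = long)
```

The wide form is convenient for display or for matrix-style indexing such as wide["large", "tree"], while the long form is usually easier to merge or plot.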

Labels: R
