Skip to main content Skip to navigation

Excel files in R

This page is Optional Information only.

On the previous page, loading data, we recommended turning Microsoft Excel spreadsheets into "Tab Separated Variable" plain text files.

If you are using Windows, then the optional package xlsReadWrite will let you read and write Excel spreadsheets directly. You can install this package from R's "Packages", "Install package(s)..." menu.

If you are using Linux or Mac OS X then this package is not available.
Sadly xlsReadWrite is for Windows ONLY.

When I wrote this page there were a few "catches" (see below), which have been fixed in xlsReadWrite v1.3.3 (following my suggestions to the author). I would still suggest you may prefer to keep things simple, and work with plain text input files.

Loading Excel Spreadsheets

Here is the Santa Claus data introduced on the previous page:

You can load the "Tab Separated Variable" plain text file with read.table() command (either specify the full path, or set the working directory first using the setwd() command or the menu entry "File", "Change dir..."):

> Santa <-read.table("IsThereASantaClaus.txt", sep="\t", header=TRUE)
> summary(Santa)
  Believe             Age           Gender      Presents      Behaviour 
 Mode :logical   Min.   : 4.00   female:23   Min.   : 3.0   naughty:26  
 FALSE:25        1st Qu.: 5.00   male  :27   1st Qu.:20.0   nice   :24  
 TRUE :25        Median : 7.00               Median :26.5               
                 Mean   : 6.86               Mean   :27.0               
                 3rd Qu.: 9.00               3rd Qu.:33.5               
                 Max.   :10.00               Max.   :57.0  
> class(Santa$Gender)
[1] "factor"

If you have installed the xlsReadWrite package, you can load the Microsoft Excel file with read.xls() command (again, use the correct path for your machine):

> library(xlsReadWrite)
> Santa2 <-read.xls("IsThereASantaClaus.xls", colNames=TRUE)
> summary(Santa2)
  Believe             Age           Gender             Presents     Behaviour        
 Mode :logical   Min.   : 4.00   Length:50          Min.   : 3.0   Length:50         
 FALSE:25        1st Qu.: 5.00   Class :character   1st Qu.:20.0   Class :character  
 TRUE :25        Median : 7.00   Mode  :character   Median :26.5   Mode  :character  
                 Mean   : 6.86                      Mean   :27.0                     
                 3rd Qu.: 9.00                      3rd Qu.:33.5                     
                 Max.   :10.00                      Max.   :57.0                     
> class(Santa2$Gender)
[1] "character"

Column Classes

Notice that using read.table() the Gender and Behaviour columns were treated as factors. Using read.xls() from older versions of xlsReadWrite, they are treated as characters (strings). We can fix this by hand:

> Santa2$Gender <- factor(Santa2$Gender)
> Santa2$Bahaviour <- factor(Santa2$Behaviour)
> summary(Santa2)
  Believe             Age           Gender             Presents     Behaviour        
 Mode :logical   Min.   : 4.00   Length:50          Min.   : 3.0   Length:50         
 FALSE:25        1st Qu.: 5.00   Class :character   1st Qu.:20.0   Class :character  
 TRUE :25        Median : 7.00   Mode  :character   Median :26.5   Mode  :character  
                 Mean   : 6.86                      Mean   :27.0                     
                 3rd Qu.: 9.00                      3rd Qu.:33.5                     
                 Max.   :10.00                      Max.   :57.0 

The alternative to letting R or xlsReadWrite guess the column formats, is to use the colClasses argument to these functions to specify explicitly how you want the columns treated.

> Santa3 <-read.table("IsThereASantaClaus.txt", header=TRUE, sep="\t",
+ colClasses=c('logical', 'integer', 'factor', 'integer', 'factor'))

Or,

> Santa4 <-read.xls("IsThereASantaClaus.xls", colNames=TRUE,
+ colClasses=c('logical', 'integer', 'factor', 'integer', 'factor'))

Multiple Worksheets

In the Santa spreadsheet there is only one datasheet, but this is not always the case. The read.xls() function takes an optional sheet argument (which defaults to one) allowing you to choose which worksheet to load.

Using the wheat example from the 2006/2007 assignment, you might read the text file like this:

> wheat <- read.table("wheat.txt", header=TRUE, sep="\t",
+ colClasses=c("factor","integer","double"))

Or, you could load "worksheet six" from the Microsoft Excel spreadsheet:

> wheat2 <- read.xls("MOAC_CH923_Data_Analysis.xls", colNames=TRUE, sheet=6,
+ colClasses=c("factor","integer","double"))

Note that loading the Excel spreadsheet gives slightly different variable names, which you can check with the colnames() or summary() commands... spaces in R should be avoided, but xlsReadWrite prior to v1.3.3 did not follow this practice.