
Lab Session 0

  1. Open a new script file to work in and save it under any file name you like, ending in .R.

  2. Download duncan.csv from the course website (right-click the link, then choose the save option from the context menu). In RStudio, click on the Import Dataset button on the Environment tab in the top-right pane. Select From CSV… and then select the duncan.csv file to import.

    Before clicking the Import button, copy the code from the Code Preview window. Once you click Import, a new tab will open with a preview of the data; before looking at it, paste the copied code into your script file.
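
    The pasted code will typically be a single call that reads the file into a data frame named duncan. As a minimal sketch of what such a call does, the example below uses base R's read.csv on a temporary file. The three rows are made up for illustration and are not taken from duncan.csv, and the code RStudio generates for you may use a different reader function and will point at your own file path.

    ```r
    # Hypothetical rows written to a temporary file so the example is self-contained
    csv_path <- tempfile(fileext = ".csv")
    writeLines(c(
      "occupation,education,income,women,prestige,census,type",
      "job_a,13.1,12351,11.2,68.8,1113,prof",
      "job_b,8.5,4500,75.0,35.2,5100,wc",
      "job_c,7.0,3900,1.5,30.1,8700,bc"
    ), csv_path)

    # Read the CSV into a data frame; stringsAsFactors = FALSE keeps text columns as character
    duncan <- read.csv(csv_path, stringsAsFactors = FALSE)

    # Inspect the column types that were inferred on import
    str(duncan)
    ```

    Keeping the import call in your script means the whole analysis can be rerun later without going through the Import dialog again.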

    The data are from the 1971 census of Canada, with the following variables:

    occupation: named occupation

    education: average education in years

    income: average income in dollars

    women: percentage women in occupation

    prestige: prestige score for occupation, from earlier survey

    census: Canadian census occupation code

    type: type of occupation, one of bc (blue collar), prof (professional, managerial and technical) or wc (white collar)

  3. Use summary to get a quick summary of the variables (unless you chose a different name in the Import dialog, the data frame will be called duncan). You will see that type has been read in as a character vector. Convert this to a factor using the code below:

    duncan$type <- factor(duncan$type)      
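
    To see why the conversion matters, here is a small sketch on a toy data frame (the name toy and its values are invented for illustration). It compares what summary reports for the same column before and after calling factor.

    ```r
    # Toy data frame with made-up values, including one missing type
    toy <- data.frame(
      occupation = c("job_a", "job_b", "job_c", "job_d"),
      type = c("prof", "wc", "bc", NA),
      stringsAsFactors = FALSE
    )

    # For a character vector, summary only reports length, class and mode
    summary(toy$type)

    # Convert to a factor; the levels are the sorted unique non-missing values
    toy$type <- factor(toy$type)
    levels(toy$type)

    # For a factor, summary reports a count per level and a count of NAs
    summary(toy$type)
    ```
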
  4. Use summary to get an updated summary of the data. You will see that type has missing values.

    A subset of the data can be obtained with a command of the form

    subset(duncan, condition)      

    where condition is a logical vector, which is TRUE for the rows that should be kept and FALSE otherwise.

    The command is.na(x) will return a logical vector indicating whether each element of x is NA (TRUE) or not (FALSE). Use such a logical vector to obtain the subset of the data for which type is NA; from this you can see which occupations are unclassified.
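
    The combination of subset and is.na can be sketched on a toy data frame as follows. The data frame toy and its contents are invented for illustration; only subset and is.na come from the text above.

    ```r
    # Toy data frame with made-up values; two occupations have no type
    toy <- data.frame(
      occupation = c("job_a", "job_b", "job_c", "job_d"),
      type = factor(c("prof", NA, "bc", NA)),
      stringsAsFactors = FALSE
    )

    # Logical vector: TRUE where type is missing
    is.na(toy$type)   # FALSE TRUE FALSE TRUE

    # Inside subset, column names can be used directly in the condition
    unclassified <- subset(toy, is.na(type))
    unclassified$occupation
    ```

    Note that the condition refers to type rather than toy$type: subset evaluates the condition within the data frame, so either form works.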

  5. Using min, assign the minimum percentage of women to a name. The command x == a will return a logical vector indicating whether each element of x is equal to a (TRUE) or not (FALSE). Use subset to obtain the rows of data where the percentage of women is equal to the minimum value. Repeat the process to obtain the rows corresponding to the maximum.
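
    As a hedged illustration of the pattern, the sketch below works on a toy data frame with invented values; the names toy, min_women and lowest are hypothetical.

    ```r
    # Toy data frame with made-up percentages; the minimum occurs twice
    toy <- data.frame(
      occupation = c("job_a", "job_b", "job_c", "job_d"),
      women = c(11.2, 0, 75.0, 0),
      stringsAsFactors = FALSE
    )

    # Store the minimum under a name so it can be reused
    min_women <- min(toy$women)

    # Keep every row whose value equals the stored minimum
    lowest <- subset(toy, women == min_women)
    lowest
    ```

    Because min returns one of the values actually present in the vector, the exact comparison with == is safe here, and ties are all returned. If the column contained missing values, min would return NA unless called with na.rm = TRUE.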

  6. Create a histogram and then a density plot of the prestige variable and compare the output. Use the code completion tools in RStudio to look at the second argument of hist and try modifying it.
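
    A minimal sketch of the two plot types, using simulated numbers rather than the real prestige scores (the vector prestige_toy is invented for illustration). For a numeric vector, the second argument of hist is breaks, which controls how the bins are chosen.

    ```r
    # Simulated scores, not real data
    set.seed(1)
    prestige_toy <- rnorm(100, mean = 50, sd = 15)

    # Histogram with the default choice of bins
    hist(prestige_toy)

    # A single number for breaks is treated as a suggested number of bins
    h <- hist(prestige_toy, breaks = 20)

    # Kernel density estimate, drawn as a smooth curve
    d <- density(prestige_toy)
    plot(d)
    ```

    The histogram changes shape with the choice of bins, whereas the density plot depends instead on a smoothing bandwidth, which is one thing to keep in mind when comparing the two.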

  7. Create a boxplot of years of education by occupation type.
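
    One common way to draw grouped boxplots is the formula interface of boxplot, sketched here on a toy data frame with invented values. The formula y ~ g reads as "y split by the groups in g".

    ```r
    # Toy data frame with made-up values, three occupations per type
    toy <- data.frame(
      education = c(6.5, 7.0, 8.2, 14.1, 15.3, 13.8, 10.9, 11.4, 12.0),
      type = factor(rep(c("bc", "prof", "wc"), each = 3))
    )

    # One box per level of type; the return value holds the plotted statistics
    b <- boxplot(education ~ type, data = toy,
                 ylab = "Average education (years)")

    # Group labels in plotting order
    b$names
    ```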

  8. Create a scatterplot of prestige against income. Look at the help for log and then create a plot of prestige against log income, using base 10.
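
    The sketch below shows one way to do this on a toy data frame with invented values. By default log computes natural logarithms; the base argument changes this, and log10 is a shorthand for base 10.

    ```r
    # Toy data frame with made-up values
    toy <- data.frame(
      income = c(1000, 4000, 8000, 25000),
      prestige = c(20, 35, 50, 80)
    )

    # "prestige against income": prestige on the vertical axis
    plot(prestige ~ income, data = toy)

    # Base-10 logarithm of income
    toy$log_income <- log(toy$income, base = 10)
    plot(prestige ~ log_income, data = toy)

    # log(x, base = 10) and log10(x) agree
    all.equal(toy$log_income, log10(toy$income))
    ```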