
Lab session 0

  1. Download duncan.txt from the course website. Then, in RStudio, click the Import Dataset button on the Environment tab in the top-right pane. Choose From Local File, then pick duncan.txt to import and view the data.

    The data are from the 1971 census of Canada and contain the following variables:

      - occupation name
      - average education, in years
      - average income, in dollars
      - percentage of women in the occupation
      - prestige score for the occupation, from an earlier survey
      - Canadian census occupation code
      - type of occupation: bc (blue collar), prof (professional, managerial and technical) and wc (white collar)

    Open a new script file to work in and save it as a .R file (any file name will do, as long as it ends in .R).

  2. Use summary to get a quick overview of the variables. You will see that type has missing values. The command will return a logical vector indicating whether each element of x is NA (TRUE) or not (FALSE). Look up the help for the function subset and find out how to use such a logical vector to obtain the subset of the data for which type is NA. In summaries involving type, these observations will be silently excluded.

  3. Using min, assign the minimum percentage of women to a name. The command x == a will return a logical vector indicating whether each element of x is equal to a (TRUE) or not (FALSE). Use subset to obtain the rows of data where the percentage of women equals the minimum value. Repeat the process to obtain the rows corresponding to the maximum.

  4. Create a histogram and then a density plot of the prestige variable, and compare the output. Use the code completion tools in RStudio to look up the second argument of hist and try modifying it.

  5. Create a boxplot of years of education by occupation type.

  6. Create a scatterplot of prestige against income. Look at the help for log, then create a plot of prestige against the base-10 logarithm of income.
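The steps above could be sketched as the following R script. This is not the course's official solution, only one possible base R approach. It assumes the columns are named education, income, women, prestige and type (the names used in the Prestige data from the carData package, which this data set appears to match). Because duncan.txt is not reproduced here, the script runs on a tiny hypothetical data frame whose values are invented purely for illustration. It uses base R's is.na() to build the logical vector for step 2.

```r
# Sketch of lab steps 2-6 in base R (not an official solution).
# ASSUMPTION: column names education, income, women, prestige and type.
# With the real file you might use something like this (assumes a
# whitespace-delimited file with a header row):
# occ <- read.table("duncan.txt", header = TRUE, stringsAsFactors = TRUE)

# Hypothetical stand-in data: six invented occupations with made-up values,
# NOT taken from the census. Only the columns used below are included.
occ <- data.frame(
  education = c(14.0, 12.5, 8.5, 7.5, 11.0, 9.5),
  income    = c(12000, 20000, 5000, 4500, 4000, 6000),
  women     = c(10, 5, 1, 0, 70, 40),
  prestige  = c(70, 65, 35, 30, 45, 40),
  type      = factor(c("prof", "prof", "bc", "bc", "wc", NA),
                     levels = c("bc", "prof", "wc")),
  row.names = paste0("occ_", LETTERS[1:6])
)

# Step 2: quick summary (the NA count for type shows up here), then
# subset() with a logical condition to keep only rows where type is NA
print(summary(occ))
missing_type <- subset(occ, is.na(type))
print(missing_type)

# Step 3: store the minimum and maximum percentage of women, then keep the
# rows equal to each; na.rm = TRUE guards against missing values
min_women <- min(occ$women, na.rm = TRUE)
rows_min  <- subset(occ, women == min_women)
max_women <- max(occ$women, na.rm = TRUE)
rows_max  <- subset(occ, women == max_women)
print(rows_min)
print(rows_max)

# Step 4: default histogram, a histogram with a different `breaks` value
# (the second argument of hist.default), and a kernel density estimate
hist(occ$prestige)
hist(occ$prestige, breaks = 3)
plot(density(occ$prestige), main = "Density of prestige")

# Step 5: education by occupation type; rows with NA type are dropped
boxplot(education ~ type, data = occ)

# Step 6: prestige against income, then against log10(income)
plot(occ$income, occ$prestige, xlab = "income", ylab = "prestige")
plot(log(occ$income, base = 10), occ$prestige,
     xlab = "log10(income)", ylab = "prestige")
```

To run the same steps on the real data, replace the toy data frame with the imported one, for example via the commented-out read.table() line, which itself assumes a whitespace-delimited file with a header row.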