Introduction to R programming 1: Preliminaries

1 Getting help in R

  • To get help on a function or a dataset:
    ?function_name <==> help(function_name)
    For example, ?mean is equivalent to help(mean).
  • To find functions by keyword:
    ??keyword <==> help.search("keyword")
    Note: when the search term consists of multiple words, it must be quoted.
    For example, ??plotting is equivalent to help.search("plotting"), and
    ??"nonlinear regression" is equivalent to help.search("nonlinear regression").
  • The apropos(keyword) function finds variables and functions whose names contain the keyword and that are available to you, including the ones you defined.
    > my_vector <- 1:10
    > apropos("vector")
    [1] ".__C__vector"         "as.data.frame.vector" "as.vector"            "as.vector.factor"    
    [5] "is.vector"            "my_vector"            "vector"               "Vectorize"

    A fancier use of apropos() is matching with regular expressions.

    > apropos("z$")
    [1] ".rs.disableQuartz" "SSgompertz"        "toeplitz"          "unz"   
  • Use the example() function to learn a function by example.
    For example: example(plot)
  • Use the demo() function to see demonstrations of some concepts. For example, demo() lists all available demonstrations, and demo(colors) demonstrates R's predefined colors() function.
  • Use the browseVignettes() function to browse the vignettes of the packages installed on the local machine, or use the vignette() function to open a specific vignette.
  • The RSiteSearch() function runs a query on the official website of the R Project to look for help pages and packages: RSiteSearch("nonlinear regression")
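
As a rough sketch of the search tools listed above, the snippet below defines a throwaway object (the name my_vector is only illustrative) and looks it up with apropos(). The calls that open a help page or a browser are left as comments so that the snippet can run non-interactively.

```r
# Define an illustrative object so that apropos() has something of ours to find.
my_vector <- c(1, 2, 3)

# apropos() returns a character vector of matching names on the search path.
hits <- apropos("vector")
print("my_vector" %in% hits)

# The pattern is a regular expression: names that start with "is.vec".
is_hits <- apropos("^is\\.vec")
print(is_hits)

# Interactive lookups (uncomment to try them in a live session):
# ?mean
# help.search("nonlinear regression")
# example(mean)
```

Because apropos() searches by name only, it is best suited to cases where you remember part of a function name; help.search() also looks at help-page titles and keywords.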

2 Some general concepts

  • The R language is case sensitive.
  • R names consist of alphanumeric symbols plus '.' and '_', with the restriction that a name must start with a letter or with '.' (and a leading '.' must not be followed by a digit).
  • There are two kinds of basic R commands: (1) expressions (evaluated and printed, after which the value is lost); (2) assignments (evaluated and stored, but not printed).
  • Multiple commands can share one line, separated by ';'. Multiple commands can also be grouped into one compound expression by a pair of braces '{' and '}'.
  • We can run an R file, say example.R, by the command:
    > source("example.R")
  • By default, R outputs evaluation results to the console. However, the output can be redirected to a file, say output.txt, by the command
    > sink("output.txt")
    and the redirection can be stopped, resuming normal console output, by
    > sink()
  • To get the help of a function, for example, solve(), we can use the commands
    > help(solve) or ?solve
  • The objects() command returns the names of the objects in the workspace of the current R session, and the rm() function removes objects from the workspace:
    > rm(x,y,z, temp,foo)
  • When you exit an R session, R asks whether to save the workspace. If you agree, all objects are saved to a .RData file and all command lines are saved to a .Rhistory file.
    If R is later started from the same directory, the saved workspace and history are loaded into the new session. It is therefore recommended to use a separate working directory for each analysis conducted with R.
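
The points above about separators, compound expressions, and output redirection can be sketched as follows. The variable names are illustrative, and a temporary file stands in for output.txt so that nothing is written to the working directory.

```r
# Several commands on one line, separated by semicolons.
a <- 1; b <- 2

# A compound expression in braces evaluates to its last expression.
result <- {
  tmp <- a + b
  tmp * 10
}
print(result)

# Redirect console output to a temporary file, then restore the console.
out_file <- tempfile(fileext = ".txt")
sink(out_file)
print(1:3)
sink()

# The printed text ended up in the file instead of on the console.
captured <- readLines(out_file)
print(captured)
```

Always pair sink(file) with a closing sink(); otherwise later output keeps going to the file.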

3 Installing packages

install.packages("package_name") # install directly from internet
install.packages(file.choose()) # pop up a UI to install from local compressed file
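
A common pattern, sketched here under the assumption that you only want to install a package when it is missing, is to check with requireNamespace() first. The helper name ensure_package is hypothetical and not part of R.

```r
# Hypothetical helper: report whether a package is available, and optionally
# install it from the default repository when it is missing.
ensure_package <- function(pkg, install = FALSE) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  if (install) {
    install.packages(pkg)
    return(requireNamespace(pkg, quietly = TRUE))
  }
  FALSE
}

# "stats" ships with R, so no installation is attempted here.
print(ensure_package("stats"))
```

With install = FALSE (the default in this sketch) the helper never touches the network, which keeps scripts safe to run offline.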

Some external software tools extend the functionality of R.

  • Under Linux, they can easily be installed with the system package manager.
  • Under Windows, they can be installed via the installr package.

4 R as a scientific calculator

  • At the R command line, x <- value only assigns, whereas (x <- value) both assigns and prints the value.
  • All the arithmetic operators in R are vectorized: an operator or a function will act on each element of a vector without the need for you to explicitly write a loop.
  • You can use either ‘^’ or ‘**’ for exponentiation, but using ‘^’ is more common.
  • Different divisions
    > (1:10)/3       # floating division
     [1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667 2.0000000 2.3333333 2.6666667 3.0000000 3.3333333
    > (1:10) %/% 3   # integer division
     [1] 0 0 1 1 1 2 2 2 3 3
    > (1:10) %% 3    # remainder after division
     [1] 1 2 0 1 2 0 1 2 0 1
  • For very small values of x, the functions log1p(x) and expm1(x) are more accurate than log(1+x) and exp(x)-1, respectively.
  • You can use ‘==’ to check equality between integers, but using it for non-integers is problematic.
    • The all.equal(x,y) function checks whether two objects x and y are (nearly) equal, up to a tolerance that defaults to 1.5e-8.
    • If the objects differ, the comparison is still carried out to some extent and a report of the differences is returned. To suppress the report and force a TRUE/FALSE result, wrap the all.equal() call in isTRUE().
      > x=sqrt(2)^2
      > x==2
      [1] FALSE
      > all.equal(x,2)
      [1] TRUE
      > all.equal(x,2,tolerance = 1e-20)
      [1] "Mean relative difference: 2.220446e-16"
    • Do not use all.equal() directly in if expressions: use either isTRUE(all.equal(...)) or, if appropriate, identical().
      > identical(2,2)
      [1] TRUE
      > identical(sqrt(2)^2,2)
      [1] FALSE
    • Global assignment x <<- 5 can alternatively be done by assign("x", 5, envir = globalenv()).
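
The accuracy and comparison advice above can be sketched as follows. The helper nearly_equal is a hypothetical convenience wrapper, not a base R function, and the value 1e-20 is just an example of a very small x.

```r
x <- 1e-20

# 1 + x rounds to exactly 1 in double precision, so the naive forms lose x.
naive_log <- log(1 + x)
naive_exp <- exp(x) - 1

# log1p() and expm1() keep the small value.
accurate_log <- log1p(x)
accurate_exp <- expm1(x)
print(c(naive_log, accurate_log, naive_exp, accurate_exp))

# Hypothetical helper: a comparison that is safe to use inside if().
nearly_equal <- function(a, b, tolerance = 1.5e-8) {
  isTRUE(all.equal(a, b, tolerance = tolerance))
}

y <- sqrt(2)^2
if (nearly_equal(y, 2)) {
  message("y equals 2 up to the tolerance")
}
print(y == 2)              # exact comparison
print(nearly_equal(y, 2))  # tolerant comparison
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a character description, not FALSE, when the values differ.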

    • Unlike TRUE and FALSE, which are reserved words, T and F are ordinary predefined variables that can be overwritten. It is therefore highly recommended to use TRUE and FALSE instead of T and F.
      The three vectorized logical operators are ‘!’, ‘&’, and ‘|’, representing not, and, and or, respectively.

      > (x <- 1:10 >= 5)
      > (y <- 1:10 %% 2 == 0)
      > !x
      > x & y
      > x | y

      The three possible logical values are TRUE, FALSE, and NA.

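      The any() and all() functions are useful for dealing with logical vectors.

      > none_true <- c(FALSE, FALSE, FALSE)
      > some_true <- c(FALSE, TRUE, FALSE)
      > all_true <- c(TRUE, TRUE, TRUE)
      > all(all_true)
      [1] TRUE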
      > all(some_true)
      [1] FALSE
      > all(none_true)
      [1] FALSE
      > any(all_true)
      [1] TRUE
      > any(some_true)
      [1] TRUE
      > any(none_true)
      [1] FALSE
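
As a small sketch of how the third logical value NA interacts with these operators, and of why T is riskier than TRUE, consider the following; the variable names are illustrative.

```r
# Vectorized logical operators work element by element.
x <- c(TRUE, FALSE, NA)
not_x <- !x
and_false <- x & FALSE
or_true <- x | TRUE
print(not_x)
print(and_false)
print(or_true)

# any() and all() collapse a logical vector to a single value.
with_na <- c(TRUE, NA)
print(any(with_na))
print(all(with_na))
print(all(with_na, na.rm = TRUE))

# T is an ordinary variable and can be overwritten; TRUE cannot.
T <- 0
overwritten <- isTRUE(T)
print(overwritten)
rm(T)  # remove our copy so that T refers to the base value again
```

NA propagates only when the result genuinely depends on the unknown value: NA & FALSE is FALSE and NA | TRUE is TRUE, but all(c(TRUE, NA)) stays NA unless na.rm = TRUE is given.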


5 Some notes about object classes

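  • Typing .Machine gives you some information about the properties of R’s numbers.

    > .Machine
    $double.eps
    [1] 2.220446e-16
    $double.neg.eps
    [1] 1.110223e-16
    $double.xmin
    [1] 2.225074e-308
    $double.xmax
    [1] 1.797693e+308
    $double.base
    [1] 2
    $double.digits
    [1] 53
    $double.rounding
    [1] 5
    $double.guard
    [1] 0
    $double.ulp.digits
    [1] -52
    $double.neg.ulp.digits
    [1] -53
    $double.exponent
    [1] 11
    $double.min.exp
    [1] -1022
    $double.max.exp
    [1] 1024
    $integer.max
    [1] 2147483647
    $sizeof.long
    [1] 4
    $sizeof.longlong
    [1] 8
    $sizeof.longdouble
    [1] 12
    $sizeof.pointer
    [1] 4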

    Note that R has three classes of numerical values: numeric (for floating), integer, and complex.

    > class(1:10)
    [1] "integer"
    > class(0.5:4.5)
    [1] "numeric"
    > class(sqrt(2))
    [1] "numeric"
    > class(5L)
    [1] "integer"
    > class(1+2i)
    [1] "complex"
  • R does not distinguish between single-character and multi-character strings.
  • Factors are, in essence, integers with labels, since factor values are stored as integers rather than characters, for the purpose of efficient storage in memory.
    Sometimes, in order to manipulate contents of factors, we need to use as.character() to convert factors to strings in order to use string manipulation functions.

    > gender <- factor(c("Male","Female","Male","Female"))
    > gender
    [1] Male   Female Male   Female
    Levels: Female Male
    > nlevels(gender)
    [1] 2
    > as.integer(gender)
    [1] 2 1 2 1
    > as.character(gender)
    [1] "Male"   "Female" "Male"   "Female"
  • To check whether an object is of a specific class, say xxx, we can use either of the following:

    > is(x, "xxx")
    > is.xxx(x)

    However, the latter is strongly preferred over the former. We will see that the first way is not as robust as you might imagine.

    • To test for a floating-point number, use the is.double() function instead of is.numeric(), since the latter returns TRUE for both integers and floating-point numbers.

      > is.numeric(2)
      [1] TRUE
      > is.double(2)
      [1] TRUE
      > is.integer(2)
      [1] FALSE
      > is.numeric(2L)
      [1] TRUE
      > is.double(2L)
      [1] FALSE
      > is.integer(2L)
      [1] TRUE
    • We can list all the is functions in the base package as follows:
      > ls(pattern="^is",baseenv())
       [1] "is.array"              "is.atomic"             "is.call"               "is.character"         
       [5] "is.complex"            "is.data.frame"         "is.double"             "is.element"           
       [9] "is.environment"        "is.expression"         "is.factor"             "is.finite"            
      [13] "is.function"           "is.infinite"           "is.integer"            "is.language"          
      [17] "is.list"               "is.loaded"             "is.logical"            "is.matrix"            
      [21] "is.na"                 "is.na.data.frame"      "is.na.numeric_version" "is.na.POSIXlt"        
      [25] "is.na<-"               "is.na<-.default"       "is.na<-.factor"        "is.name"              
      [29] "is.nan"                "is.null"               "is.numeric"            "is.numeric.Date"      
      [33] "is.numeric.difftime"   "is.numeric.POSIXt"     "is.numeric_version"    "is.object"            
      [37] "is.ordered"            "is.package_version"    "is.pairlist"           "is.primitive"         
      [41] "is.qr"                 "is.R"                  "is.raw"                "is.recursive"         
      [45] "is.single"             "is.symbol"             "is.table"              "is.unsorted"          
      [49] "is.vector"             "isatty"                "isBaseNamespace"       "isdebugged"           
      [53] "isIncomplete"          "isNamespace"           "isOpen"                "isRestart"            
      [57] "isS4"                  "isSeekable"            "isSymmetric"           "isSymmetric.matrix"   
      [61] "isTRUE" 

      Note that the assertive package contains more is functions with a consistent naming scheme.

    • For every checking function, there is almost always a corresponding class-converting function.

      > x <- "123.456"
      > as(x,"numeric")
      [1] 123.456
      > as.numeric(x)
      [1] 123.456
      > (x<-c(2,12,343,34997))
      [1]     2    12   343 34997
      > as.data.frame(x)
            x
      1     2
      2    12
      3   343
      4 34997
      > as(x,"data.frame")
      Error in as(x, "data.frame") : 
        no method or default for coercing “numeric” to “data.frame”

      As seen in the example above, the as.xxx() function is more robust than the as(x, "xxx") function.

    • We can directly change the class of an object by assigning a new class to it, but this is not recommended.

      > (x <- "123.456")
      [1] "123.456"
      > class(x) <- "numeric"
      > x
      [1] 123.456
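
The type checks and conversions discussed above can be combined as in the following sketch. The helper describe_number is hypothetical, and the factor example assumes labels that happen to look like numbers.

```r
# Hypothetical helper: describe how a value is stored.
describe_number <- function(v) {
  if (is.integer(v)) {
    "integer"
  } else if (is.double(v)) {
    "double"
  } else if (is.numeric(v)) {
    "other numeric"
  } else {
    "not numeric"
  }
}
print(describe_number(2L))
print(describe_number(2))
print(describe_number("2"))

# Converting a string that is not a number yields NA (with a warning).
converted <- suppressWarnings(as.numeric(c("123.456", "abc")))
print(converted)

# For a factor, convert to character first; as.numeric() alone returns the
# internal integer codes rather than the labels.
f <- factor(c("10", "20", "10"))
codes <- as.numeric(f)
values <- as.numeric(as.character(f))
print(codes)
print(values)
```

Checking is.integer() before is.double() keeps the branches mutually exclusive, since is.numeric() alone would accept both.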
  • Some functions are available for examining variables, such as summary(), head(), tail(), and str().
    Sometimes the print method of a class might obscure some internal structure of the object. To bypass this, we can use the unclass() function.
    The attributes() function gives you a list of all the attributes belonging to an object. 
  • To visualize 2-D variables such as matrices and data frames, we can display the object in a (popup) spreadsheet in one of the following ways:
    • View(): read only
    • edit(): editable, changes stored to a new object
    • fix(): editable, changes stored to the original object
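
As a minimal sketch of the inspection functions mentioned above, the snippet below reuses the gender factor from earlier in this section and looks behind its print method.

```r
gender <- factor(c("Male", "Female", "Male", "Female"))

# str() gives a compact description of the structure.
str(gender)

# attributes() lists the metadata attached to the object.
attrs <- attributes(gender)
print(names(attrs))

# unclass() drops the class, so the default print method shows the raw
# storage: integer codes plus a "levels" attribute.
raw <- unclass(gender)
print(raw)

# head() and tail() show the first and last elements.
first_two <- head(gender, 2)
last_one <- tail(gender, 1)
print(first_two)
print(last_one)
```

The spreadsheet viewers View(), edit(), and fix() need an interactive session, so they are not called here.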

6 Objects in the workspace: list and remove

Some examples:

  • ls(): list all objects in the workspace
  • ls(pattern="ea"): list objects whose names contain "ea"
  • ls(all.names=TRUE): list all variables, including hidden ones whose names start with "."

    > ls()
     [1] "a"             "ab"            "all_true"      "and"           "b"             "gender"       
     [7] "gender_char"   "gender_factor" "none_true"     "some_true"     "x"             "y"            
    > ls(all.names=TRUE)
     [1] ".Random.seed"  "a"             "ab"            "all_true"      "and"           "b"            
     [7] "gender"        "gender_char"   "gender_factor" "none_true"     "some_true"     "x"            
    [13] "y" 
    > ls(pattern="nd")
    [1] "and"           "gender"        "gender_char"   "gender_factor"
  • rm(a,b,gender): remove specific variables from the workspace.

  • rm(list=ls()): remove everything from the workspace. 
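
To try these commands without touching your real workspace, one option is to run them against a scratch environment, as in this sketch. The environment name scratch and the objects placed in it are illustrative.

```r
# Work in a scratch environment so nothing in the real workspace is removed.
scratch <- new.env()
assign("and", 1, envir = scratch)
assign("gender", "Male", envir = scratch)
assign("b", 2, envir = scratch)
assign(".hidden", 3, envir = scratch)

visible <- ls(scratch)
everything <- ls(scratch, all.names = TRUE)
matching <- ls(scratch, pattern = "nd")
print(visible)
print(everything)
print(matching)

# Remove only the objects whose names match the pattern.
rm(list = ls(scratch, pattern = "nd"), envir = scratch)
remaining <- ls(scratch)
print(remaining)
```

Combining rm(list = ...) with ls(pattern = ...) gives a targeted clean-up, which is usually safer than rm(list=ls()).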
