Introduction to R programming 1: Preliminaries
1 Getting help in R
• To get help on a function or a dataset:
?function_name <==> help(function_name)
For example, ?mean is equivalent to help(mean).
• To find functions by keyword:
??keyword <==> help.search("keyword")
Note: when the search term consists of multiple words, it must be quoted.
For example, ??plotting
is equivalent to help.search("plotting")
??"nonlinear regression"
is equivalent to help.search("nonlinear regression")
• The apropos(keyword) function finds variables and functions whose names contain the keyword and that are currently available to you, including the ones you defined.
> my_vector <- 1:10  # example value; only the name matters here
> apropos("vector")
[1] ".__C__vector"         "as.data.frame.vector" "as.vector"            "as.vector.factor"
[5] "is.vector"            "my_vector"            "vector"               "Vectorize"
A fancier use of apropos() is matching with regular expressions, for example names ending in "z":
> apropos("z$")
[1] ".rs.disableQuartz" "SSgompertz"        "toeplitz"          "unz"
• Use the example() function to learn a function by running its examples.
For example: example(plot)
• Use the demo() function to see demonstrations of some concepts. For example,
demo()
lists all demonstrations, and
demo(colors)
demonstrates R’s predefined colors() function.
• Use the browseVignettes() function to browse the vignettes of the packages installed on the local machine, or use the vignette() function to open a specific vignette.
browseVignettes()
vignette("Sweave", package="utils")
• The RSiteSearch() function runs a query against the search site of the R Project to look for packages and documentation:
RSiteSearch("nonlinear regression")
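The keyword lookups described in this section can be combined with ordinary vector operations, because apropos() returns a character vector. The following is only a minimal sketch: the object name my_lookup_target is made up for illustration, and the exact matches depend on which packages are attached in your session.

```r
# A hypothetical object, created only so apropos() has something of ours to find.
my_lookup_target <- 42

# Plain substring match against the names on the search path.
apropos("lookup_target")

# Regular-expression match: names that start with "as.", restricted to functions.
as_functions <- apropos("^as\\.", mode = "function")
head(as_functions)

# exists() checks one exact name instead of searching by pattern.
exists("my_lookup_target")
```

Restricting the search with mode = "function" keeps data objects out of the result, which is handy when the pattern is short.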
2 Some general concepts
• The R language is case sensitive.
• R names consist of alphanumeric characters plus '.' and '_', with the restriction that a name must start with a letter or with '.' (and a leading '.' must not be followed by a digit).
• There are two kinds of basic R commands: (1) expressions (evaluated, printed, and then the value is lost); (2) assignments (evaluated and stored, but not printed).
• Multiple commands can share one line, separated by ';'. Multiple commands can also be grouped into one compound expression by a pair of braces '{' and '}'.
• We can run an R file, say example.R, with the command:
> source("example.R")
• By default, R prints evaluation results to the console. The output can instead be redirected to a file, say output.txt, with the command
> sink("output.txt")
and the redirection can be stopped, resuming normal console output, with
> sink()
• To get help on a function, for example solve(), we can use the commands
> help(solve)
or
> ?solve
• The command
> objects()
returns the names of the objects in the workspace of the current R session, and the rm() function removes objects from the workspace:
> rm(x, y, z, temp, foo)
• When exiting an R session, R asks whether to save the workspace. If you accept, all objects are saved to a .RData file and the command history is saved to a .Rhistory file.
Later, if R is started from the same directory, the saved workspace and history are loaded into the new session. It is therefore recommended to use a separate working directory for each analysis conducted with R.
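The points above about expressions, assignments, compound expressions, and sink() can be tried together in one short session. This is a sketch under assumptions: it writes to a temporary file rather than output.txt, and the variable names are invented for the example.

```r
# Use a temporary file so the sketch does not overwrite anything real.
out_file <- tempfile(fileext = ".txt")

sink(out_file)               # start redirecting console output to the file
print(1:3)                   # written to the file, not shown on the console
x <- 10; y <- x * 2          # two assignments on one line; nothing is printed
result <- { a <- 1; a + 1 }  # a compound expression returns its last value
print(result)
sink()                       # restore normal console output

# Read back what was captured while the redirection was active.
captured <- readLines(out_file)
captured
```

Calling sink() with no arguments only undoes the most recent redirection, so every sink(file) call should be paired with exactly one sink().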
3 Install packages and extra related software
To install packages:
install.packages("package_name") # install directly from the internet
install.packages(file.choose()) # pop up a file chooser to install from a local compressed file
Some additional software extends the functionality of R.
• Under Linux, it can easily be installed with the system package manager.
• Under Windows, it can be installed via the installr package:
install.packages("installr")
library(installr)
installr()
4 R as a scientific calculator
• At the R command line, x <- value only performs the assignment, whereas (x <- value) both assigns and prints the result.
• All the arithmetic operators in R are vectorized: an operator or a function acts on each element of a vector without the need to write an explicit loop.
• You can use either ‘^’ or ‘**’ for exponentiation, but ‘^’ is more common.
• Different kinds of division:
> (1:10)/3 # floating-point division
 [1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667 2.0000000 2.3333333 2.6666667 3.0000000 3.3333333
> (1:10) %/% 3 # integer division
 [1] 0 0 1 1 1 2 2 2 3 3
> (1:10) %% 3 # remainder after division
 [1] 1 2 0 1 2 0 1 2 0 1
• The functions log1p(x) and expm1(x) are more accurate for very small values of x than log(1+x) and exp(x)-1, respectively.
• You can use ‘==’ to check equality between integers, but using it for non-integers is problematic because of floating-point rounding.
• The all.equal(x,y) function checks whether two objects x and y are (nearly) equal, within a tolerance that defaults to about 1.5e-8.
• If the objects differ, the comparison is still carried out to some extent and a report of the differences is returned. To suppress the report and force a TRUE/FALSE result, wrap the all.equal() call in isTRUE().
> x=sqrt(2)^2
> x==2
[1] FALSE
> all.equal(x,2)
[1] TRUE
> all.equal(x,2,tolerance = 1e-20)
[1] "Mean relative difference: 2.220446e-16"
• Do not use all.equal() directly in if conditions: use either isTRUE(all.equal(...)) or, if appropriate, identical().
> identical(2,2)
[1] TRUE
> identical(sqrt(2)^2,2)
[1] FALSE
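The floating-point advice in this section can be checked directly. The sketch below uses x = 1e-20 as an arbitrary "very small" value, and nearly_equal() is a hypothetical helper name introduced here, not a base R function.

```r
x <- 1e-20

# log(1 + x) loses x entirely because 1 + 1e-20 rounds to exactly 1.
log(1 + x)    # 0
log1p(x)      # 1e-20

# The same cancellation affects exp(x) - 1.
exp(x) - 1    # 0
expm1(x)      # 1e-20

# A hypothetical helper: a tolerance-based comparison that is safe inside if().
nearly_equal <- function(a, b) isTRUE(all.equal(a, b))

if (nearly_equal(sqrt(2)^2, 2)) {
  message("sqrt(2)^2 equals 2 within the default tolerance")
}

# all.equal() alone returns a character string on mismatch, which if() rejects.
is.character(all.equal(1, 1.1))   # TRUE
```

Wrapping the call in isTRUE() guarantees a single TRUE or FALSE, so the condition never receives the descriptive string.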
• Global assignment x <<- 5 can alternatively be done by assign("x", 5, globalenv()).
• Unlike TRUE and FALSE, which are reserved words in R, T and F are merely predefined variables that can be reassigned. It is therefore highly recommended to use TRUE and FALSE instead of T and F.
• The three vectorized logical operators are ‘!’, ‘&’, and ‘|’, representing not, and, and or, respectively.
> (x <- 1:10 >= 5)
 [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> (y <- 1:10 %% 2 == 0)
 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
> !x
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> x & y
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE
> x | y
 [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
• The three possible logical values are TRUE, FALSE, and NA.
• The any() and all() functions are useful for dealing with logical vectors.
> none_true <- c(FALSE, FALSE, FALSE)
> some_true <- c(FALSE, TRUE, FALSE)
> all_true <- c(TRUE, TRUE, TRUE)
> all(all_true)
[1] TRUE
> all(some_true)
[1] FALSE
> all(none_true)
[1] FALSE
> any(all_true)
[1] TRUE
> any(some_true)
[1] TRUE
> any(none_true)
[1] FALSE
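Two of the points above are easy to verify: T can be overwritten like any variable, and NA changes what any() and all() return. This is a small sketch; demo_t_overwrite() is a made-up function name used only to keep the reassignment of T out of your workspace.

```r
# T is an ordinary variable, so it can be overwritten (here inside a function
# so the sketch does not change T in your workspace).
demo_t_overwrite <- function() {
  T <- 0          # legal, because T is not a reserved word
  isTRUE(T)       # FALSE: T no longer means TRUE in this scope
}
demo_t_overwrite()

# NA makes any()/all() return NA only when the known values cannot decide.
any(c(TRUE, NA))    # TRUE: one TRUE is enough
any(c(FALSE, NA))   # NA: the missing value could be TRUE
all(c(TRUE, NA))    # NA: the missing value could be FALSE
all(c(FALSE, NA))   # FALSE: one FALSE is enough

# na.rm = TRUE drops missing values before the test.
all(c(TRUE, NA), na.rm = TRUE)   # TRUE
```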
5 Some notes about object classes
• Typing .Machine gives you some information about the properties of R’s numbers.
> .Machine
$double.eps
[1] 2.220446e-16
$double.neg.eps
[1] 1.110223e-16
$double.xmin
[1] 2.225074e-308
$double.xmax
[1] 1.797693e+308
$double.base
[1] 2
$double.digits
[1] 53
$double.rounding
[1] 5
$double.guard
[1] 0
$double.ulp.digits
[1] -52
$double.neg.ulp.digits
[1] -53
$double.exponent
[1] 11
$double.min.exp
[1] -1022
$double.max.exp
[1] 1024
$integer.max
[1] 2147483647
$sizeof.long
[1] 4
$sizeof.longlong
[1] 8
$sizeof.longdouble
[1] 12
$sizeof.pointer
[1] 4
• Note that R has three classes of numerical values: numeric (for floating-point numbers), integer, and complex.
> class(1:10)
[1] "integer"
> class(0.5:4.5)
[1] "numeric"
> class(sqrt(2))
[1] "numeric"
> class(5L)
[1] "integer"
> class(1+2i)
[1] "complex"
• R does not distinguish between single-character and multiple-character strings.
• Factors are, in essence, integers with labels: factor values are stored as integers rather than characters, for efficient storage in memory.
• Sometimes, to manipulate the contents of a factor, we need as.character() to convert it to strings so that string manipulation functions can be used.
> gender <- factor(c("Male","Female","Male","Female"))
> gender
[1] Male   Female Male   Female
Levels: Female Male
> nlevels(gender)
[1] 2
> as.integer(gender)
[1] 2 1 2 1
> as.character(gender)
[1] "Male"   "Female" "Male"   "Female"
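Because factors are stored as integer codes, converting one directly to a number gives the codes rather than the labels. The sketch below illustrates this with invented data (the sizes vector is hypothetical) and shows the conversion through as.character() described above.

```r
# Hypothetical example data: numbers that were read in as a factor.
sizes <- factor(c("10", "20", "10", "5"))

# as.integer() returns the internal level codes, not the printed values.
as.integer(sizes)                  # 1 2 1 3

# Converting through character first recovers the original numbers.
as.numeric(as.character(sizes))    # 10 20 10 5

# String functions expect character input, so convert explicitly as well.
gender <- factor(c("Male", "Female", "Male", "Female"))
toupper(as.character(gender))      # "MALE" "FEMALE" "MALE" "FEMALE"
```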
• To check whether an object is of a specific class, we can use either of the following:
is(obj_name, "class_name")
is.class_name(obj_name)
However, the latter form is strongly preferred over the former. We will see that the former is not as robust as you might imagine. To test for a floating-point number, use the is.double() function instead of is.numeric(), since the latter returns TRUE for both integers and floating-point numbers.
> is.numeric(2)
[1] TRUE
> is.double(2)
[1] TRUE
> is.integer(2)
[1] FALSE
> is.numeric(2L)
[1] TRUE
> is.double(2L)
[1] FALSE
> is.integer(2L)
[1] TRUE
• We can list all is.xxx() functions in the base package as follows:
> ls(pattern="^is",baseenv())
 [1] "is.array"              "is.atomic"             "is.call"               "is.character"
 [5] "is.complex"            "is.data.frame"         "is.double"             "is.element"
 [9] "is.environment"        "is.expression"         "is.factor"             "is.finite"
[13] "is.function"           "is.infinite"           "is.integer"            "is.language"
[17] "is.list"               "is.loaded"             "is.logical"            "is.matrix"
[21] "is.na"                 "is.na.data.frame"      "is.na.numeric_version" "is.na.POSIXlt"
[25] "is.na<-"               "is.na<-.default"       "is.na<-.factor"        "is.name"
[29] "is.nan"                "is.null"               "is.numeric"            "is.numeric.Date"
[33] "is.numeric.difftime"   "is.numeric.POSIXt"     "is.numeric_version"    "is.object"
[37] "is.ordered"            "is.package_version"    "is.pairlist"           "is.primitive"
[41] "is.qr"                 "is.R"                  "is.raw"                "is.recursive"
[45] "is.single"             "is.symbol"             "is.table"              "is.unsorted"
[49] "is.vector"             "isatty"                "isBaseNamespace"       "isdebugged"
[53] "isIncomplete"          "isNamespace"           "isOpen"                "isRestart"
[57] "isS4"                  "isSeekable"            "isSymmetric"           "isSymmetric.matrix"
[61] "isTRUE"
• Note that the assertive package contains more is functions with a consistent naming scheme.
• For every checking function is.xxx(), there almost always exists a corresponding class-converting function as.xxx().
> x <- "123.456"
> as(x,"numeric")
[1] 123.456
> as.numeric(x)
[1] 123.456
> (x <- c(2,12,343,34997))
[1]     2    12   343 34997
> as.data.frame(x)
      x
1     2
2    12
3   343
4 34997
> as(x,"data.frame")
Error in as(x, "data.frame") :
  no method or default for coercing “numeric” to “data.frame”
As seen in the example above, the as.xxx() function is more robust than as(x, "xxx").
• We can directly change the class of an object by assigning a new class to it, but this is not recommended.
> (x <- "123.456")
[1] "123.456"
> class(x) <- "numeric"
> x
[1] 123.456
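When an as.xxx() conversion cannot be carried out for some element, R returns NA for that element and issues a warning. The sketch below shows one way to check for this; the raw_values vector is invented for the example.

```r
# Hypothetical raw input: some entries are not valid numbers.
raw_values <- c("123.456", "7", "abc")

# as.numeric() returns NA (with a warning) where conversion fails.
converted <- suppressWarnings(as.numeric(raw_values))
converted

# Check the result before relying on it.
is.numeric(converted)     # TRUE
anyNA(converted)          # TRUE: at least one entry could not be converted

# Other as.xxx() functions follow the same naming pattern.
as.integer("42")          # 42L
as.character(3.5)         # "3.5"
as.logical("TRUE")        # TRUE
```

suppressWarnings() is used here only to keep the output quiet; in real analyses it is usually better to let the warning show and then inspect the NA positions.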
• Some functions are available for examining variables, such as summary(), head(), tail(), and str().
• Sometimes the print method of a class obscures the internal structure of an object. To bypass it, we can use the unclass() function.
• The attributes() function gives you a list of all the attributes belonging to an object.
• To inspect two-dimensional objects such as matrices and data frames, we can display the object in a (pop-up) spreadsheet in one of the following ways:
• View(): read-only
• edit(): editable, changes are stored in a new object
• fix(): editable, changes are stored in the original object
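The non-interactive inspection functions mentioned above can be tried on the gender factor from earlier in this section. This is only a sketch; raw_codes is a name chosen for the example.

```r
gender <- factor(c("Male", "Female", "Male", "Female"))

# str() gives a compact description of the structure.
str(gender)

# attributes() lists everything attached to the object.
attributes(gender)      # $levels and $class

# unclass() strips the class so the default print method shows the raw storage:
# an integer vector with a "levels" attribute.
raw_codes <- unclass(gender)
raw_codes
is.integer(raw_codes)   # TRUE

# head() and tail() show the first and last elements (or rows).
head(1:100, 3)          # 1 2 3
tail(1:100, 3)          # 98 99 100
```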
6 Objects in the workspace: list and remove
Some examples:
• ls(): list all objects in the workspace
• ls(pattern="ea"): list objects whose names contain "ea"
• ls(all.names=TRUE): list all variables, including hidden ones whose names start with "."
> ls()
 [1] "a"             "ab"            "all_true"      "and"           "b"             "gender"
 [7] "gender_char"   "gender_factor" "none_true"     "some_true"     "x"             "y"
> ls(all.names=TRUE)
 [1] ".Random.seed"  "a"             "ab"            "all_true"      "and"           "b"
 [7] "gender"        "gender_char"   "gender_factor" "none_true"     "some_true"     "x"
[13] "y"
> ls(pattern="nd")
[1] "and"           "gender"        "gender_char"   "gender_factor"
• rm(a,b,gender): remove specific variables from the workspace.
• rm(list=ls()): remove everything from the workspace.
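To experiment with ls() and rm() without clearing your real workspace, the same calls can be pointed at a separate environment. The sketch below assumes a scratch environment created with new.env(); all object names in it are invented.

```r
# A scratch environment (hypothetical name) stands in for the workspace.
scratch <- new.env()
assign("a", 1, envir = scratch)
assign("gender", "x", envir = scratch)
assign("gender_char", "y", envir = scratch)
assign(".hidden", TRUE, envir = scratch)

visible  <- ls(scratch)                      # hidden names are skipped
everyone <- ls(scratch, all.names = TRUE)    # also includes ".hidden"
matched  <- ls(scratch, pattern = "nd")      # names containing "nd"

# Remove one object by name, then everything that ls() can see.
rm("a", envir = scratch)
rm(list = ls(scratch), envir = scratch)
leftover <- ls(scratch, all.names = TRUE)    # ".hidden" survives
leftover
```

Note that rm(list=ls()) leaves hidden objects in place for the same reason; use rm(list=ls(all.names=TRUE)) if those should be removed too.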