Primitive R objects and S3

Objects are just lists and class is just an attribute.

Primordial R objects

The first thing to know is that nearly every object in R is really just a vector or list with named elements. For example, the results of "summary" come back as an object:

x <- summary(c(1:10))
x
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    3.25    5.50    5.50    7.75   10.00

class(x)
## [1] "summaryDefault" "table"

Note: this is a common idiom used in R - wherever multiple results must be returned from a function, create a list or array with each result in a named element, essentially a very lightweight class specialised for that function's results. Also note that the exact form of ``summary()`` called, and what it returns, depends on the type of the arguments passed. I'll talk more about this later.
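As a minimal sketch of that idiom, a function can bundle several results into one named list. The function `spread_stats` and its element names are made up for illustration and are not part of R:

```r
# hypothetical helper: returns several results at once as a named list
spread_stats <- function(v) {
    list(
        lo = min(v),
        hi = max(v),
        width = max(v) - min(v)
    )
}

res <- spread_stats(c(4, 8, 15))
names(res)
## [1] "lo"    "hi"    "width"
res$width
## [1] 11
```

The caller picks out whichever results it needs by name, without the function needing any formal class definition.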

But if you examine the summary results object, it's actually just a vector with named elements:

typeof(x)
## [1] "double"
length(x)
## [1] 6
names(x)
## [1] "Min."    "1st Qu." "Median"  "Mean"    "3rd Qu." "Max."
x[1]
## Min.
##    1

So the summary results object has the type double (it is a named vector of doubles) but it is of the classes summaryDefault and table. This probably seems very odd to those coming from other programming languages. R has its own peculiar terminology for typing:

  • type describes what sort of primitive type the object is, e.g. a list, integer or double, and can be queried with typeof(). Note that as there is no such thing as a scalar - a standalone, single numeric quantity - in R, vectors will return the type of their contents. A dataframe is not a type of its own: its type is list.

  • mode also describes what sort of primitive type or data the object is, albeit in a slightly more generic way, and can be queried with mode()

  • class describes the interface of the object and what sort of functions it can be used with and how. This ties into R's use of generic functions and can be queried with class():

    typeof(x)
    ## [1] "double"
    mode(x)
    ## [1] "numeric"
    class(x)
    ## [1] "summaryDefault" "table"
    

Note: There is also, confusingly, ``storage.mode()`` and ``oldClass()`` for asking similar but not identical questions. Broadly speaking, ``typeof()`` and ``class()`` are the only ones you will usually need. ``str()`` provides most of the above information and can be handy for dissecting objects to see how they work.
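To make the difference between the three concrete, here is a small sketch that asks all three questions of a few common objects. The helper `type_views` is a made-up name for illustration:

```r
# hypothetical helper: collect the three views of an object's "type"
type_views <- function(obj) {
    c(typeof = typeof(obj), mode = mode(obj), class = class(obj)[1])
}

type_views(1L)                  # typeof "integer", mode "numeric",  class "integer"
type_views(data.frame(a = 1))   # typeof "list",    mode "list",     class "data.frame"
type_views(sum)                 # typeof "builtin", mode "function", class "function"
```

Note how the dataframe is a list as far as typeof() and mode() are concerned; only class() reports it as something more specific.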

Attributes & class

Aside from named elements, R objects can also be equipped with attributes: data or members that sit outside the object's contents. Attributes can be set and retrieved as follows:

# create and set attribute
attr(x, "foo") <- 23
# retrieve / get attribute value
attr(x, "foo")
## [1] 23

Note: Thus attribute access looks like indexing in (say) Python or C++ - the same expression can be used to read the value or as the target of an assignment. This is another common R idiom: a function that can both get and set a member.
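The get-or-set idiom can be reproduced for your own accessors with a pair of functions, one of them a replacement function. A minimal sketch, where `label` and `label<-` are made-up names and not part of base R:

```r
# hypothetical getter: read the "label" attribute
label <- function(x) attr(x, "label")

# hypothetical setter: a replacement function is named `name<-`, takes the
# new value in an argument called `value`, and returns the modified object
`label<-` <- function(x, value) {
    attr(x, "label") <- value
    x
}

v <- c(1.5, 2.5)
label(v) <- "weights"   # R evaluates this as v <- `label<-`(v, "weights")
label(v)
## [1] "weights"
```

So nothing is really being assigned "into" a function call: R rewrites the assignment into a call to the replacement function and rebinds the variable to the result.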

Attributes cannot be accessed using the slot or member syntax of other R OOP schemes:

x@class
## Error: trying to get slot "class" from an object (class "summaryDefault")
## that is not an S4 object
x$class
## Error: $ operator is invalid for atomic vectors

All attributes of an object can be listed:

# list all attributes
attributes(x)
## $names
## [1] "Min."    "1st Qu." "Median"  "Mean"    "3rd Qu." "Max."
##
## $class
## [1] "summaryDefault" "table"
##
## $foo
## [1] 23

This highlights something important: all well-formed S3 objects should have a class attribute, which is just another attribute holding a character vector of the classes that the object belongs to. For such objects, the class() function is really just a shortcut for attr(x, "class") and can be used to retrieve or set an object's class:

class(x)
## [1] "summaryDefault" "table"
class(x) <- "foo"
class(x)
## [1] "foo"
class(x) <- c("foo", "bar")
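A short sketch of how class() relates to the underlying attribute, using a throwaway object `v` rather than the summary results above:

```r
v <- structure(1:3, class = c("foo", "bar"))

# with an explicit class attribute, both calls give the same answer
identical(class(v), attr(v, "class"))
## [1] TRUE

# inherits() checks for membership anywhere in the class vector
inherits(v, "bar")
## [1] TRUE

# without a class attribute, class() falls back on an implicit class
w <- 1:3
attr(w, "class")
## NULL
class(w)
## [1] "integer"

# unclass() strips the attribute and leaves the underlying data
unclass(v)
## [1] 1 2 3
```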

So, looking at our laundry list of OOP characteristics, named elements and attributes (arguably) allow R objects to provide modelling and modularity. But at this point an R S3 class looks like little more than a dumb container of data, just an annotated list. Is making a class just a matter of assigning a name to the class attribute? If so, what is that good for? To which the answers are "yes" and "generic functions".

Rolling your own objects & classes

This most primitive object system is called "S3". There are many ways that you could put an S3 class together and few if any formal mechanisms for doing so. structure() is useful as it adds one or more attributes to an object in one pass:

y <- structure(c(1:3), class = "foo")
typeof(y)
## [1] "integer"
class(y)
## [1] "foo"

But this skates around S3's lack of explicit constructor or initialiser idioms. However, it would be trivial to hack together constructor-like functions to make objects of a given class:

mk_object <- function(x, kls) {
    structure(x, class = kls)
}

y <- mk_object(c(1:3), "bar")
y
## [1] 1 2 3
## attr(,"class")
## [1] "bar"

# a constructor dedicated to a single class
mk_foo_object <- function(x) {
    structure(x, class = "foo")
}
y <- mk_foo_object(c(1:3))
y
## [1] 1 2 3
## attr(,"class")
## [1] "foo"

which could be elaborated upon to provide type-checking, data transformation and other features.
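For instance, a constructor with some basic checking might look like the sketch below. The class name "celsius" and the function `new_celsius` are invented for illustration:

```r
# hypothetical constructor for a "celsius" class, with basic checks
new_celsius <- function(x) {
    # type-checking: refuse anything that is not numeric
    if (!is.numeric(x)) {
        stop("celsius values must be numeric")
    }
    # data transformation: store everything as double
    structure(as.double(x), class = "celsius")
}

temp <- new_celsius(c(20L, 25L))
class(temp)
## [1] "celsius"
typeof(temp)
## [1] "double"

# invalid input is rejected rather than silently wrapped:
# new_celsius("warm") stops with "celsius values must be numeric"
```

Nothing forces users to go through the constructor, of course: anyone can still set the class attribute by hand.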

Generic functions

Note: "method" tends to be used interchangeably with "function" in this context within R. But a method in other programming languages usually refers to a function attached to an object, so I've avoided this.

Where the class attribute is used is in dispatching inside generic functions. Rather than use methods on an object, S3 uses functions outside an object that sniff the object's class and send it to another function specialised for that class. In summary:

  1. There exists an object x of class foo
  2. A (generic) function bar is called on this object: bar(x)
  3. Internally, all arguments passed to the generic function are dispatched to a version of the function specialised for the class of the first argument passed: bar.foo(x)

It's very easy to write a generic function in R. Simply call UseMethod with the function name; the object to dispatch on defaults to the first argument of the enclosing function:

bar <- function(x) {
    UseMethod("bar")
}

and provide specialised versions for the different classes:

bar.foo <- function(x) {
    # whatever
}

If an object has multiple classes, R works its way along the class vector looking for appropriate functions. If no specialised function is supplied (e.g. bar.foo isn't defined), R goes looking for the default implementation, named as if it has a class default, e.g. bar.default:

bar.default <- function(x) {
    # handle all objects
}
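Putting the pieces together, here is a small end-to-end sketch of dispatch. The generic `speak` and the classes "dog" and "puppy" are invented for illustration:

```r
# the generic: hands the call over to a class-specific function
speak <- function(x) UseMethod("speak")

# specialised version for class "dog"
speak.dog <- function(x) "Woof"

# fallback for everything else
speak.default <- function(x) "..."

rex <- structure(list(name = "Rex"), class = "dog")
speak(rex)
## [1] "Woof"

speak(42)   # no speak.numeric exists, so the default is used
## [1] "..."

# the class vector is searched left to right: there is no speak.puppy,
# so R moves on and finds speak.dog
pup <- structure(list(name = "Bo"), class = c("puppy", "dog"))
speak(pup)
## [1] "Woof"
```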

This can be used to provide specialised versions of pre-existing and builtin functions. For example, print and summary are both generic functions. Specialised versions for the class bar could be written, called print.bar and summary.bar.
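As a sketch of specialising a builtin generic, the following gives a made-up class "money" its own print behaviour. The class and its "currency" attribute are assumptions for the example:

```r
# hypothetical class "money": a number plus a currency attribute
price <- structure(12.5, currency = "EUR", class = "money")

# specialise the builtin generic print() for our class
print.money <- function(x, ...) {
    cat(sprintf("%.2f %s\n", as.numeric(x), attr(x, "currency")))
    invisible(x)   # print functions conventionally return their argument
}

print(price)
## 12.50 EUR
```

Typing just `price` at the console gives the same result, since autoprinting goes through print() as well.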

You can find out all the specialisations of a function with the call methods():

# let's just see the first ten ...  invisible specialisations are
# asterisked in the full listing
methods(print)[1:10]
##  [1] "print.acf"     "print.AES"     "print.anova"   "print.aov"
##  [5] "print.aovlist" "print.ar"      "print.Arima"   "print.arima0"
##  [9] "print.AsIs"    "print.aspell"

Unlike (say) overloaded functions in C++, R's S3 generic functions can only dispatch based on the class of a single argument, by default the first.
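A quick sketch showing that only the first argument matters for dispatch. The generic `combine` and class "foo" are used purely for illustration:

```r
combine <- function(a, b) UseMethod("combine")
combine.foo <- function(a, b) "foo version"
combine.default <- function(a, b) "default version"

f <- structure(1, class = "foo")

combine(f, 2)   # first argument has class "foo"
## [1] "foo version"

combine(2, f)   # the class of the second argument is ignored
## [1] "default version"
```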

Strictly speaking, S3 doesn't handle inheritance. However, there is a function NextMethod which can be used to hand off execution to the next entry in the class vector:

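# an object whose class vector runs from specific to general
z <- structure(1:3, class = c("baz", "quux"))
print.baz <- function(x, ...) {
    # hand off to the next class in the vector, i.e. looks for print.quux
    # (and falls back on print.default if that does not exist)
    NextMethod()
}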

So theoretically this could be used to create subtype polymorphism, dispatching a function call up a class inheritance chain to be fulfilled by a more general class or to fall back on basal behaviour. In practice, it's not clear to me that this is very useful. It relies on your class vector being arranged from specific to general (i.e. explicitly spelling out the chain of inheritance). And if this is the case, it just explicitly defines behaviour that would happen anyway from R's default behaviour.
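A sketch of the hand-off in action, where each level contributes something before passing the call along. The generic `greet` and the classes "child" and "parent" are invented for the example:

```r
greet <- function(x) UseMethod("greet")

greet.child <- function(x) {
    # do the child-specific part, then defer to the next class in the vector
    c("child", NextMethod())
}

greet.parent <- function(x) {
    # no classes left after this one, so NextMethod() reaches greet.default
    c("parent", NextMethod())
}

greet.default <- function(x) "default"

obj <- structure(list(), class = c("child", "parent"))
greet(obj)
## [1] "child"   "parent"  "default"
```

This layering, where a specialised function adds to the more general behaviour instead of replacing it, is the one thing plain dispatch along the class vector does not give you.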

Note: This illustrates an important point. S3 objects are not instances of a class, rather class is a property of the object. In fact, most things that we would normally think of as class properties are actually object properties in S3. And so individual objects can be freely hacked and modified.
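A small sketch of what that means in practice: relabelling a single object changes how it is dispatched, with no class definition to consult. The generic `area` and the classes "square" and "circle" are made up for illustration:

```r
area <- function(s) UseMethod("area")
area.square <- function(s) unclass(s)^2        # side length squared
area.circle <- function(s) pi * unclass(s)^2   # pi times radius squared

s <- structure(2, class = "square")
area(s)
## [1] 4

# relabel this one object and the dispatch changes with it
class(s) <- "circle"
area(s)
## [1] 12.56637
```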

Summary

  • S3 is the most primitive - and most widespread - form of OOP in R
  • S3 objects are essentially just a vector or list of named elements with some added attributes, one of which is class
  • S3 does not have methods, but instead relies upon external generic functions that dispatch to more specialised functions depending on the class of the passed object