match.arg
A weird function that can help you write better functions
Lately I’ve been working with a lot of people whose first language is not R,
which has given me more of an appreciation for R’s oddities. Some in retrospect
were probably ill-advised, like partial matching with $
:
mtcars$disp
#> [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
#> [12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
#> [23] 304.0 350.0 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0
mtcars$di
#> [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
#> [12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
#> [23] 304.0 350.0 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0
mtcars$d # Bonus points if you knew what this would do.
#> NULL
Some are comparatively weird, but function fine, like that R doesn’t care if keyword arguments come before positional ones; it just extracts keywords first and then matches positions:
mean(x = c(1, NA, 3), 0, TRUE)
#> [1] 2
mean(na.rm = TRUE, c(1, NA, 3))
#> [1] 2
But some are weird, but actually really convenient and great, which brings us
to the subject of this post: match.arg
.
match.arg
is a function that only works in functions, and thus is not itself
a function people tend to encounter until they try to write code for other
people. (You could use it in functions you write for yourself, but it matters
less, for reasons that will become apparent in a moment.) It’s also a function
that’s baked into much of R’s DNA, so to speak.
For instance, if you look at the documentation for ?optim
, base R’s
general-purpose optimization function, its “Usage” section looks like this:
optim(par, fn, gr = NULL, ...,
method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
"Brent"),
lower = -Inf, upper = Inf,
control = list(), hessian = FALSE)
Something there is a little funny: Why is the default of method
a vector of
six methods? Is it going to run six different ways? The argument documentation
seems to suggest otherwise:
method
The method to be used. See ‘Details’. Can be abbreviated.
“Details” confirms:
The default method is an implementation of that of Nelder and Mead (1965), that uses only function values and is robust but relatively slow. It will work reasonably well for non-differentiable functions.
It then describes "BFGS"
, "CG"
, etc. as alternative methods, which makes
sense—the documentation is telling us all the possible alternatives. But
there’s still something weird here: Is the default really a vector of length 6?
Or is that just a documentation nicety, and the signature of the function
actually contains method = "Nelder-Mead"
?
A quick investigation reveals that it’s not just a nicety—the actual default is a vector of length 6
args(optim)
#> function (par, fn, gr = NULL, ..., method = c("Nelder-Mead",
#> "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), lower = -Inf,
#> upper = Inf, control = list(), hessian = FALSE)
#> NULL
…but somehow that turns into a default of "Nelder-Mead"
. How? The culprit
is a few lines into the body of the function:
head(optim, 8)
#>
#> 1 function (par, fn, gr = NULL, ..., method = c("Nelder-Mead",
#> 2 "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), lower = -Inf,
#> 3 upper = Inf, control = list(), hessian = FALSE)
#> 4 {
#> 5 fn1 <- function(par) fn(par, ...)
#> 6 gr1 <- if (!is.null(gr))
#> 7 function(par) gr(par, ...)
#> 8 method <- match.arg(method)
What does
method <- match.arg(method)
do?
Starting at ?match.arg
, it tells us it does “argument verification using
partial matching”. More particularly,
match.arg
matchesarg
against a table of candidate values as specified bychoices
, whereNULL
means to take the first one.1
The “Usage” section confirms that optim
definitely isn’t running six
methods by default:
match.arg(arg, choices, several.ok = FALSE)
Ok, this seems to make some sense—match.arg
checks whether an argument to a
function is within a list of possibilities. Let’s try it out:
check_is_us_flag_color <- function(color){
match.arg(color, c("red", "white", "blue"))
}
check_is_us_flag_color("blue")
#> [1] "blue"
tryCatch(
check_is_us_flag_color("orange"),
error = identity
)
#> <simpleError in match.arg(color, c("red", "white", "blue")): 'arg' should be one of "red", "white", "blue">
Ooh, look, we even get informative error messages! How nice.
It’s supposed to do partial matching too. Let’s try:
check_is_fourth_of_july_activity <- function(activity){
match.arg(activity, choices = c(
"watch a parade",
"barbeque in the park",
"watch fireworks"
))
}
check_is_fourth_of_july_activity("watch fireworks") # Still works.
#> [1] "watch fireworks"
check_is_fourth_of_july_activity("barbeque") # Works too!
#> [1] "barbeque in the park"
tryCatch(
check_is_fourth_of_july_activity("watch"), # Doesn't work. That's good.
error = identity
)
#> <simpleError in match.arg(activity, choices = c("watch a parade", "barbeque in the park", "watch fireworks")): 'arg' should be one of "watch a parade", "barbeque in the park", "watch fireworks">
This could be useful! It encourages us to use descriptive options like
"Nelder-Mead"
, but if people don’t want to type that all the time, they can
just type the unambiguous "Nelder"
, and that will work fine. Also, such a
sophisticated approach still takes very little effort from the person writing
the function, which is awfully nice.
If you want to understand the details of precisely how
, the documentation saysmatch.arg
handles
partial matches
Clicking through and trying outMatching is done using
pmatch
, soarg
may be abbreviated.
pmatch
shows it’s pretty simple—it’s not
fuzzy matching à la agrep
, just matching from the beginning of strings.
Ok, now we’ve got a grasp of how match.arg
works, but that still doesn’t
explain how that line from optim
works:
method <- match.arg(method)
There are no choices specified! ?match.arg
gives us a hint:
In the one-argument form
match.arg(arg)
, the choices are obtained from a default setting for the formal argumentarg
of the function from whichmatch.arg
was called. (Since default argument matching will setarg
tochoices
, this is allowed as an exception to the ‘length one unlessseveral.ok
isTRUE
’ rule, and returns the first element.)
There’s a lot going on here. This directly answers one question: What happens
if method
is not set? In this case, it will return the first element, which
is why the default is in fact "Nelder-Mead"
.
Even if you never use match.arg
, this behavior is a good thing to
understand, because you’ll see documentation like this everywhere:
?read.table
, ?png
, ?order
, ?t.test
, ?ggplot2::position_dodge
,
?tidyr::fill
, ?data.table::shift
, etc., etc., etc.
But there are two possibilities here:
- no argument is passed, and
match.arg
gets the full length-6 vector, or - an argument is passed, and
match.arg
is only passed one string.
Taking the first element of a vector explains how the first possibility works:
check_good_firework <- function(firework = c('bottle rocket', 'roman candle', 'ones that require a license')){
match.arg(firework)
}
check_good_firework()
#> [1] "bottle rocket"
match.arg
is passed the full vector for firework
, and picks the first one.
But is it using that vector for choices? Let’s see:
check_good_firework(c('bottle rocket', 'roman candle', 'ones that require a license'))
#> [1] "bottle rocket"
tryCatch(
check_good_firework(c('snake', 'M80')),
error = identity
)
#> <simpleError in match.arg(firework): 'arg' must be of length 1>
Hmm, so it is getting the choices from somewhere. That’s good, as it explains
how the case in which an argument is passed is handled. But where is
match.arg
getting the options if they aren’t passed to it? The docs, one more
time:
In the one-argument form
match.arg(arg)
, the choices are obtained from a default setting for the formal argumentarg
of the function from whichmatch.arg
was called.
Ah, “the formal argument arg
of the function from which match.arg
was
called”. But what’s a “formal argument”? The details get a little
hairy2, but essentially formals are the stuff you put in the parentheses
after function
. There’s even a sensibly-named function to get them, should
you need:
str(formals(optim))
#> Dotted pair list of 9
#> $ par : symbol
#> $ fn : symbol
#> $ gr : NULL
#> $ ... : symbol
#> $ method : language c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent")
#> $ lower : language -Inf
#> $ upper : num Inf
#> $ control: language list()
#> $ hessian: logi FALSE
Notice this is not quite a normal list—some elements are empty (like par
and
fn
), and ...
is actually the name of an element (that doesn’t exist). It is
a list, but this is the topsy-turvy world of operating on the language, so this
is a pairlist, which only much ever gets used for messing with the structure
of functions.3
Regardless, we can see that method
is there with its default. The source of
match.arg
shows how it gets the default to use for choices
:
head(match.arg, 7)
#>
#> 1 function (arg, choices, several.ok = FALSE)
#> 2 {
#> 3 if (missing(choices)) {
#> 4 formal.args <- formals(sys.function(sysP <- sys.parent()))
#> 5 choices <- eval(formal.args[[as.character(substitute(arg))]],
#> 6 envir = sys.frame(sysP))
#> 7 }
This code is both complicated and hairy (please don’t use <-
in function
calls), but we can see that sys.function
gets the calling function, and
formals
extracts the formal arguments. Lines 5-6 extract the argument from
those formals.
Ignoring the environment-handling stuff, let’s try it out:
show_function <- function(x, y = c("foo", "bar")){
sys.function()
}
show_function()
#> function(x, y = c("foo", "bar")){
#> sys.function()
#> }
fmls <- formals(show_function())
str(fmls)
#> Dotted pair list of 2
#> $ x: symbol
#> $ y: language c("foo", "bar")
fmls$y # This is still a language object...
#> c("foo", "bar")
eval(fmls$y) # ...so evaluate it to make it a real object
#> [1] "foo" "bar"
Cool! This code explains the remaining mystery of match.arg
: When only passed
an argument, it grabs the default argument for that parameter and uses it as
choices
. That also explains why match.arg
only works in a function if
choices
is not specified:
tryCatch(
match.arg(c("bratwurst", "hot dog", "hamburger")),
error = identity
)
#> <simpleError in formal.args[[as.character(substitute(arg))]]: no such index at level 1
#> >
So when should you use match.arg
? When you’re writing a function with a
parameter that can take one of a fairly small number of string values. In
return,
- your function will error informatively if passed an incorrect value (instead of breaking wherever it gets used),
- users get convenient partial-matching,
- if you put the choices as the default, the possible choices will be very
clear (to people who know about
match.arg
), and - your friends coming from Python will reinforce their belief that R is strange, but will be unable to reproduce the behavior in Python without significant code.
So is R quirky? Yep. But that’s not all bad.
This
NULL
bit is confusing, even if you come back after reading everything. It refers to a call likematch.arg(NULL, c("deviled eggs", "potato salad"))
, but the function rarely gets used that way. The “take the first one” bit matters, though, because this is the behavior ifchoices
isn’t passed, as is explained later.↩︎Technically only closures have formals, not primitive functions. Go read about the difference if you like, but it rarely matters from a user perspective.↩︎
Try playing around with
alist
and you’ll start to see what’s going on.↩︎
Share this post