22 Security

Most Shiny apps are deployed within a company firewall and since you can generally assume that your colleagues aren’t going to try and hack your app64, you don’t need to think about security. If, however, your app contains data that only some of your colleagues should be able to access, or you want to expose your app to the public, you will need to spend some time on security. When securing your app, there are two main things to protect:

  • Your data: you want to make sure an attacker can’t access any sensitive data.

  • Your compute resources: you want to make sure an attacker can’t mine bitcoin or use your server as part of a spam farm.

Fortunately your job is made a little easier because security is a team sport. Whoever deploys your app is responsible for security between apps, ensuring that app A can’t access the code or data in app B, and can’t steal all the memory and compute power on the server. Your responsibility is the security within your app, making sure that an attacker can’t abuse your app to achieve their ends. This chapter will give the basics of securing your Shiny, broken down into securing your data and securing your compute resources.

If you’re interested in learning a little more about security and R in general, I highly recommend Colin Gillespie’s entertaining and educational useR! 2019 talk, “R and Security”.

22.1 Data

The most sensitive data is stuff like personally identifying information (PII), regulated data, credit card data, health data, or anything else that would be a legal nightmare for your company if was made public. Fortunately, most Shiny apps don’t deal with those types of data65, but there is an important type of data you do need to worry about: passwords. You should never include passwords in the source code of your app. Instead either put them in environment variables, or if you have many use the config package. Either way, make sure that they are never included in your source code control by adding the appropriate files to .gitignore. I also recommend documenting how a new developer can get the appropriate credentials.

Alternatively, you may have data that is user-specific. If you need to authenticate users, i.e. identify them through a user name and password, never attempt to roll a solution yourself. There are just too many things that might go wrong. Instead, you’ll need to work with your IT team to design a secure access mechanism. You can see some best practices at https://solutions.rstudio.com/sys-admin/auth/kerberos/ and https://db.rstudio.com/best-practices/deployment/. Note that code within server() is isolated so there’s no way for one user session to see data from another. The only exception is if you use caching — see Section 23.5.5 for details.

Finally, note that Shiny inputs use client-side validation, i.e. the checks for valid input are performed by JavaScript in the browser, not by R. This means it’s possible for a knowledgeable attacker to send values that you don’t expect. For example, take this simple app:

secrets <- list(
  a = "my name",
  b = "my birthday",
  c = "my social security number", 
  d = "my credit card"
)

allowed <- c("a", "b")
ui <- fluidPage(
  selectInput("x", "x", choices = allowed),
  textOutput("secret")
)
server <- function(input, output, session) {
  output$secret <- renderText({
    secrets[[input$x]]
  })
}

You might expect that a user could access my name and birthday, but not my social security number or credit card details. But a knowledgeable attacker can open up a JavaScript console in their browser and run Shiny.setInputValue("x", "c") to see my SSN. So to be safe, you need to check all user inputs from your R code:

server <- function(input, output, session) {
  output$secret <- renderText({
    req(input$x %in% allowed)
    secrets[[input$x]]
  })
}

I deliberately didn’t create a user friendly error message — the only time you’d see it was if you’re trying to break the app, and we don’t need to help out an attacker.

22.2 Compute resources

It’s hopefully obvious that the following app is very dangerous, because it allows the user to run any R code they want. They could delete important files, modify data, or send confidential data back to the user of the app.

ui <- fluidPage(
  textInput("code", "Enter code here"),
  textOutput("results")
)
server <- function(input, output, session) {
  output$results <- renderText({
    eval(parse(text = input$code))
  })
}

In general, the combination of parse() and eval() is a big warning sign for any Shiny app66: they instantly make your app vulnerable. Similarly, you should never source() an uploaded .R file, or rmarkdown::render() an uploaded .Rmd. But these cases are pretty obvious, and are unlikely to be source of real problems.

The bigger challenge arises because are a number of functions that parse(), eval(), or both, in a way that you’re not aware of. Here are the most common:

  • Model formulas. It’s possible to construct a model that executes arbitrary R code:

    df <- data.frame(x = 1:5, y = runif(5))
    mod <- lm(y ~ {print("Hi!"); x}, data = df)
    #> [1] "Hi!"

    This makes it difficult to safely allow a user to define their own models.

  • Glue labels. The glue package provides a powerful way to create strings from data:

    title <- "foo"
    number <- 1
    glue::glue("{title}-{number}")
    #> foo-1

    But glue() evaluates anything inside of {}:

    glue::glue("{title}-{print('Hi'); number}")
    #> [1] "Hi"
    #> foo-1

    If you want to allow a user to supply a glue string to generate a label, instead use glue::glue_safe() which only looks up variable names and doesn’t evaluate code:

    glue::glue_safe("{title}-{number}")
    #> foo-1
    glue::glue_safe("{title}-{print('Hi'); number}")
    #> Error: object 'print('Hi'); number' not found
  • Variable transformation. There’s no way to safely allow a use to provide code snippets to transform a variable for dplyr or ggplot2. You might expect they’ll write log10(x) but they could write {print("Hi"); log10(x)}

    This also means that you should never use the older ggplot2::aes_string() with user supplied input. Instead, stick with the techniques in Chapter 12.

The same problem can occur with SQL. For example, if you construct SQL with paste(), e.g.:

find_student <- function(name) {
  paste0("SELECT * FROM Students WHERE name = ('", name, "');")
}
find_student("Hadley")
#> [1] "SELECT * FROM Students WHERE name = ('Hadley');"

An attacker can provide a malicious username:67

find_student("Robert'); DROP TABLE Students; --")
#> [1] "SELECT * FROM Students WHERE name = ('Robert'); DROP TABLE Students; --');"

This looks a bit odd, but it’s a valid SQL query in three parts:

  • SELECT * FROM Students WHERE name = ('Robert'); finds a student with name Robert.

  • DROP TABLE Students; deletes the Students table (!!).

  • --' is a comment needed to prevent the extra ' from turning into a syntax error.

To avoid this problem, never generate SQL strings with paste and instead use system that automatically escapes user input (like dbplyr), or use glue::glue_sql():

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
find_student <- function(name) {
  glue::glue_sql("SELECT * FROM Students WHERE name = ({name});", .con = con)
}
find_student("Robert'); DROP TABLE Students; --")
#> <SQL> SELECT * FROM Students WHERE name = ('Robert''); DROP TABLE Students; --');

It’s a little hard to tell at first glance, but this is safe, because SQL’s equivalent of \' is '' so the query returns all rows of the Students table where the name is literally “Robert’); DROP TABLE Students; –”.