I studied statistics, so R was the first programming I ever learned. I didn't kn...

phillc73 · on April 24, 2020

The pipe operator in R also sort of does my head in.

I learned R in a way where everything went right to left: the variable on the left was manipulated by whatever functions or other code sat on the right.

The pipe operator reverses that flow, with everything now moving left to right, which I find difficult to follow and debug.

jointpdf · on April 24, 2020

>The pipe operator in R also sort of does my head in.

As someone who taught myself base R from scratch in 2013, I (used to!) agree with this. When the pipe operator was first introduced, I’d roll my eyes whenever I saw a script that used it and move along.

But I forced my brain to adapt, and now it’s probably my favorite feature of the R language. Data science is full of sequences of transformations, and in my opinion it’s more readable and bug-resistant to phrase these long chains as:

  f(x) %>% g() %>% h()

rather than:

  h(g(f(x)))

or certainly:

  foo <- f(x) 
  foo <- g(foo)
  foo <- h(foo)
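To make the comparison concrete, here is a minimal sketch with the placeholders swapped for real functions. The choice of `abs`, `sqrt` and `sum` for `f`, `g` and `h`, and the input vector, are assumptions for illustration only; all three styles compute the same value.

```r
library(magrittr)  # provides %>%

x <- c(-4, 9, -16)

# Piped: read left to right, abs -> sqrt -> sum
piped <- x %>% abs() %>% sqrt() %>% sum()

# Nested: read from the inside out
nested <- sum(sqrt(abs(x)))

# Temporary variable, reassigned at each step
foo <- abs(x)
foo <- sqrt(foo)
foo <- sum(foo)

piped  # 9
```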

I can comprehend and modify others’ (including past versions of me) R code much more quickly with this paradigm. You can quickly debug a chain by commenting out functions sequentially (i.e. first test: “f(x) # %>% ...”). It also becomes much faster to plug new transformations into the chain, when needed.
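A small sketch of the commenting-out approach, plus one further option the comment does not mention: magrittr's tee operator `%T>%`, which runs a side effect such as `print()` and then passes its *left-hand* value on unchanged. The data and functions are the same illustrative assumptions as in a toy `abs`/`sqrt`/`sum` chain.

```r
library(magrittr)  # provides %>% and the tee operator %T>%

x <- c(-4, 9, -16)

# Truncated chain, as when the later steps are commented out:
# inspect the output of abs() on its own
step1 <- x %>% abs() # %>% sqrt() %>% sum()

# Tee: print the intermediate value, then keep piping that same value onward
total <- x %>%
  abs() %>%
  sqrt() %T>%
  print() %>%   # prints [1] 2 3 4
  sum()
```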

One thing that helps to keep track of the input x as it moves through the chain is using the “.” placeholder (especially when you need to specify function arguments), like so:

  f(x) %>%
    g(., n = 100, param = "baz") %>%
    { mean(.$column_name) }

Here, the . stands in for “whatever is coming out of the pipe” from the left.
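One subtlety with the placeholder, sketched below with a hypothetical data frame: magrittr only skips inserting the piped value as the first argument when `.` appears *directly* as an argument. If `.` appears only inside a nested expression such as `.$column_name`, the value is still inserted first, so `mean(.$column_name)` silently becomes `mean(df, df$column_name)`. Wrapping the call in braces avoids that.

```r
library(magrittr)

# Hypothetical data for illustration
df <- data.frame(column_name = c(1, 2, 3, 6))

# Dot used directly as an argument: no extra insertion,
# so this is head(df, n = 2)
first_two <- df %>% head(., n = 2)

# Dot used only inside .$column_name: magrittr still inserts the
# data frame as the first argument, i.e. mean(df, df$column_name),
# which returns NA with a warning because df is not numeric
bad <- suppressWarnings(df %>% mean(.$column_name))

# Braces turn off first-argument insertion
good <- df %>% { mean(.$column_name) }  # 3
```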

laGrenouille · on April 24, 2020

The best use of the pipe operator, in my opinion, is when every pipe is followed by a line break, like this:

  data %>%
    filter(thing > 4) %>%
    mutate(new = fun(old)) %>%
    group_by(var) %>%
    summarize(new_mean = mean(new))

Then you are reading the code top-to-bottom, just like any other R script but without all of the temporary variables.
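A runnable version of that pipeline, under stated assumptions: the data frame below is hypothetical (it only reuses the column names from the comment), and the unspecified `fun()` is stood in for by dividing by 10. The second half spells out the same steps with the temporary variables the pipe avoids.

```r
library(dplyr)

# Hypothetical data using the column names from the comment
data <- data.frame(
  var   = c("a", "a", "b", "b", "b"),
  thing = c(1, 5, 6, 7, 2),
  old   = c(10, 20, 30, 40, 50)
)

# Stand-in for the unspecified fun() (an assumption)
fun <- function(v) v / 10

# Piped version, one step per line
piped <- data %>%
  filter(thing > 4) %>%
  mutate(new = fun(old)) %>%
  group_by(var) %>%
  summarize(new_mean = mean(new))

# The same steps with explicit temporary variables
step1 <- filter(data, thing > 4)
step2 <- mutate(step1, new = fun(old))
step3 <- group_by(step2, var)
temp  <- summarize(step3, new_mean = mean(new))

# Both give var = "a", "b" and new_mean = 2, 3.5
```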

disgruntledphd2 · on April 25, 2020

This only really works if nothing ever fails.

It's great for data analysis and terrible for programming.

The reason it sucks for programming is that you can't debug it or inspect intermediate variables, which is really, really annoying.

It's great for one-off transformations and plotting, but a really, really, really bad idea for programming.

Then you add NSE (non-standard evaluation), which makes it hard to functionalise procedural pipes (especially for people who learned the tidyverse first), and it's a recipe for unmaintainable and profoundly annoying legacy code.

That being said, I love it for interactive analysis.
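For context on the NSE point, a minimal sketch of what functionalising a pipe involves, assuming a recent dplyr/rlang that supports the `{{ }}` ("embrace") operator. The helper `group_mean` and the data are hypothetical. Because dplyr verbs use non-standard evaluation, a plain `group_by(group_col)` inside a function does not refer to the column the caller named; `{{ }}` forwards the caller's bare column name instead.

```r
library(dplyr)

# Hypothetical data for illustration
data <- data.frame(
  var = c("a", "a", "b"),
  old = c(10, 20, 30)
)

# Hypothetical helper: {{ }} passes the caller's bare column names
# through to the NSE-based dplyr verbs
group_mean <- function(df, group_col, value_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarize(value_mean = mean({{ value_col }}))
}

result <- group_mean(data, var, old)
# value_mean: a = 15, b = 30
```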