pdeffebach's comments | Hacker News

I wish there were a better way to run Quarto as a script, as in, as fast as `source` in R and `include` in Julia. The current behavior:

1. Has scoping rules that make it difficult to debug.
2. Has high latency, making it frustrating for debugging.


Thanks for the feedback.

The scoping rules are by design and match .ipynb workflows in the case of multiple documents, so we're unlikely to change them.

The render latency of quarto is definitely higher than we'd like, but we have a plan and have been steadily improving it. Quarto 1.4 is generally about 20% faster than 1.3, and we have performance-regression infrastructure to keep us from slipping on it.


Thanks for the feedback! Just to re-state my case in clearer terms:

For my personal workflow (others' may differ), compiling to HTML is only done once at the end of a session, and the latency wouldn't matter if the document could execute like a script. Weave.jl [1] has a great feature called `include_weave` that does exactly this.

But take my feedback with a grain of salt. I generally just save things in folders and compile a pdf separately with many tables and figures.

[1] https://weavejl.mpastell.com/stable/usage/#include_weave
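
For reference, this is roughly how it looks (the filename is made up; per the docs linked above, `include_weave` evaluates a literate document's code chunks in the current session):

    using Weave

    # Execute just the code chunks of the document in the current
    # session, without rendering any output document.
    include_weave("analysis.jmd")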


I think this is going to be right up your alley, then:

https://quarto.org/docs/computations/render-scripts.html


Perfect! So glad this exists!


Have you tried DataFramesMeta.jl? It has a tutorial for people familiar with the tidyverse. See [here](https://juliadata.org/DataFramesMeta.jl/stable/dplyr/).


Maybe [this](https://m3g.github.io/JuliaNotes.jl/stable/workflow/)? The key is to work with modules, use Revise.jl, and Infiltrator.jl.
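
For reference, a minimal sketch of that workflow (the file, module, and function names here are hypothetical):

    using Revise, Infiltrator

    # Revise tracks the file, so edits are picked up automatically
    # the next time a function from it is called.
    includet("MyAnalysis.jl")   # defines `module MyAnalysis`

    # Inside MyAnalysis, an `@infiltrate` call drops you into a
    # debugger prompt with access to the local variables.
    MyAnalysis.run()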


JupyterLab is just an IDE. You can open up terminals, consoles (which execute code like notebooks but present as a REPL-style console), and notebooks, on top of a text editor.

It's actually really nice. Works great over Open OnDemand and is way more responsive than X-based apps.

Jupyterhub is a way to coordinate compute space / config across many users of notebooks or jupyterlab.


> The American problem is not one of a lack of houses. This is just a symptom.

We made it illegal to build housing, so we don't have enough housing. We could fix our housing crisis by re-legalizing housing _without_ fixing all of the (very real) problems you describe. You are simply engaging in whataboutism which will _not_ solve our housing crisis.


I live in one of the most overpriced places in the US and they're building like mad, and just annexed a massive parcel into the city to build residential.

They wanted to put up an apartment block, as well, but NIMBYs have held it up.

I think the issue is more complex than housing regulations (which also certainly play a role).


I think small business owners often treat their workers badly and are overall a reactionary force in American politics.

But sympathy makes a bit of sense when you think about it economically. The businesses that do make it have more market share and are selected to be those that are politically connected. The regulations serve to keep other businesses out, and we should try to help people who aren't as connected start businesses as well.


> The regulations serve to keep other businesses out, and we should try to help people that aren't as connected start businesses as well.

If we're talking about pure regulatory capture, fine. But the only concrete examples I ever hear are ones like "small businesses can't afford to pay for health insurance" w/r/t the ACA.

To put my cards on the table, though, I'm not pro-regulation for regulation's sake. I'm more of a "skip the middlemen and just do it through the State, and no one needs to 'regulate' anyone - I can just regulate them myself w/ my vote (and my dollar, if it's a state-owned-enterprise)" guy. But if my quality of life is 100% controlled by disconnected, self-interested private owners who don't care what I have to say, I need the government to come in and at least give me a freedom or two to leverage against them...please.


I don't get these complaints about `sum!(a, a)`. Sure, it's a bit of a footgun that you can overwrite the array you are working with. This doesn't rise to a "major problem" of composability.
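
For context, the behavior in question (a sketch; whether it reproduces depends on the Julia version, since `sum!` zeroes its destination before accumulating):

    a = [1.0, 2.0, 3.0]

    # The destination is zeroed before accumulation, so when it
    # aliases the source the data is destroyed before being summed.
    sum!(a, a)   # may return [0.0, 0.0, 0.0] rather than a copy of a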

The histogram errors seem annoying, though. Hopefully they can get fixed.


Sure, it's unsurprising that it produces unexpected results, but there are actually semantics that should be expected. The problem is that implementing those semantics correctly for all cases is hard, because of aliasing. It's the same issue that e.g. memcpy() vs memmove() have.


What semantics are expected in other languages? This seems solidly in the realm of undefined behavior as far as I can tell.


The obvious semantics for these functions is that f!(a, args...) should do the same thing as a .= f(args...).

It's only undefined behavior because the simple implementations don't do that in the presence of aliasing.

I brought up memcpy() and memmove() (which in C are copying identity functions on bytes) exactly for this point. memcpy() has undefined behavior when the source and destination ranges overlap (implementable as a simple loop), while memmove() does the right thing if they do overlap, at the cost of having to check which direction they overlap when they do. And in C you can actually easily check if they overlap and in what direction, because the only interface there is the pointer.

Aliasing with objects whose internal details are more complicated than that is harder to check, perhaps too hard to expect. But it is possible if you're only handling your own objects: witness analogous behavior getting specified in numpy: https://docs.scipy.org/doc/numpy-1.13.0/release.html#ufunc-b... They do note that this can require allocation, even in some cases where it shouldn't. But not allocating is of course most of the point of the in-place versions.
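
In Julia one could imagine the same kind of check, sketched here using the non-exported `Base.mightalias` (the helper name `safe_sum!` is made up, and the copy is exactly the allocation that the in-place version is meant to avoid):

    # Copy the source if it might alias the destination, mirroring
    # what memmove() and numpy's aliasing checks do.
    function safe_sum!(dest::AbstractArray, src::AbstractArray)
        src = Base.mightalias(dest, src) ? copy(src) : src
        return sum!(dest, src)
    end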


Thanks for the detailed response.

Yeah, allocation seems like the biggest hang-up here. I would rather have a function stick to a "no allocating" contract and allow for some undefined behavior than have a function unexpectedly allocate to preserve safety.


Let's say you have a data frame

    df = tibble(a = c(1, 2))
and you want to use a dplyr verb to modify it

    mutate(df, b = a + 1)
the `a` in the above expression refers to the column in `df`, but this means it's hard to reference a variable named `a` in the outer scope. Furthermore, if you have a variable `a_var` holding a string with the column name `"a"`, you can't simply write

    mutate(df, b = a_var + 1)
Contrast this with DataFramesMeta.jl, which is a dplyr-like library for Julia, written with macros.

    df = DataFrame(a = [1, 2])
    @transform df :b = :a .+ 1 
Because of the use of Symbols, there is no ambiguity about scopes. To work with a variable referring to column `a` you can write

    a_str = "a"
    @transform df :b = $a_str .+ 1
I won't pretend this isn't more complicated or harder to learn. Some of the complexity is due to Julia's high performance limiting non-standard evaluation in subtle ways. But a core strength of Julia's macros is that it's easy to inspect these expressions and understand exactly what's going on, with `@macroexpand` as shown in the blog post.
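
For example (the exact expansion this prints varies across DataFramesMeta versions):

    using DataFramesMeta

    df = DataFrame(a = [1, 2])

    # Print the code the macro generates, without running it.
    @macroexpand @transform df :b = :a .+ 1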

DataFramesMeta.jl repo: https://github.com/JuliaData/DataFramesMeta.jl


To reference variables in the outer scope, you would do

    mutate(df, b = .env$a + 1)
And if you have a string (contained in a_var) which identifies a variable you can do

    mutate(df, b = .data[[a_var]] + 1)
You could argue these feel clumsy, but I wouldn’t say it’s “hard” to do either of these things with dplyr.


I don't think it's just about whether it's hard to do; your syntax examples look short enough, and one can memorize these two patterns relatively quickly.

However, both patterns add another special case to how identifiers are resolved in the expression. Aren't `.env` and `.data` both valid variable and column names? So what happens if I have a column named `.data`?

Another example, which is the reason why we chose the `:column` style to refer to columns in `DataFramesMeta.jl` and `DataFrameMacros.jl`:

What happens if you have the expression `mutate(df, b = log(a))`? Both `log` and `a` are symbols, but `log` is not treated as a column. Maybe that's because it's used in a function-like fashion? Maybe because R looks at the values of `log` and `a` in their scope and sees that `log` is a function and `a` isn't?

In Julia DataFrames, it's totally valid to have a column that stores different functions. With the dplyr-like syntax rules, it would not be possible to express a function call with a function stored in a column, if the pattern really is that function-call syntax means a symbol is no longer looked up in the data frame.

In Julia DataFrameMacros.jl for example, if you had a column named `:func` you could do `@transform(df, :b = :func(:a))` and it would be clear that `:func` resolves to a column.

This particular example might seem like a niche problem, but it's just one of these tradeoffs that you have to make when overloading syntax with a different meaning. I personally like it if there's a small rule set which is then consistently applied. I'd argue that's not always the case with dplyr.


I hadn't thought of that tradeoff. After testing just now: if you have a column named `.data` or `.env`, those constructs work as if there were no such column, and in that case `mutate(df, b = .data + 1)` is actually an error.

Personally, I'll happily take not being able to use those as column names if it means I can avoid always typing `:` before every in-data variable, but your comment gave me a better understanding of why it would be bad for some other person or scenario, perhaps where short-term ease-of-use is lower on the list of priorities.

For your second example, it doesn't come up in R because a data frame column cannot be a function. Columns must be vectors (including lists), and you could have a vector where one or all elements are functions, but the column itself cannot be a function (functions are not vectors), so there's no ambiguity there. To call a function stored in your data frame you'd have to access an element of the column, and any access method, e.g. `[[` or `$`, would make the resulting set of characters invalid as the name of an object (without backticks, which would then disambiguate the intent):

    df <- tibble(x = list(function(x) x + 1))
    df %>% 
      mutate(y = x[[1]](3))
Separate from dplyr: in R, when you use `(` to call a function, it searches only for functions by that name.

    log <- 3
    log(1)
    # 0

    frog <- 3
    frog(3)
    # Error in frog(3) : could not find function "frog"
    
    log <- function(x) x^2
    log(1)
    # 1


In Julia you could have an `AbstractVector` type that is also callable, or, more likely, a vector of callable objects (where the operation is performed row-wise).

I agree it's unlikely that a user will name their column `.data`. But the restriction certainly saves developers from having to think about these issues.

The larger concern, really, is that Julia needs to know which things are columns and which things are variables in an expression at parse time in order to generate fast code for a DataFrame. It needs to do this without inspecting the data frame, since the data frame's contents aren't known at parse time.

One option would be to make all literals columns. But then you run into issues with things like `missing`, which would have to be escaped or not recognized as a column. It's hard to predict all the problems there, and any escaping rules would definitely have to be more complicated than R's. So we require `:` and take the easy way out, which has the added benefit of helping new users who might get confused about the variable-column distinction.
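
For context, the `:` symbols are rewritten at parse time into DataFrames.jl's plain source => function => destination pairs, roughly like this (hand-written here; the code the macro actually generates differs in detail):

    using DataFrames

    df = DataFrame(a = [1, 2])

    # What `@transform df :b = :a .+ 1` amounts to: the column names
    # are fixed at parse time, and the body becomes an anonymous
    # function that Julia can compile without inspecting df.
    transform(df, :a => (a -> a .+ 1) => :b)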


It would be interesting to profile the second version, though. Assuming the non-standard evaluation has performance benefits (as it does in DataFramesMeta.jl), are you eliminating those benefits when you use

    .data[[a_var]]

?


It's even better when you have the "." variable, which gets populated.

But in general, yeah, R plays pretty fast and loose with scopes, and lets you capture expressions as arguments and execute them in a different scope from the outer one.


DataFramesMeta.jl might be exactly what you are looking for then! The syntax is very close to dplyr, but has performance benefits thanks to Julia.

Here is a tutorial for those familiar with dplyr: https://juliadata.github.io/DataFramesMeta.jl/stable/dplyr/


DataFramesMeta is great!

But I always get confused by the name. Since DataFrames.jl is lower level, shouldn't that be DataFramesBase.jl, and the meta package be DataFrames.jl?


Yes, it absolutely needs a new name!


The convention in Julia is that a package defining a type Abc is called Abcs.jl. Also, DataFrames.jl provides its own manipulation functions, which DataFramesMeta.jl wraps using metaprogramming, hence the name.


That makes sense, but I still think the meta name is confusing. I mean, as a user, the fact that it was implemented using metaprogramming techniques has no bearing; it's an implementation detail. Actually, my brain never thought to associate "meta" in this context with metaprogramming. It makes sense in hindsight, but it's still confusing.

But still, I can't really come up with a nicer name. VerbalDataFrames to match the dplyr verbs idiom?


Yeah, I agree it's not a good name. I think using the word "macro" instead of "meta" would be more useful to the user, something like DataFramesMacros.jl.


One of the piping macro packages + DataFrames.jl works as well.


> 2) I personally found that Julia community is slightly hostile to feedback and negativity. May be it is just me but it has way too much hype-driven-positivity that leads to delusion.

I want to push back on this a bit, which I acknowledge is very ironic. In the past few years, people have consistently posted on Discourse asking for fundamental changes to the language to make it more closely resemble Python, C++, or whatever their preferred language is.

People often say Go is great because there is "only one way of doing things", yet people are very resistant to being told "the way" to do something in Julia. This has happened enough that it's prompted a pinned PSA on Discourse: https://discourse.julialang.org/t/psa-julia-is-not-at-that-s...

It gets tiring! And I'm not sure how the community should handle these requests, but I don't think it's fair to blame all of the negativity on the Julia community when these somewhat misinformed, or even bad-faith, posts are so frequent.

