> First one is annoying, but it makes sense, to some degree. You say the default is this particular instance that gets created when you define the function.
Could just as easily be an expression that's reevaluated on each call. In fact it's what most other languages do, making it twice as bad a decision. Even Javascript gets it right:
function f(x = []) { // the default expression runs on every call
  x.push(1);
  return x;
}
f(); // [1]
f(); // [1], a fresh array each time
> Second is just how things work, the function is a closure that evaluates the non local variable when you run it.
That's again a decision made by Python. Some other languages behave as if each iteration had declared a different variable. Here's Javascript again, you can even use `const` for the iterating variable:
const fns = [];
for (const i of [1, 2, 3]) {
  fns.push(() => console.log(i));
}
fns.forEach(fn => fn());
// 1
// 2
// 3
> Third one... Iterating over the generator does NOT yield an empty list, it raises StopIteration.
Sorry, my phrasing was imprecise. It's not doing 'yield []', but an exhausted generator does behave like an empty generator. Maybe they could have made it so the first exhaustion raises StopIteration, and subsequent ones a different exception? Like reading from a closed channel.
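A quick sketch of the behaviour being discussed (`g` is a made-up one-item generator):

```python
def g():
    yield 1

it = g()
print(list(it))  # [1]
print(list(it))  # [] -- an exhausted generator behaves like an empty one
try:
    next(it)
except StopIteration:
    print("StopIteration")  # explicit next() raises StopIteration every time
```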
> I don't get the 4th. Comparison operators have the same precedence, so yeah.
It's not precedence, it's Comparison Chaining, the same feature that enables '1 < a <= 10'. If the operators had simple precedence, it would be equivalent to '(1 < a) <= 10' or '1 < (a <= 10)', but Comparison Chaining evaluates it as '(1 < a) and (a <= 10)'. Useful for specifying ranges, but a foot gun in other scenarios, like in my example.
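The original example isn't quoted here, so as an illustrative stand-in for the foot gun:

```python
a = 5
# The useful case: chained comparison, equivalent to (1 < a) and (a <= 10)
print(1 < a <= 10)  # True

# The foot gun: this chains as (3 < 5) and (5 == True), i.e. True and False
print(3 < 5 == True)    # False
print((3 < 5) == True)  # True, which is what many readers expect
```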
> That's again a decision made by Python. Some other languages behave as if each iteration had declared a different variable.
Doing that requires block scoping. Python does not have block scoping. The alternative would be to create a dedicated pseudo-scoping for for loops, which is technically possible (e.g. `except` blocks have special casing to avoid circular references) but is a lot of additional complexity, not “just” changing the scoping of iteration variables.
Furthermore (1) happens to provide a pretty good workaround: you can set the iteration variable as a default value of the closure, it’ll be evaluated at the creation of the closure thus “fixing” it.
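A minimal sketch of that workaround:

```python
fns = []
for i in range(3):
    # The default value is evaluated here, at definition time,
    # so each lambda gets its own snapshot of i
    fns.append(lambda i=i: i)

print([f() for f in fns])  # [0, 1, 2]
```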
>That's again a decision made by Python. Some other languages behave as if each iteration had declared a different variable. Here's Javascript again, you can even use `const` for the iterating variable:
But then can you use i after the loop? In Python you can use i after the loop, which can be useful (e.g. if you break out)
for i in range(10):
    if condition(i):
        break
else:
    print("didn't break")
print("broke at", i)
It is also easy to "fix":
ll = [lambda i=i: print(i) for i in range(5)]
This issue is always going to be an issue for Python because it doesn't have explicit variable declaration. In C++ you have the difference between
for (int i = 0;; i++)
// and
int i;
for (i = 0;; i++)
while in Python there's no way to write 'declare i here so it will be scoped outside the loop', which if it existed would make the behaviour you desire a reasonable alternative. Python variables are always scoped to the surrounding function.
for i in range(5):
    if i % 2 == 0:
        q = i
    print(q)  # => 0 0 2 2 4
>Could just as easily be an expression that's reevaluated on each call.
Not ideal but it is consistent with other parts of the language. This is just one of those things you need to learn and in practice isn't really an issue. The fact it's different from other languages is not a relevant concern at all. 'Being like JS' is not a virtue.
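For reference, the behaviour in question (the names here are illustrative):

```python
def append_to(item, acc=[]):  # the [] is created once, at def time
    acc.append(item)
    return acc

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the same list object is reused across calls
```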
>Sorry, my phrasing was imprecise. It's not doing 'yield []', but an exhausted generator does behave like an empty generator. Maybe they could have made it so the first exhaustion raises StopIteration, and subsequent ones a different exception? Like reading from a closed channel.
Iteration is just calling __next__. It's useful to iterate over things more than once sometimes.
IMO, this:
y = iter(x)
for z in y:
    do(z)
    if condition(z):
        break
for z in y:
    do2(z)
is a bit nicer than this:
flag = False
for z in y:
    if not flag:
        do(z)
        if condition(z):
            flag = True
    else:
        do2(z)
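A runnable version of the two-loop pattern above, with the placeholder `do`/`do2`/`condition` made concrete (names and data are illustrative):

```python
first, second = [], []
y = iter([1, 2, 3, 4, 5])
for z in y:
    first.append(z)   # do(z)
    if z == 3:        # condition(z)
        break
for z in y:           # resumes where the first loop stopped
    second.append(z)  # do2(z)

print(first, second)  # [1, 2, 3] [4, 5]
```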
>Comparison Chaining
Incredibly useful feature. I've never seen anyone write 'a < b == c' so this seems a bit artificial. I'd hardly describe it as a footgun.
So can somebody explain to me, step by step with the "elements behind the curtain", what is actually going on in these two cases:
>>> [ x() for x in [lambda: i for i in range(5)] ]
[4, 4, 4, 4, 4]
Why?
>>> [ x() for x in [lambda i=i: i for i in range(5)] ]
[0, 1, 2, 3, 4]
And why?
Starting from the fact that the following give:
>>> [lambda: i for i in range(5)]
[<function <listcomp>.<lambda> at 0xC22>,
<function <listcomp>.<lambda> at 0xCAE>,
<function <listcomp>.<lambda> at 0xC9A>,
<function <listcomp>.<lambda> at 0xCD6>,
<function <listcomp>.<lambda> at 0xCC2>]
>>> [lambda i=i: i for i in range(5)]
[<function <listcomp>.<lambda> at 0xCF4>,
<function <listcomp>.<lambda> at 0xCFE>,
<function <listcomp>.<lambda> at 0xD08>,
<function <listcomp>.<lambda> at 0xD12>,
<function <listcomp>.<lambda> at 0xCCC>]
This is your first snippet (expanded to two lines and max index reduced from 5 to 3):
lambda_list = [lambda: i for i in range(3)]
results = [x() for x in lambda_list]
It expands to this:
i = 0
f0 = lambda: i
i = 1
f1 = lambda: i
i = 2
f2 = lambda: i
results = [f0(), f1(), f2()]
The body of those functions f0, f1, f2, each looks in the function's own dictionary of local variables for "i". But it doesn't exist there, so it looks in the enclosing scope (and keeps doing this recursively up to global scope if necessary). At the next scope it does find a variable "i". This happens at the point you call the function (the same as anything else inside the definition of a function), in this case on that last line when you assign to results. By that point, the "i" variable in the enclosing scope has value 2. So results = [2, 2, 2].
[edit: this feature of functions in Python (variables are looked up by name when the function is run) is what lets you define functions foo() and bar() in a module / script that call each other. When the first one is defined, the other one doesn't exist yet, but you're not actually running the code inside the function so it doesn't matter. By the time you come to call it later, the other function does exist in the enclosing (module level) scope so the lookup succeeds.]
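A sketch of the point in that edit: foo refers to bar before bar exists, and it works because the name is only looked up at call time.

```python
def foo(n):
    # bar doesn't exist yet when this function is defined; that's fine,
    # because the name "bar" is resolved only when foo is called
    return "done" if n <= 0 else bar(n - 1)

def bar(n):
    return foo(n - 1)

print(foo(5))  # done
```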
This is your second snippet (I changed one of the variable names from i to j for clarity, but you can rename it back and it would work the same way):
lambda_list = [lambda j=i: j for i in range(3)]
results = [x() for x in lambda_list]
it expands like this:
i = 0
f0 = lambda j=i: j
i = 1
f1 = lambda j=i: j
i = 2
f2 = lambda j=i: j
results = [f0(), f1(), f2()]
But defaults of function arguments are evaluated at the point the function is declared, not when it's called (as noted elsewhere in this thread - hence the def foo(x=[]) gotcha!). You can imagine the second line of code above is like this, just to really drive the point home [edit: I originally used a made up syntax, but I updated to code that actually works]:
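The updated code that edit refers to isn't shown here; a plausible reconstruction that drives the point home by reading the stored defaults back out:

```python
i = 0
f0 = lambda j=i: j
i = 1
f1 = lambda j=i: j
i = 2
f2 = lambda j=i: j

# Each function object carries its own default, frozen at definition time
print(f0.__defaults__, f1.__defaults__, f2.__defaults__)  # (0,) (1,) (2,)
results = [f0(), f1(), f2()]
print(results)  # [0, 1, 2]
```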
I've tried to see __defaults__ and __kwdefaults__ here:
i = 7
lambda_list = [lambda: i for i in range(3)]
results = [x() for x in lambda_list]
print(i)
for k in range(3):
    print(lambda_list[k])
    print(lambda_list[k].__defaults__)
    print(lambda_list[k].__kwdefaults__)
lambda_list = [lambda j=i: j for i in range(3)]
results = [x() for x in lambda_list]
print(i)
for k in range(3):
    print(lambda_list[k])
    print(lambda_list[k].__defaults__)
    print(lambda_list[k].__kwdefaults__)
print(i)
The output is:
7
<function <listcomp>.<lambda> at 0xB2>
None
None
<function <listcomp>.<lambda> at 0xBC>
None
None
<function <listcomp>.<lambda> at 0xC6>
None
None
7
<function <listcomp>.<lambda> at 0xD0>
(0,)
None
<function <listcomp>.<lambda> at 0xDA>
(1,)
None
<function <listcomp>.<lambda> at 0xE4>
(2,)
None
7
If I understand it, __kwdefaults__ is always None and __defaults__ is used only in the second case. I've also tried to see whether the "i" outside ends up inside the closure, but it remains 7, unlike (as noted by 'remram here):
i = 9
y = 1
for y in range(1, -1, -1):
    try:
        x = 4 / y
    except Exception as i:
        pass
    print(i)
Which gives, for me, this fascinating output:
9
Traceback (most recent call last):
File "p.py", line 10, in <module>
print( i )
^
NameError: name 'i' is not defined. Did you mean: 'id'?
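What's happening there is the special casing of `except ... as` mentioned earlier in the thread: the bound name is implicitly deleted when the block exits (to avoid a traceback reference cycle), even if it shadowed an existing variable. A minimal sketch:

```python
e = "outer"
try:
    1 / 0
except ZeroDivisionError as e:
    pass  # on leaving this block, Python runs an implicit `del e`

try:
    print(e)
except NameError:  # UnboundLocalError inside a function is a NameError too
    print("e was deleted")
```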
[Edit: quietbritishjim was faster than me, and makes more or less the same points. You can safely ignore this post.]
First let's have a look at this:
>>> [ x() for x in [lambda: i for i in range(5)] ]
[4, 4, 4, 4, 4]
Variable i in the function body is a nonlocal or global, taken from the nearest enclosing scope where i is bound. When a function (lambdas are functions just like any other) is executed, a nonlocal or global variable gets the value it has in its scope at the time of the function call. Step by step:
>>> import inspect
>>> i = 0
>>> a = lambda: i
>>> inspect.getclosurevars(a)
ClosureVars(nonlocals={}, globals={'i': 0}, builtins={}, unbound=set())
We see that variable i in a refers to the global i, which currently has value 0. Obviously a() returns 0:
>>> a()
0
Next we create a new lambda, with the same definition as before, but we first increment i.
>>> i = 1
>>> b = lambda: i
>>> inspect.getclosurevars(b)
ClosureVars(nonlocals={}, globals={'i': 1}, builtins={}, unbound=set())
So i in b refers to the global i (as before), which now has value 1. Unsurprisingly b() returns 1:
>>> b()
1
But variable i in a also refers to the global i, which is now equal to 1:
>>> a()
1
We can set i to any value, and both a() and b() will then return that value.
>>> i = 'gotcha'
>>> a()
'gotcha'
>>> b()
'gotcha'
Going back to your example, the i used in the lambdas refers to a nonlocal instead of a global (the i used in the list comprehension), but that doesn't make a difference for the topic at hand. What does matter is that the current value of i is used, which is 4 after the for loop in the list comprehension.
Now to the second example.
>>> i = 0
>>> a = lambda i=i: i
i is now a local, since it's a parameter to the function. It's a bit unclear because i is used in two different meanings here; let's rename one of the uses to j. This is exactly equivalent:
>>> a = lambda j=i: j
>>> inspect.getclosurevars(a)
ClosureVars(nonlocals={}, globals={}, builtins={}, unbound=set())
This time there are no closure variables. j in the function body does not anymore refer to anything outside of the function scope. Instead a is now a function with one parameter, which returns the value of that parameter:
>>> a(10)
10
>>> a('hello')
'hello'
What's of course more important here is that we have a default value for that parameter. Default parameter values in Python are evaluated when the function is defined, not when it is executed. i was 0 when we defined a, so a will always have 0 as default parameter. We can see that default value:
>>> a.__defaults__
(0,)
Add a second function:
>>> i = 1
>>> b = lambda i=i: i
>>> b.__defaults__
(1,)
Now b is also a function taking one parameter, but this time that parameter has default value 1. Meanwhile the default value of function a has not changed:
>>> a.__defaults__
(0,)
So now we can predict that a() will return its default parameter value (which is 0), and b() will return its default parameter (which is 1). Let's check.
>>> a()
0
>>> b()
1
So the big difference is that your first example uses nonlocal/global variables which are evaluated when the function is called, while your second example uses default parameter values which are evaluated when the function is defined.
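The whole contrast in one snippet:

```python
late  = [lambda: i for i in range(3)]      # body looks up i at call time
bound = [lambda i=i: i for i in range(3)]  # default captured at definition time

print([f() for f in late])   # [2, 2, 2]
print([f() for f in bound])  # [0, 1, 2]
```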
Note: you can change default parameter values though, if you really really want:
>>> a = lambda i=5: i
>>> a()
5
>>> a.__defaults__ = ("don't actually do this",)
>>> a()
"don't actually do this"
I like your explanation, it fills in more of the missing information and also demonstrates the use of "import inspect", it's definitely not to be ignored!
Did I understand correctly that the non-intuitive part is the variable not belonging to the closure? Not knowing how it was decided to do it in Python, I'd expect the variable itself to belong to the closure, but instead the closure only holds a reference to the variable outside of it? Which AFAIK doesn't save any memory (the reference still has to be there), so it was simply a design decision?
So in Python the closure just doesn't "enclose" the variables at all? So any use of any variable in it can destroy something used "outside"? How do those using these constructs cope with that in practice? Just ignore it? Invent some naming schemes? Something else?
Reminds me of an old Fortran that allowed changing global constants passed to functions via an assignment to the function parameter inside the function. :)
I guess it's a design decision, I guess it could have been designed differently, but I'm not really sure. Doing it differently could very well break other things that people do rely on.
Referring to the original variables instead of making copies does save memory though: in your example there were 5 functions that each would have their own copy, instead of only 1 instance shared between them.
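You can even see the sharing directly: in CPython, the lambdas produced by the comprehension all close over a single cell object rather than holding copies (a sketch; `__closure__` inspection is a CPython detail):

```python
fns = [lambda: i for i in range(3)]

# All three functions reference the same closure cell, not three copies
print(fns[0].__closure__[0] is fns[2].__closure__[0])  # True
print(fns[0].__closure__[0].cell_contents)             # 2
```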
I suspect it isn't so simple to fix the first example in the way you describe, because Python's a 30+ year old language with functions as first-class citizens. If we went back in time and built it from scratch to remove this wart, I'd vastly prefer we require all default kwargs to be immutable types, rather than allocate a new empty list with every function call. But doing that today would be an unacceptable breaking change.
The second one isn't a real issue IMO. It exists as an example of how closures and scoping work in Python, and as a warning to beginners to avoid doing un-Pythonic things with them. Python closures are late-binding. FWIW your JS example behavior works as you expect in Python, if you used a generator instead of a list comprehension, but why not just pass the variable as a parameter to your anonymous function instead? This is a silly, contrived example. Language designers can't fix stupid.
> subsequent ones a different exception? Like reading from a closed channel.
Why no `close()`? Let me rephrase: why can't you use `with` on a generator? That's in their Design FAQ. TLDR: because it's an uncommon use case for generators, and if you really want to, there's ways to do it.
Changing the behavior of `for`, `in`, etc to call close() when the iterator is consumed, might seem simple, but do a little reading on coroutines and `yield from`, and you'll probably be convinced otherwise. Note that it's exceedingly rare that a Python programmer would want to use those two niche language features, but they're at the core of how certain standard library modules are implemented.
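One of the "ways to do it" would be `contextlib.closing`, which calls `close()` on the generator when the `with` block exits (a sketch, not taken from the FAQ itself):

```python
from contextlib import closing

def numbers():
    try:
        yield from range(5)
    finally:
        print("generator closed")  # runs when close() is called

with closing(numbers()) as g:
    for x in g:
        if x == 1:
            break  # leaving the with-block calls g.close()
```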
As for the last one, this is one nitpick I'm fully on board with. I always make a habit of breaking up boolean expressions, to avoid having to keep track of every language's boolean evaluation idiosyncrasies. If I rewrote Python, I'd axe that chained comparison nonsense altogether, because it's the kind of thing beginners will shoot themselves in the foot with over and over again, but it's another breaking change that would never be accepted.
Basically: don't use language features you don't understand, do a little defensive programming, and these gotchas aren't gonna bite you. JS, on the other hand, has numerous warts that are entirely unavoidable and far more frustrating, but even a broken clock is right twice a day.