The trick is to provide dense rewards, i.e. not only once the full goal is reached, but also a little bit for every random flailing of the agent in the approximately correct direction.
The article talks about all of this and references the DeepSeek R1 paper[0], section 4.2 (first bullet point on PRM), on why this is much trickier to do than it appears.
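To make the sparse/dense distinction concrete, here is a minimal sketch for a hypothetical agent walking along a number line toward a goal. The function names and the 0.1 step bonus are made up for illustration and come from neither the article nor the paper. Real shaping rewards are much harder to get right, which is exactly the point of the PRM discussion.

```cpp
#include <cassert>
#include <cstdlib>

// Sparse reward: the agent learns nothing until the goal is actually reached.
double sparse_reward(int position, int goal) {
    return position == goal ? 1.0 : 0.0;
}

// Dense (shaped) reward: a small bonus for every step that reduces the
// distance to the goal, a small penalty for every step that increases it,
// plus the full reward on arrival. The 0.1 factor is an arbitrary choice.
double dense_reward(int previous, int position, int goal) {
    const int before = std::abs(goal - previous);
    const int after = std::abs(goal - position);
    double reward = 0.1 * static_cast<double>(before - after);
    if (position == goal) reward += 1.0;
    return reward;
}

// Usage: one step in the right direction is invisible to the sparse reward
// but visible to the dense one.
inline void reward_demo() {
    assert(sparse_reward(4, 10) == 0.0);
    assert(dense_reward(3, 4, 10) > 0.0);
}
```

The dense version only works here because the distance to the goal is known and trivially measurable, which is rarely the case for reasoning steps.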
The correct solutions and the viable paths probably are known to the trainers, just not to the trainee. Training only on problems where the solution is unknown but verifiable sounds like the ultimate hard mode, and pretty hard to justify unless you have a model that's already saturated the space of problems with known solutions.
(Actually, "pretty hard to justify" might be understating it. How can we confidently extract any signal from a failure to solve a problem if we don't even know if the problem is solvable?)
Your hard mode is exactly the situation in which RL is used, because it requires neither a corpus of correct examples nor insight into the structure of a good policy.
> How can we confidently extract any signal from a failure to solve a problem if we don't even know if the problem is solvable?
You rule out all the stuff that doesn’t work.
Yes, this is difficult and usually very costly. Credit assignment is a deep problem. But if you didn’t find yourself in a hard mode situation, you wouldn’t be using RL.
I, too, enjoy the craftsmanship, but at the end of the day what matters is that the software works as required; how you arrive at that point doesn't matter.
For me, it is not a matter of craftsmanship so much as a repeatable approach for growing the minds of junior engineers such that they have the best chance to succeed.
Variable declaration `T v;` means "declare `v` such that the expression `v` has type `T`". Variable declaration `T *p;` means "declare `p` such that the expression `*p` has type `T`". Etc.
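The "declaration mirrors use" reading can be checked mechanically. The declarations in this sketch are the same in C. It is written as C++ only so that `static_assert` can verify the claim at compile time. The helper alias `value_t` and the function name are my own, and the array and function-pointer cases are my extension of the "etc."

```cpp
#include <cassert>
#include <type_traits>

// decltype of an lvalue expression such as `*p` yields a reference type,
// so strip the reference before comparing against the declared type.
template <class T>
using value_t = typename std::remove_reference<T>::type;

inline int declaration_mirrors_use() {
    int v = 1;                // the expression `v` has type int
    int *p = &v;              // the expression `*p` has type int
    int a[3] = {1, 2, 3};     // the expression `a[i]` has type int
    int (*f)(int) = nullptr;  // the expression `(*f)(0)` has type int

    static_assert(std::is_same<value_t<decltype(v)>, int>::value, "v");
    static_assert(std::is_same<value_t<decltype(*p)>, int>::value, "*p");
    static_assert(std::is_same<value_t<decltype(a[0])>, int>::value, "a[i]");
    static_assert(std::is_same<value_t<decltype((*f)(0))>, int>::value, "(*f)(0)");

    (void)f;  // only inspected by decltype, never called
    assert(*p == v);
    return v + *p + a[2];
}
```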
My point was indeed that if you don't use pointer arithmetic in C, you don't use arrays. I mean, when you declare arrays of a fixed size, you can also declare an equivalent number of primitive variables instead, but I would find that inconvenient. Hence the question.
If I remember correctly, he meant that only array accesses are used, because their length can be checked (as all arrays have a static length due to no dynamic memory).
Indeed, this is what many people do. But even if you use dynamic memory, if you replace pointer arithmetic by array indexing, you get bounds checking. And in C this also works for arrays of run-time length.
If "array" has a bound, whatever the index expression evaluates to can be checked against the bound of the array. If "array" is not a bounded array but a pointer or an unbounded array, then this does not work, but my point is that it is easy to avoid such code.
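A rough illustration of why the bound matters, written in C++ rather than C. Here the check is spelled out by hand, whereas in C it would typically come from compiler or sanitizer instrumentation. The helper names are hypothetical, and since C++ has no arrays of run-time length, that part of the argument is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// The bound N is part of the argument's type, so every index can be checked.
template <std::size_t N>
int& checked_at(int (&array)[N], std::size_t index) {
    if (index >= N) throw std::out_of_range("index past array bound");
    return array[index];
}

// Once the array has decayed to a pointer the bound is gone: nothing in
// this function can tell whether p[index] is in range.
inline int unchecked_at(const int* p, std::size_t index) { return p[index]; }

// Usage: returns true if the checked access rejects the given index.
inline bool index_is_rejected(std::size_t index) {
    int data[4] = {10, 20, 30, 40};
    assert(checked_at(data, 0) == 10);
    try {
        checked_at(data, index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```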
- what if some site has weird password requirements and the derived password doesn’t work?
- what if a site gets hacked and you need to rotate one password?
If you have to store data per-site anyway because of those cases, you may as well just store passwords. You can (and should) still generate extremely high entropy passwords.
Additionally, you can store other data. For example, one could keep scans of important documents in Pass, which means they are GPG-encrypted and backed by a git repository, so they are versioned and shared across multiple machines.
- if your secret leaks and you don't know it (or you do know, but you need some time to change it), the attacker not only gets the snapshot of your password manager but also can derive all future passwords you'll generate, or past ones you long forgot about
- there's no way to know what you've entered before, since it's stateless. With data stored in a manager, I know what username I used and can associate other data. If your uniqueifying input is the domain and, say, HN became hn.yc and you visited it again in ten years, you'd have to remember that hn.yc accepts the password you derived for news.ycombinator.com
I have to admit though, hash(name+secret)=password is so simple and beautiful that it draws IT people like a fine artwork draws visitors. But for me, that doesn't outweigh the practical issues.
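A toy sketch of the stateless scheme and of where state creeps back in. Big caveat: `placeholder_hash` is 64-bit FNV-1a, used only so the example has no dependencies. It is NOT cryptographically secure and must not be used for real passwords. The `counter` parameter is a hypothetical extension for rotation, not something proposed in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// PLACEHOLDER mixing function (64-bit FNV-1a). Not cryptographically secure;
// a real scheme would need a proper key-derivation function.
inline std::uint64_t placeholder_hash(const std::string& input) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : input) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Stateless derivation: password = hash(name + secret), rendered as 16 hex
// digits. `counter` is a hypothetical knob for rotating a single password.
inline std::string derive_password(const std::string& secret,
                                   const std::string& site,
                                   unsigned counter = 0) {
    std::uint64_t h =
        placeholder_hash(site + '\n' + secret + '\n' + std::to_string(counter));
    static const char digits[] = "0123456789abcdef";
    std::string out(16, '0');
    for (int i = 15; i >= 0; --i) {
        out[i] = digits[h & 0xF];
        h >>= 4;
    }
    return out;
}

// Usage: the same inputs always give the same password, with nothing stored.
inline void derive_demo() {
    assert(derive_password("secret", "news.ycombinator.com") ==
           derive_password("secret", "news.ycombinator.com"));
}
```

Rotating one password means bumping `counter` for that site and remembering the new value, and a renamed domain silently yields a different password. Both are per-site state, which is the practical objection raised above.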
Not all sites are safe, either by design or by people running them. Having a common secret+service name as password AND having at least one of those sites leaking your plaintext password could mean that your derivation may go public and all your other passwords and services fall because of that.
Presumably the derivation would involve a cryptographically secure, non-reversible function so as to not compromise the secret should one of them be leaked.
If the moves were destructive, I'd design it to have the default constructor call `::socket` and destructor call `::close`. And there wouldn't be any kind of "closed" state. Why would I want it?
In this case, I would want the address family and protocol to be statically known, so it would have a default constructor. But for example, a file might not have one, sure. As for closing before lifetime ends, why? I can just end lifetime. Wrap it in an optional if the type system can't figure it out like with a struct member.
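A minimal sketch of that design, assuming POSIX sockets. The names `OpenSocket` and `Connection` are invented for illustration. Since C++ moves are not destructive, this sketch simply forbids copying and moving so that no "closed" or "moved-from" state can exist. With destructive moves the type could be movable as well.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <type_traits>
#include <sys/socket.h>
#include <unistd.h>

// A socket that is open for its entire lifetime. Address family and socket
// type are fixed at compile time, so a default constructor makes sense.
template <int Family = AF_INET, int Type = SOCK_STREAM>
class OpenSocket {
public:
    OpenSocket() : fd_(::socket(Family, Type, 0)) {
        if (fd_ < 0) throw std::runtime_error("socket() failed");
    }
    ~OpenSocket() { ::close(fd_); }  // fd_ is always a valid descriptor here

    // Deleting the copy operations also suppresses the implicit moves.
    OpenSocket(const OpenSocket&) = delete;
    OpenSocket& operator=(const OpenSocket&) = delete;

    int native_handle() const { return fd_; }

private:
    int fd_;
};

static_assert(!std::is_move_constructible<OpenSocket<>>::value,
              "no moved-from state can be observed");

// "Closing early" is expressed by ending the lifetime. For a struct member
// that takes an optional wrapper.
struct Connection {
    std::optional<OpenSocket<>> socket;
};

// Usage: open on emplace, close on reset.
inline void open_socket_demo() {
    Connection c;
    c.socket.emplace();
    assert(c.socket->native_handle() >= 0);
    c.socket.reset();
}
```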
And what's the underlying value of such a default constructed socket? I assume it would be -1 resp. INVALID_SOCKET, in which case the destructor would have to deal with it.
> Wrap it in an optional if the type system can't figure it out like with a struct member.
So you essentially must wrap it in an optional if you want to use it as a member variable. I find this rather pointless as sockets already have a well-defined value for the empty state (-1 resp. INVALID_SOCKET). By wrapping it in an optional you are just wasting up to 8 bytes.
Sure, you can implement a socket class like that, but it's neither necessary nor idiomatic C++.
> And what's the underlying value of such a default constructed socket? I assume it would be -1 resp. INVALID_SOCKET
No, as explained, the default value would be the result of `::socket` call, i.e. a fresh OS-level socket.
> So you essentially must wrap it in an optional if you want to use it as a member variable.
No, you only must wrap it if you really want this closed state to exist.
> Sure, you can implement a socket class like that, but it's neither necessary nor idiomatic C++.
Obviously. Because the moves are not destructive. If they were, this design would be superior. And the wasted space for optional is solvable, just like for non-nullable pointers.
I see how destructive moves would slightly simplify the implementation, but what difference would it make apart from that? (Don't get me wrong, I totally think that destructive moves are a good idea in general, I just don't see the qualitative difference in this particular case.)
> And the wasted space for optional is solvable, just like for non-nullable pointers.
In the case of non-nullable pointers the library author knows that they can use NULL as a sentinel value and write a corresponding specialization. But what could you possibly do with an arbitrary user-defined class?
> The same difference as making pointers always non-nullable and reintroducing nullability via an optional wrapper only when semantically appropriate.
Again, I don't see what this has to do with destructive moves. If you want a socket class that always refers to an open socket, you can already do that. Same for non-nullable pointer wrappers. Conversely, destructive moves don't prevent you from implementing a socket class with a close() method. These concepts are really orthogonal.
> Just add some customization points to std::optional so that users can define which value of the class to treat as nullopt internally.
How is this supposed to work? The very point of your socket class is that it always contains a valid socket handle. Once you introduce a sentinel value, you are back to square one. If the optional class is able to construct a socket with the sentinel value, so is the user.
> Again, I don't see what this has to do with destructive moves. If you want a socket class that always refers to an open socket, you can already do that.
Technically you can, but it's unreasonable to create an os-level socket just to put into the moved-out object where it will be immediately destroyed again. This is not an issue when the moves are destructive.
> How is this supposed to work? The very point of your socket class is that it always contains a valid socket handle. Once you introduce a sentinel value, you are back to square one. If the optional class is able to construct a socket with the sentinel value, so is the user.
That's not true. The sentinel value need not be exposed in the public interface of the class, it can only be accessible via the customization point of the optional.
> Technically you can, but it's unreasonable to create an os-level socket just to put into the moved-out object where it will be immediately destroyed again. This is not an issue when the moves are destructive.
No, the class can use a sentinel value internally only to mark moved-from objects. That's exactly where we actually started the conversation. That's why I said that destructive moves would only somewhat simplify the move operations, but not make a qualitative difference (in this area).
> The sentinel value need not be exposed in the public interface of the class, it can only be accessible via the customization point of the optional.
Since the optional would need to construct an instance with the sentinel value, I thought that the "sentinel" constructor must be public. However, you might be right that one could write a template specialization that contains the template argument as a friend class. In this case you could use a private constructor. Note that the destructor still has to handle the sentinel value... But I guess this is just something you have to accept.
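Roughly what the friend-based variant could look like. `CompactOptional` is a hypothetical wrapper, not `std::optional`, which has no such customization point. `Handle` is a plain integer wrapper that owns nothing, so the sketch sidesteps the destructor question mentioned above.

```cpp
#include <cassert>

template <class T>
class CompactOptional;  // hypothetical primary template, never defined

class Handle {
public:
    explicit Handle(int fd) : fd_(fd) { assert(fd >= 0); }
    int get() const { return fd_; }

private:
    // Only the optional specialization may create or detect the sentinel.
    friend class CompactOptional<Handle>;
    struct Sentinel {};
    explicit Handle(Sentinel) : fd_(-1) {}
    bool is_sentinel() const { return fd_ == -1; }

    int fd_;
};

// Specialization that stores the sentinel instead of a separate flag.
template <>
class CompactOptional<Handle> {
public:
    CompactOptional() : value_(Handle::Sentinel{}) {}
    CompactOptional(Handle h) : value_(h) {}

    bool has_value() const { return !value_.is_sentinel(); }
    const Handle& operator*() const {
        assert(has_value());
        return value_;
    }
    void reset() { value_ = Handle(Handle::Sentinel{}); }

private:
    Handle value_;
};

static_assert(sizeof(CompactOptional<Handle>) == sizeof(Handle),
              "no extra space for an engaged flag");
```

User code cannot name `Handle::Sentinel`, so it cannot construct an empty `Handle` directly. The empty state exists only behind the optional's interface.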
> No, the class can use a sentinel value internally only to mark moved-from objects. That's exactly where we actually started the conversation.
The issue is that the "moved-from" state is exposed to the user when the moves are not destructive. The author of the class has to consider behavior for every method in sentinel state, even when it's just to assert that the state isn't sentinel or "lol it's UB". And the user has to be careful not to accidentally misuse an object in sentinel state. Just like how every time you touch a nullable pointer you have to consider if it can be null and what to do in that case. As long as the sentinel state is exposed at all (via non-destructive move), there is little gain in not providing full support for it. However, with destructive moves the sentinel value either doesn't exist at all or only exists completely internally as an optimization, and all this mental overhead disappears.
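For comparison, a sketch of the conventional non-destructive version both sides are talking about, assuming POSIX sockets. The class layout and the name `kMovedFrom` are illustrative. The sentinel is meant to be internal, yet any caller can still observe it after a move.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <unistd.h>
#include <utility>

constexpr int kMovedFrom = -1;  // internal marker for a moved-from object

class Socket {
public:
    // If socket() fails it also returns -1, which this sketch treats the
    // same as the moved-from marker.
    Socket() : fd_(::socket(AF_INET, SOCK_STREAM, 0)) {}
    ~Socket() {
        if (fd_ != kMovedFrom) ::close(fd_);  // destructor must handle the sentinel
    }

    Socket(Socket&& other) noexcept
        : fd_(std::exchange(other.fd_, kMovedFrom)) {}

    Socket& operator=(Socket&& other) noexcept {
        if (this != &other) {
            if (fd_ != kMovedFrom) ::close(fd_);
            fd_ = std::exchange(other.fd_, kMovedFrom);
        }
        return *this;
    }

    // Every member function has to decide what it does in the sentinel state.
    int native_handle() const { return fd_; }

private:
    int fd_;
};

// Usage: after the move, `a` is still a live object that can be inspected.
inline bool moved_from_is_observable() {
    Socket a;
    Socket b(std::move(a));
    return a.native_handle() == kMovedFrom;
}
```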
1. This is only relevant when using such class as a local variable. Member variables are typically not moved-from.
2. In my understanding the user has the freedom to specify what constitutes a "valid but unspecified state" and it would be perfectly ok to mandate that the only thing you can do with a moved-from object is to either destroy or reassign it.
3. The problems with the state of moved-from objects from the perspective of a library author could have been prevented simply by imposing stricter requirements in the standard (e.g. every usage except destruction, and possible reassignment, shall be UB).
4. With all the issues you've pointed out, it is still perfectly possible and reasonable to design a socket class your way (= no closed socket state) in C++, yet somehow most people seem to prefer open() and close() methods instead of modelling the state with an optional. Even in the presence of destructive moves, I don't think that one way is necessarily better than the other and it is mostly a matter of culture and personal preference.
All that being said, I definitely agree that destructive moves are a good thing, in particular if the compiler prevents you from accidentally accessing moved-from objects (which is a mistake that is very easy to make in C++).
Indeed, the "valid but unspecified state" refers only to some types defined in the standard library. It essentially means that you can only call methods which have no preconditions and don't depend on what that state is, e.g. assignment or destruction, or something like string::clear or string::assign if you want defined outcomes. In general each type is free to guarantee whatever the author wants about the moved-from state, e.g. a moved-from std::unique_ptr is always null.
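A small sketch of those two cases. The function names are made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// A moved-from std::unique_ptr is guaranteed to be null.
inline bool unique_ptr_is_null_after_move() {
    auto owner = std::make_unique<int>(42);
    auto new_owner = std::move(owner);
    return owner == nullptr && *new_owner == 42;
}

// A moved-from std::string is only "valid but unspecified": do not assume
// it is empty. Calls without preconditions put it back into a known state.
inline std::string reuse_after_move() {
    std::string text = "hello";
    std::string sink = std::move(text);
    assert(sink == "hello");
    text.clear();          // no preconditions, always allowed
    text.assign("again");  // replaces whatever was there
    return text;
}
```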
So I essentially have to wrap it in something like std::optional. Well, that's certainly one way to write a socket class, but I'd say it's not idiomatic C++. (I have never seen a socket class being implemented like that.)