When the first PC with Basic launched in the 80s many people wanted to develop f...

bitL · on Dec 27, 2020

RL needs a supercomputer and its code is usually too fragile - making a trivial mistake anywhere (missing a constant multiplication, swapping the order of two consecutive lines of code etc.) would likely lead to your model never converging even if you got everything else right.

chasely · on Dec 27, 2020

The hard part of RL for the problems I've encountered in my work is that you need a simulator. Building a reliable and accurate simulator is often an immense undertaking.

dgb23 · on Dec 27, 2020

Maybe data scientists should team up (more?) with game programmers. They have a ton of experience in building very complex simulations.

Ma8ee · on Dec 27, 2020

Which code is not fragile in that sense? I think that is a rather strange criticism.

Iv · on Dec 27, 2020

You can do RL on a Raspberry Pi. It depends on what problem you are trying to solve, but not all of them require video analysis and billions of parameters.
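
To illustrate the scale of problem meant here, below is a minimal sketch of tabular Q-learning on a made-up five-state chain. The environment, the constants, and the names `step` and `train` are all assumptions for illustration and are not taken from the thread. It uses only the standard library, so it should run on very modest hardware.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)      # 0 = left, 1 = right
GAMMA = 0.9
ALPHA = 1.0           # assumption: safe only because this toy environment is deterministic

def step(state, action):
    """Move left or right along the chain; reward 1 on reaching the last state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=50, seed=0):
    """Off-policy Q-learning driven by a purely random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Standard Q-learning target: reward plus discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q

q_table = train()
# Greedy action in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The whole "model" is a 5x2 table of floats, and the learned greedy policy is to move right in every non-terminal state.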

cbames89 · on Dec 27, 2020

Technical point: Value functions that are constant multiples of each other result in the same behavior.
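
A small sketch of this point, assuming behavior means acting greedily with respect to action values. The Q-table is made up for illustration. One caveat the sketch makes visible is that the constant has to be positive, since a negative multiple turns the argmax into an argmin.

```python
import numpy as np

def greedy_policy(q_values):
    """Return the greedy action index for each state (one row per state)."""
    return np.argmax(q_values, axis=1)

# Hypothetical Q-table: 3 states x 2 actions, values chosen for illustration.
q = np.array([[1.0, 2.0],
              [0.5, -0.3],
              [-1.0, -0.2]])

base = greedy_policy(q)
scaled = greedy_policy(3.0 * q)     # positive constant: ordering of actions is unchanged
flipped = greedy_policy(-1.0 * q)   # negative constant: ordering of actions is reversed

print(base, scaled, flipped)
```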

bitL · on Dec 28, 2020

Making a constant multiplication mistake somewhere in the code doesn't imply the new value function would be a constant multiple of the optimal one.
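
A hedged illustration of this, using a made-up three-state deterministic MDP and plain Q-value iteration. The `bootstrap_scale` parameter is a hypothetical stand-in for a stray constant factor applied to the bootstrapped term only; none of these names come from the thread.

```python
import numpy as np

# Hypothetical deterministic MDP: states 0 (start), 1 (middle), 2 (terminal).
# NEXT[s, a] is the successor state, REWARD[s, a] the immediate reward.
NEXT = np.array([[2, 1], [2, 2], [2, 2]])
REWARD = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 0.0]])

def q_iteration(gamma=0.9, bootstrap_scale=1.0, sweeps=50):
    """Q-value iteration; bootstrap_scale != 1 mimics a stray constant factor."""
    q = np.zeros_like(REWARD)
    for _ in range(sweeps):
        # q.max(axis=1)[NEXT] looks up the best value of each successor state.
        q = REWARD + bootstrap_scale * gamma * q.max(axis=1)[NEXT]
    return q

correct_q = q_iteration()
buggy_q = q_iteration(bootstrap_scale=0.5)

print(correct_q[0], correct_q[0].argmax())
print(buggy_q[0], buggy_q[0].argmax())
```

In the start state the correct values are roughly 1.0 and 1.8, so the greedy choice is action 1. With the stray factor they become roughly 1.0 and 0.9, so the greedy choice flips to action 0. The two tables are not constant multiples of each other, because the immediate reward was left unscaled while the bootstrapped term was not.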

bonoboTP · on Dec 27, 2020

RL isn't new though, the foundational results are about 25 years old.

WanderPanda · on Dec 27, 2020

And it feels a bit like it is stalling (at least in continuous control)

cbames89 · on Dec 27, 2020

In my opinion there's a wide open array of approaches from control that can help with this. Learning for Control is a new conference that looks at this very topic.

stevofolife · on Dec 27, 2020

No one said "new". You can apply what you said to PCs and iPhones. Mainframes and Palms existed before them.

dmarchand90 · on Dec 27, 2020

That's still very analogous to the first PCs. By that point there had been decades of foundational computer work.