Tamaybes's comments

Tamaybes · on Oct 1, 2025

Mechanize Inc. | San Francisco, CA (Hybrid, ONSITE preferred) | Senior SWE ($500k+equity), Junior SWE ($250k+equity)

Apply at: https://jobs.ashbyhq.com/mechanize

Mechanize builds sophisticated reinforcement learning environments to simulate realistic software engineering tasks (feature development, debugging, refactoring, reliability testing) for frontier AI labs. Our mission is to automate software engineering first, then all economically valuable work. We're growing quickly, working with leading AI labs, and backed by investors like Nat Friedman, Daniel Gross, Patrick Collison, and Jeff Dean. Featured in NYT and TechCrunch.

Tamaybes · on Dec 17, 2023

CMOS processors can become around 200x more energy efficient than the H100.

Tamaybes · on Nov 11, 2023

TLDR: the H100, lower precision, and other advances lead to a big jump in computational performance. We're in for a wild ride when the next generation of models is trained on 100x more compute in 2024 and 2025.

Tamaybes · on Feb 15, 2022

The paper's result on recent compute trends differs from the trend described by OpenAI. In particular, OpenAI finds a 3.5-month doubling time over the Deep Learning Era, whereas the paper finds a 6-month doubling time.
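
To make the gap between those two doubling times concrete, here is a rough sketch that converts a doubling time into a yearly growth factor. The function name `annual_growth_factor` is just an illustrative choice, and it assumes clean exponential growth, which real training runs only approximate.

```python
def annual_growth_factor(doubling_time_months: float) -> float:
    """Factor by which compute grows over 12 months, assuming steady doubling."""
    return 2 ** (12 / doubling_time_months)


# Compare the two doubling times quoted above.
for label, months in [("3.5-month doubling", 3.5), ("6-month doubling", 6.0)]:
    print(f"{label}: ~{annual_growth_factor(months):.1f}x per year")
```

Under that assumption, a 6-month doubling time is 4x per year, while a 3.5-month doubling time is a bit under 11x per year, so the two estimates diverge quickly over a multi-year span.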

I think the Large-Scale Era does point to a new phenomenon that emerged pretty discontinuously, which is that there are now 'two lanes' in ML scaling. Prior to 2015, academia and industry would train models of roughly similar compute intensity. Since then, a small number of industry players have frequently trained models with 10-100x more compute than the typical researcher uses.

joe_the_user · on Feb 15, 2022

The thing is that the advent of deep learning was a very big change, in the sense that a general-purpose method appeared that let you throw computing power at many or most problems (with a bit of tuning, but still) and get results that you previously couldn't get (and when you did get results, you required domain experts). No doubt there are changes within the trajectory of these escalating brute-force solutions. But relative changes within this paradigm seem fundamentally different from the initial advent of the paradigm.

Tamaybes · on Feb 15, 2022

Not parameters; the number of FLOPs required to train the model.

microtonal · on Feb 15, 2022

Whoops, thanks! That's what I get for typing too quickly between household duties.