No, it was never promised as unlimited — it's always had usage limits: 20× the usage of their regular Pro plan, capped at 50 sessions per month (a session being a 5-hour window), although I don't know whether they ever enforced this.
Location: London, UK
Remote: Yes
Willing to relocate: No
Technologies: TypeScript, React, Next.js, PHP/Laravel, Generative AI, Photoshop/Sketch/After Effects
Website: https://dvy.io
LinkedIn: https://linkedin.com/in/dvyio
Email: david@davidbarker.me
I'm a multidisciplinary designer-developer with deep curiosity and a passion for building intuitive, human-centered products, particularly those leveraging generative AI.
My professional roles have typically involved much more than just coding, spanning product design, strategy, marketing, and customer support. I thrive in small, ambitious teams where I can make a tangible impact.
Outside of work, I've built successful side projects, including:
- Balance, a free web app that anonymously helps people with acute anxiety (https://balance.dvy.io/)
I have a similar domain - https://hackernewsalerts.com - but it's for tracking replies to comments and posts you've made. It's in maintenance mode at the moment; it didn't gather as much interest as I'd hoped. I've open-sourced it.
Very expensive, but I've been using it with my ChatGPT Pro subscription and it's remarkably capable. I'll give it 100,000-token codebases and it'll find nuanced bugs I completely overlooked.
(Now I almost feel bad considering the API price vs. the price I pay for the subscription.)
If you're in the habit of breaking problems down into Sonnet-sized pieces, you won't see a benefit. The win is that o1 pro lets you stop breaking things down one level up from what you're used to.
It may also have a larger usable context window, not totally sure about that.
> lets you stop breaking down one level up from what you're used to.
Can you provide an example of what you mean by this? I provide very verbose prompts where I know what needs to be done and just let AI “do” the work. I’m curious how this is different?
Sonnet 3.7 and O1 Pro both have 200K context windows. But O1 Pro has a 100K output window, and Sonnet 3.7 has a 128K output window. Point for Sonnet.
I routinely put 100K+ tokens of context into Sonnet 3.7 in the form of source code, and in Extended mode, given the right prompt, it will output perhaps 20 large source files before having to make a "continue" request (for example, if it's asked to convert a web app from templates to React).
I'm curious whether O1 Pro actually exceeds Sonnet 3.7 in Extended mode for coding or not. Looking forward to seeing some benchmarks.
I am very curious how 3.7 and o1 pro perform in this regard:
> We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 10 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%.
Has anyone ever tried to restructure a ~10K-token text? For example, structuring a 45-minute to 1-hour interview transcript in an organized way without losing any detailed numbers, facts, or supporting evidence. I find that none of OpenAI's models is capable of this task: they keep trying to summarize and end up omitting details. I don't think such a task requires much intelligence, but surprisingly OpenAI's "large"-context models can't manage it.
There actually were almost no benchmarks for o1 pro before because it wasn't on the API. o1 pro is a different model from o1 (yes, even o1 with high reasoning).
I regularly push 100k+ tokens into it, so most of my codebase or large portions of it. I use the Repo Prompt product to construct the code prompts. It finds bugs and solutions at a far better rate than the others. I also speak into the prompt to describe my problem, and find spoken language is interpreted very well.
I also frequently download all the source code of libraries I'm debugging, and when running into issues, pass that code in along with my own broken code. It's very good.
How long is its thinking time compared to o1?
The naming would suggest that o1-pro is just o1 with more time to reason. The API pricing makes that less obvious. Are they charging for the thinking tokens? If so, why is it so much more expensive if there are just more thinking tokens anyways?
I think o1 pro runs multiple instances of o1 in parallel and selects the best answer, or something of the sort. And you do actually always pay for thinking models with all providers, OpenAI included. It's especially interesting if you remember the fact that OpenAI hides the CoT from you, so you're in fact getting billed for "thinking" that you can't even read yourself.
I don't have the answers for you; I just know that if they charged $400 a month I would pay it. It seems like a different model to me. I never use o3-mini or o3-mini-high, just GPT-4o or o1 pro.
They won't. Your use cases won't be something the AI can't do itself, so why would they sell it to you instead of replacing you with it?
AGI means the value of a human is the same as that of an LLM, but the energy requirements of a human are higher than those of an LLM, so humans won't be economical any more.
Actually, I think humans require much less energy than LLMs. Even raising a human to adulthood would be cheaper, from a calorie perspective, than running an AGI algorithm (probably). It's the whole reason why the premise of The Matrix was ridiculous :)
Some quick back-of-the-envelope math says it would take around 35 MWh to get to 40 years old (at 2000 kcal per day).
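For anyone who wants to check that figure, here's a rough sketch of the arithmetic, using only the assumptions stated above (2000 kcal/day for 40 years):

```javascript
// Rough check of the ~35 MWh estimate.
// Assumptions from the comment above: 2000 kcal/day, 40 years.
// Conversions: 1 kcal = 4184 J, 1 MWh = 3.6e9 J.
const joules = 2000 * 4184 * 365 * 40; // ≈ 1.22e11 J
const mwh = joules / 3.6e9;
console.log(mwh.toFixed(1)); // ≈ 33.9, so ~35 MWh is the right ballpark
```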
I read an article once claiming that an early draft/version, cut for time or narrative complexity, had human brains being used as raw compute for the machines, with the Matrix being the idle process that kept the minds sane and functional for their ultimate purpose.
I've read a file that claimed to be that script; it made more sense for the machines to use human brains to control fusion reactors than for humans to be directly used as batteries.
(And way more sense than how the power of love was supposed to be a nearly magical power source in #4. Boo. Some of the ideas in that film were interesting, but that bit was exceptionally cliché.)
I'd love to read that file. Of course, we're close (really close?) to being able to just ask an LLM to give us a personalized version of the script to do away with whatever set of flaws bother us the most.
One of the ways I experiment with LLMs is to get them to write short stories.
Two axes: quality and length.
They're good quality. Not award winning, but significantly better than e.g. even good Reddit fiction.
But they still struggle with length, despite what the specs say about context length. You might manage the script length needed for a kid's cartoon, but not yet a film.
I'll see if I can find another copy of the script; what I saw was long enough ago that my computer had a PPC chip in it.
Beige proto-iMac. I had a 5200 as a teen and upgraded to either a 5300 or a 5400 for a few years at university — the latter broke while I was there and I upgraded again to an eMac, but I think this was before then.
HA! I used REALbasic a bit back in the day, then spent my time comparing it to LiveCode, back then called Revolution. Geoff Perlman and I once co-presented at WWDC to compare the two tools.
OpenAI doesn’t have the pre-existing business, relationships, domain knowledge, etc to just throw AGI at every possible use case. They will sell AGI for some fraction of what an equivalent human behind a computer screen would cost.
“AGI” is also an under-specified term. It will start out (maybe it's already there) equivalent to, say, a human in an overseas call center, but over time improve to the equivalent of a Fortune 500 CEO or Nobel prize winner.
“ASI”, on the other hand, will just recreate entire businesses from scratch.
There could be something to what you wrote. If AGI were to be achieved by a model, why would they give access to it via an API? Why not just sell what it can do, e.g. business services? That would be far more of a moat.
I do something similar, but with "raw" markdown plus the filename, so all my prompts basically end up like this:
Do blah blah blah while taking blah and blah into account. Here is my current code:
File `file1.js`:
```javascript
console.log('I am number one!')
```
File `file2.js`:
```javascript
console.log("I am number two :(")
```
Not sure if I'm imagining it, but when I tried with and without the markdown code blocks, it seemed to do better when I used them, so I wrote a quick CLI that takes a directory path plus a prompt and creates something like that automatically for me. Often I send identical prompts to ChatGPT, DeepThink, and Claude, compare the approaches, and continue with the one that works best for that particular problem, so having something reusable really saves time.
Edit: fuck it, in case people are curious how my little CLI works, I threw it up here: https://github.com/victorb/prompta (beware of bugs and whatnot, I've quite literally hacked this together without much thought)
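For anyone curious about the general shape without reading the repo, a minimal sketch of that kind of concatenation script could look like the following. This is plain Node and purely illustrative; it is not the actual prompta code, and the .js-only filter and script name here are my own assumptions:

```javascript
#!/usr/bin/env node
// Minimal sketch: given a directory and a prompt, print the prompt followed by
// every .js file as a fenced code block labelled with its filename.
// (Illustrative only; not the actual prompta implementation.)
const fs = require('fs');
const path = require('path');

const [dir, ...promptParts] = process.argv.slice(2);
const prompt = promptParts.join(' ');

// Recursively collect every file path under a directory.
function walk(d) {
  return fs.readdirSync(d, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(d, entry.name);
    return entry.isDirectory() ? walk(full) : [full];
  });
}

const files = walk(dir).filter((f) => f.endsWith('.js'));

let out = `${prompt}\n\nHere is my current code:\n`;
for (const file of files) {
  out += `\nFile \`${path.relative(dir, file)}\`:\n`;
  out += '```javascript\n' + fs.readFileSync(file, 'utf8').trimEnd() + '\n```\n';
}
process.stdout.write(out);
```

Usage might then look like `node concat.js ./src "Do blah blah blah while taking blah and blah into account"`, with the output pasted into whichever chat UI you're comparing.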
Yeah, like a lightweight version of my prompta CLI :)
What I end up with is one .md file that uses variables like "$SRC", "$TESTS" and "$DOCS" inside it, which get replaced when you run `prompta output`, and then there is also a JSON file that defines what those variables actually get replaced with.
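To make that concrete (the variable names are from the comment above, but the exact template and JSON shape here are only illustrative, not prompta's actual schema), the .md file might look something like:

```markdown
Review the code below and suggest fixes.

Source files:
$SRC

Tests:
$TESTS

Docs:
$DOCS
```

with a JSON file along the lines of `{ "SRC": "src/", "TESTS": "tests/", "DOCS": "docs/" }` telling `prompta output` which paths to expand each variable into.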
Bit off-topic, but I'm curious how your repository ends up with 8023 lines of something for concatenating files, while my own CLI sits at 687 lines (500 of those are Rust) but has a lot more functionality :)
Not OP, but practically all of those lines are from a package-lock.json file (6755 lines) and a changelog (541 lines). It looks like the actual source is 179 lines long.
I tried the web demo (https://repomix.com/) and it seems to generate unnecessarily complex "packs" for no reason, which probably hurts LLM performance too. Why are there "Usage Guidelines" and "File Format" explanations in this, when it's supposed to just be the code, "packed"? Better to just have the contents plus filenames; it'll infer the directory structure and everything else from that.
They may be strange defaults, but both of those are options. Remove the file summary and the directory structure (both featured in the UI and in the CLI tool) and voilà, it's in your "better" state. There are also additional compression options beyond those two tweaks.
Claude 3.5 Sonnet is great, but on a few occasions I've gone round in circles on a bug. I gave it to o1 pro and it fixed it in one shot.
More generally, I tend to give o1 pro as much of my codebase as possible (it can take around 100k tokens) and then ask it for small chunks of work which I then pass to Sonnet inside Cursor.
It disappoints me when otherwise intelligent people take him at his word at this point. Even ignoring his descent into political madness and conspiracy, he's simply not trustworthy.
Fool me once, shame on Elon. Fool me 194 times, shame on me.
They appear to have removed reference to this 50-session cap in their usage documents. (https://gist.github.com/eonist/5ac2fd483cf91a6e6e5ef33cfbd1e...)
So even the mystery people Anthropic references, who supposedly ran it "in the background, 24/7", would still have had to stay within the usage limits.