A single H100 has 80GB of memory, meaning that at FP16 (2 bytes per parameter) you could roughly fit the weights of a 40B parameter model on it, or at FP4 quantisation (half a byte per parameter) a 160B parameter model. We don't know (I don't think) what quantisation OpenAI use, or how many parameters o1 has, but most likely...
...they probably quantise a bit, but not loads, as they don't want to sacrifice performance; FP8 seems like a plausible middle ground. o1 is just a bunch of GPT-4o in a trenchcoat, strung together with some advanced prompting, and GPT-4o is theorised to be around 200B parameters. If you wanted to run 5 parallel generation tasks at peak during the o1 inference process, that's 5 copies of 200B parameters at FP8 (1 byte per parameter), or roughly 1,000GB of weights, which works out to about 12-13 H100s. That many H100s takes about one full rack of kit to run.
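To make that arithmetic explicit, here's a quick back-of-envelope sketch in Python. The figures it uses (a 200B-parameter GPT-4o, FP8 weights, 5 parallel copies) are the speculative numbers from above, not anything OpenAI has confirmed, and it counts weights only, ignoring KV cache and activations:

```python
import math

# Back-of-envelope memory maths. The model-specific figures here are
# speculative (GPT-4o size, o1 parallelism, FP8 weights), not confirmed.

H100_MEMORY_GB = 80

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

def params_that_fit(gpu_memory_gb: float, precision: str) -> float:
    """Billions of parameters whose weights fit in the given memory."""
    return gpu_memory_gb / BYTES_PER_PARAM[precision]

def gpus_needed(params_billion: float, precision: str, copies: int = 1) -> int:
    """H100s needed to hold `copies` full sets of weights (weights only;
    KV cache and activations are ignored)."""
    total_gb = params_billion * BYTES_PER_PARAM[precision] * copies
    return math.ceil(total_gb / H100_MEMORY_GB)

print(params_that_fit(H100_MEMORY_GB, "FP16"))  # 40.0  -> ~40B params per H100 at FP16
print(params_that_fit(H100_MEMORY_GB, "FP4"))   # 160.0 -> ~160B params per H100 at FP4

# Speculative o1 setup: 5 parallel 200B (GPT-4o-sized) models at FP8.
print(gpus_needed(200, "FP8", copies=5))        # 13    -> roughly a rack's worth of H100s
```

Strictly the weights alone need 13 GPUs rather than 12, but it's the same ballpark: call it one rack.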