conditionnumber's comments | Hacker News

Still happens all the time in certain finance tasks (eg trying to predict stock prices), but I'm not sure how long that will hold. As for why that might be, I don't think I can do any better than linking to this comment about a comment about your question: <https://news.ycombinator.com/item?id=45306256>.

I suspect that locating the referenced comment would require a semantic search system that incorporates "fancy models with complex decision boundaries". A human applying simple heuristics could use that system to find the comment.

In the "Dictionary of Heuristic" chapter, Polya's "How to Solve it" says this: *The feeling that harmonious simple order cannot be deceitful guides the discover in both in mathematical and in other sciences, and is expressed by the Latin saying simplex sigillum veri (simplicity is the seal of truth).*


Very cool! It looks like you've purchased the data from a vendor, is that right? I love graphs of all kinds, so I was hoping to fetch the raw data and have a look at it with pyvis/networkx.


I've mentioned the source in the top right header. It's by Crustdata.


Sorry, what I was clumsily trying to say is something like: "As an outsider to the startup world, your project has made me curious about the shape of the startup ecosystem as a whole. I bet you have the data to generate that picture! It is unfortunate that our market system is not yet sophisticated enough to compensate people for the difficult/valuable work of data collection, while also maximizing availability of data for those curious to use it."

At first I did not explore the tree visualizations in your web app: I simply noticed an index of company trees. Using the tree visualizer (which shows founders' names and pictures), I immediately realized that these trees represent complex human stories involving thousands of years of individual people's hard work. Interpreted that way, the data deserve a degree of awed respect that I did not show in my original comment. Truly sorry for that.

If you're curious about the shape of the startup ecosystem the way I am, there are a few things you could try. (In what follows I'm assuming "full graph" means company-company links with timestamps, not stories about individual people.)

pyvis has a feature that lets you build a static HTML file with an embedded interactive representation of a graph. The data is embedded in the file, so you might not be able to share it unless you dropped enough information to make such sharing conform to your data license. IIRC the static file has limited query/filter functionality, so it can be difficult to make large graphs manageable for visualization.

If that happens, you can try a graph database with a query UI. I remember another HN submission last year that (IIRC) used neo4j as a backend and provided a web UI with this kind of query/visualize workflow. I believe they also shared GitHub repos with the front-end/back-end code.
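To make the pyvis route concrete, here's a minimal sketch (the edge list and file name are made-up placeholders, not your data):

    # Embed a small company graph in a standalone, interactive HTML file.
    # The edge list is a hypothetical placeholder.
    import networkx as nx
    from pyvis.network import Network

    edges = [("CompanyA", "CompanyB"), ("CompanyA", "CompanyC")]

    g = nx.DiGraph()
    g.add_edges_from(edges)

    net = Network(directed=True)
    net.from_nx(g)                        # copy nodes/edges from networkx
    net.save_graph("startup_trees.html")  # graph data is embedded in the file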

Anyway, thank you for sharing your project and sorry for the shit comment.

https://news.ycombinator.com/item?id=45066060


He scrapped it


For some reason the article made me think of this quote from one of the 2025 MacArthur Fellowship videos: "I think there are some mathematicians who are kind of like the hiker who chooses this massive peak they want to scale, and they do everything they can to make it up the mountain. I'm more like the kind of hiker who wanders through the forest and stops to look at a pretty stone or flower and reflect on whether it's similar to a stone or flower that I've seen before."


Very cool appendix describing how they collected the data. I was kind of surprised to learn that they collected arXiv abstracts + metadata from Kaggle, but it definitely makes sense. I was also surprised that 6 years of SSRN papers was only ~1.3m documents. If you assume 20 pages/document, 400 words/page, and 1.3 tokens/word, then it would only cost (ballpark) $1000 to pass the full corpus through the 4o-mini completions API. I think it would be really neat to build out a "Dataset Used", "Model Used", etc. table for SSRN papers. I imagine more complicated questions would be harder to answer (because you might have to analyze non-text parts of the documents).
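For anyone checking the arithmetic, the back-of-envelope version (the per-token price is my assumption, not from the paper):

    # Back-of-envelope API cost estimate; all figures are rough assumptions.
    docs = 1.3e6                      # ~6 years of SSRN papers
    tokens = docs * 20 * 400 * 1.3    # pages/doc * words/page * tokens/word
    price = 0.15 / 1e6                # assumed 4o-mini input USD per token
    print(f"{tokens:.2e} tokens, ~${tokens * price:,.0f}")
    # ~1.35e10 tokens, ~$2,000 at that list price; ~$1,000 at a 50%-off batch rate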


The arXiv creator[0] is one of the co-authors, which gives me more confidence in the data collection aspects of the paper.

[0] https://en.wikipedia.org/wiki/Paul_Ginsparg


I've been in a similar slump for a while now (lectures + paper skims >> books + coding), so this is advice I'm telling myself right now. Put a stack of good books in a place where you see them several times a day. There's a good chance their presence will taunt you into reading them. Maybe charge your phone on the stack. Don't feel guilty about skipping around between books. Do feel guilty about neglecting them. I'm going to null route HN and YouTube for the remainder of November. Thanks for the question.


I've seen a very broad spectrum of research code. In general research code translates O(1e1-1e2) lines of mathematics into O(1e3-1e4) lines of code. I find mathematics easier to understand than code, so that's going to color my opinion.

My favorite research code tends to look like the mathematics it implements. And that's really hard to do well. You need to pick abstractions that are both efficient to compute and easy to modify as the underlying model changes. My favorite research code also does the reader a lot of favors (eg documents the shape of the data as it flows through the code, uses notation consistent with the writeup or standard conventions in the field).
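A toy illustration of what I mean (made up for this comment, not from any real codebase):

    # Code that mirrors the math it implements, with shape comments.
    import numpy as np

    def ridge_fit(X, y, lam):
        """Solve beta = (X'X + lam*I)^{-1} X'y, as in Eq. (3) of the writeup."""
        n, p = X.shape                  # X: (n, p) design matrix, y: (n,)
        gram = X.T @ X                  # (p, p)
        return np.linalg.solve(gram + lam * np.eye(p), X.T @ y)  # beta: (p,)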

Industry research code... I'm happy to see basic things. Version control (not a bunch of Jupyter notebooks). Code re-use (not copy+paste the same thing 20x). Separation of config and code (don't litter dozens of constants throughout thousands of lines of code). Functions < 1000 lines apiece. Meaningful variable names. Comments that link the theory to the code when the code has to be complicated.
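For the config point, the kind of thing I mean (a hypothetical toy, not any particular project's setup):

    # config.py: all tunables live in one place, not scattered through the code.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TrainConfig:
        learning_rate: float = 1e-3
        batch_size: int = 256
        n_epochs: int = 10

    # elsewhere: cfg = TrainConfig(learning_rate=3e-4); train(model, data, cfg)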

Overall it's probably most helpful to find a researcher in your field whose code you like to read, and copy the best aspects of that style. And ask readers of your code for feedback. I really enjoy reading Karpathy's code (not my field), but that may be an exception because a lot of what I've read is intended to teach a more or less codified approach, rather than act as a testbed for iteration in a more fluid design space.


> Actually it's highly usual... CUSIPs... there's nothing to stop you from setting up your own, alternative... numbering system

I don't think there's anything natural about the mandatory use of copyrighted CUSIP identifiers in regulatory reporting. When the SEC publishes its quarterly list of 13F securities, it includes a disclaimer that it does so "with permission" from the copyright holder. My city doesn't pay royalties or seek approvals when it records and processes car license plate numbers for parking enforcement. And the copyright holder seems actively involved in rulemaking that has the potential to diminish the role CUSIPs play in mandatory regulatory reporting.

https://www.federalreserve.gov/apps/proposals/comments/FR-00...


Project is super cool.

If you're adding more LLM integration, a cool feature might be sending the results of allow_many="left" off to an LLM completions API that supports structured outputs. Eg imagine N_left=1e5 and N_right=1e5 but they are different datasets. You could use jellyjoin to identify the top ~5 candidates in right for each left, reducing candidate matches from 1e10 to 5e5. Then you ship the 5e5 off to an LLM for final scoring/matching.
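To sketch the second stage (everything here is hypothetical; I'm only assuming the jellyjoin stage hands you candidate text pairs):

    # Score one candidate pair with an LLM via structured outputs.
    from pydantic import BaseModel
    from openai import OpenAI

    class MatchVerdict(BaseModel):
        is_match: bool
        confidence: float

    client = OpenAI()

    def score_pair(left_text: str, right_text: str) -> MatchVerdict:
        resp = client.beta.chat.completions.parse(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                       f"Do these refer to the same entity?\nA: {left_text}\nB: {right_text}"}],
            response_format=MatchVerdict,  # parsed straight into the model
        )
        return resp.choices[0].message.parsed

Looping that over the 5e5 candidates (ideally through a batch endpoint) would give you the final matches.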


I wonder how much ΔT you need at the crust to meaningfully change Earth's magnetic field by altering convection patterns in the outer core. I don't know enough physics to attempt an answer.


The outer core starts 2,890 km (~1,800 miles) below the Earth's surface, with the entire mantle in the way; the crust itself is only ~30 km thick [https://phys.org/news/2017-02-journey-center-earth.html]. The crust is basically a thin layer of slag on top of a giant ball of molten everything.

Even at million-plus-year timescales, I can’t see any way the temperature of the upper crust could matter to the core at all, even if the crust were at absolute zero.

Dirt insulates relatively well, and the amount of thermal mass present is mindboggling.


If you lived in the Earth’s core (~6,000 K), the surface (~300 K) would be a rounding error above absolute zero anyway.


> would be a rounding error above absolute zero anyway

Kind of joking: unless there are nonlinear effects near 300 K? Fig. 4 [1] seems to suggest that the thermal diffusivity of the mantle grows very fast as temperature declines below 300 K... but the data stop at 200 K.

Reason for my initial comment: we could probably set up a spherical heat equation to guess how crust cooling would change heat conduction at the outer core. But I have absolutely no idea how to reason about changes in heat conduction affecting the convection dynamics that generate the field. I was silently hoping for one of the domain experts lurking on this forum to see it and share wisdom. (But overall it was a silly question, I know.)
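The toy setup I had in mind (only a sketch, assuming pure conduction and spherical symmetry):

    \frac{\partial T}{\partial t} = \frac{\alpha}{r^2} \frac{\partial}{\partial r}\!\left( r^2 \frac{\partial T}{\partial r} \right)

with a colder fixed-temperature boundary at the crust, then watch how the conductive flux -k dT/dr at the core-mantle boundary responds. The convection/dynamo part is exactly what that model leaves out.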

[1] https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/200...


Calculating or simulating how the Earth's magnetic field behaves or is generated is quite a complex task, so I'm doubtful we can usefully estimate it to such precision. It would be interesting, though.


We know that if the convection in the outer core stops, the Earth's magnetic field stops, and removing enough heat from the core will stop the convection.


Yes, but calculating the energy draw required for any measurable change in this effect is very different from knowing the rough process it operates on.

We know how weather works quite well, but knowing whether it will rain in a week is an entirely different beast.


I've seen a confident estimate in the form of a calculation. They know what kind of compounds (term?) are in the outer core and they know the minimum temperature those compounds need to be at to be free-flowing enough to sustain the field. And I'm pretty sure we know the current temperature of the outer core.

My memory is that the calculation found that if humanity switched to geothermal for all its energy needs, the core would cool enough for the magnetic field to stop in only about 1,000 years, but I am not sure.

(We should definitely deploy geothermal in the Yellowstone caldera though long enough to cool it down enough so that it will not erupt again.)


That is definitely not true hahaha. The outer core is several thousand km down, and the crust is only 30 km thick. And we have the entire mantle below us.

Humanity could max out geothermal for a million years and never make a dent.
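A crude sanity check on the scales (both numbers are rough, order-of-magnitude assumptions):

    # How long would current human energy use take to drain Earth's heat?
    human_energy_per_year = 6e20   # J/yr, ~today's global primary energy use
    earth_internal_heat = 1e31     # J, rough order-of-magnitude estimate
    print(f"{earth_internal_heat / human_energy_per_year:.0e} years")  # ~2e10

Even if those figures are off by a couple orders of magnitude, the conclusion doesn't change.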


Whoa, this is a bit scary. As mentioned earlier, geothermal should basically be used in a way where other energy sources are tapped first, and only the shortfall is covered.


Talk to people in your department about where students who enter industry end up working after graduation. Your university may have a kind of "jobs fair" in Autumn where companies come to recruit. Look into those companies and find out what skills they seem to like.

For what it's worth: I ended up going the quant/finance route (as a "regular guy" with no meaningful accomplishments). If I could start over I would try to do something involving data analysis and biology. I think RNA sequencing is on an exp(-a*t) cost curve, and it feels like this is a domain where data analysis could produce something of greater value than slightly more efficient asset prices.


Thanks for the reply. Are you still in the quant/finance industry? Any pros/cons?

I'm a bit afraid, like a bunch of people I guess, of AI eating all those data analysis/science jobs. But that field definitely sounds interesting.

