Hacker News | ylow's comments

The actual problem being solved here is well defined mathematically: it is matrix completion via low-rank matrix factorization, approached with sampling. (I have not read the paper in its entirety, just skimmed the intro a bit.) It is called a “recommendation system” largely due to some history around its common applications (the Netflix challenge). But this is not addressing the subjective recommendation problem, only a very particular instantiation of it.
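For readers who want the classical picture, here is a minimal sketch of what low-rank matrix completion looks like: plain alternating least squares fit only to the observed entries. All names, dimensions, and parameters below are made up for illustration, and this is not the quantum sampling algorithm from the paper.

    # Illustrative sketch only: classical ALS for low-rank matrix completion.
    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, rank = 50, 40, 3

    # A ground-truth low-rank matrix and a sparse mask of observed entries.
    truth = rng.normal(size=(n_users, rank)) @ rng.normal(size=(rank, n_items))
    mask = rng.random((n_users, n_items)) < 0.3      # ~30% of entries observed
    observed = np.where(mask, truth, 0.0)

    # Factor matrices, updated alternately; lam is a small ridge regularizer.
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    lam = 0.1

    for _ in range(20):
        # Fix V and solve a small ridge regression for each user's row of U.
        for i in range(n_users):
            Vi = V[mask[i]]                          # items user i actually rated
            U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(rank),
                                   Vi.T @ observed[i, mask[i]])
        # Fix U and do the same for each item's row of V.
        for j in range(n_items):
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(rank),
                                   Uj.T @ observed[mask[:, j], j])

    completion = U @ V.T
    rmse = np.sqrt(np.mean((completion[~mask] - truth[~mask]) ** 2))
    print(f"RMSE on unobserved entries: {rmse:.3f}")

The point is that the problem is fully specified once you fix the observed entries and the target rank; “recommendation” only describes one application of it.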


Yes, I get that... my issue was with its application (movie recommendation)... the idea itself isn't suited to the quantum realm...


I don't see how this follows.

Besides, "subjectivity" concerns the subject, "objectivity" the object. The former is a matter of how an object is received by the observer, and so all perception is subjective in that sense; it can't be otherwise. But the subject can become an object of another subject. We can infer with varying certainty what someone is more or less likely to enjoy based on our knowledge of what they like.


As long as you accept subjectivity, you must also accept that logical inference isn't really useful: the observer can add rules at any time, without any logical constraint, so preferences aren't deterministic... and a random option is as good as anything else.


The application, the method, and the algorithm need to be separated. The application is movie recommendation. One of the methods that works pretty well for this is low-rank matrix completion. There are several algorithms for this method, one of which is quantum.


This is using statistics to tell a preconceived story. Underlying it is a notion that foreign workers are simply “imported,” as if they were dug out of the ground or something. How do these STEM OPT people find jobs? Guess what: they interview like everyone else does.

1: In every big tech interview I have been in, visa status is not even a question in the interview process. There is just a simple gate: “Can you legally work in the US?” The hiring committee is not even thinking about visas (that’s an HR problem).

2: Are there confounders in that foreign workers are less likely to negotiate? Absolutely.

3: Are there confounders in that people who come to the US to study are likely already a self-selected bunch who are striving to succeed? What are the typical grade distributions between foreign STEM students and US STEM students? Is grade a confounding variable? What happens if we control for GPA?

And finally does H1B abuse happen? Absolutely.

There is a lot of nuance that is not captured by surface-level statistics. But nuance does not make for outrage.


I think that nuance is fairly unwarranted when you view the American system through the lens the laws ask you to use.

The American H1B system isn't about importing foreign workers who do a good job; it's about importing foreign workers who do a job no American could do. The system demands you look for an American to do the job first, and only if you fail to find one can you import someone from overseas.

In that view, all the talk about GPA falls flat, because it doesn't matter if the foreign worker is better than an American worker; you are supposed to pick the American worker anyway.


Bwaha. I have never met an H1B holder in tech who was doing a job that an American could not do.

The ‘mandatory interviews for Americans’ are just transparent scams.


That's because the system was sold with a lie or has otherwise been co-opted with revisions over the years.

It was supposed to be for hiring Wernher von Braun-type world-leading experts in their field.

It was never supposed to be for hiring a bunch of code plumbers, which is what 99.9% of this industry consists of.


Eh - the H1B status was created in the ’90s, and was immediately used for ‘code plumbers’.

You’re probably thinking of the O-1 visa, though that was also created in the ’90s.

[https://www.uscis.gov/working-in-the-united-states/temporary...]


I’ve rarely seen them used for that purpose, but of course that’s my personal observation and I have no data about it. A lot of the H1B folks I worked with were solid backend/mobile devs. I enjoyed working with most of them, but it’d be a stretch to say they were uniquely qualified. I do notice they feel a lot of pressure to never push back on anything, due to how the H1B system is set up. Not always the case, but often. I always felt that they were being exploited in that way.


Foreign workers are much more likely to agree to do illegal things, including kickback schemes. Unwind this shit and start over. Indian contracting companies big and small have fucked it all up.


Even just audio transcription can hallucinate in bizarre ways. https://arstechnica.com/ai/2024/10/hospitals-adopt-error-pro...


We are here to help lower that :-). Since we can push dedupe to the edge, we can save on bandwidth as well, and hopefully make everyone's uploads and downloads faster.


Great question! Rsync also uses a rolling hash/content defined chunking approach to deduplicate and reduce communication. So it will behave very similarly.
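For the curious, here is a toy sketch of content-defined chunking with a rolling hash. The polynomial hash and the window/mask parameters are made up for illustration; this is not rsync's rolling checksum or any production chunker.

    # Illustrative content-defined chunking: a boundary is declared whenever the
    # rolling hash of the trailing `window` bytes matches a fixed bit pattern,
    # so boundaries depend only on local content, not on byte offsets.
    def chunk_boundaries(data: bytes, window=48, mask=(1 << 13) - 1):
        BASE, MOD = 257, (1 << 61) - 1
        pow_w = pow(BASE, window - 1, MOD)
        h, start = 0, 0
        for i, b in enumerate(data):
            h = (h * BASE + b) % MOD
            if i >= window:                      # drop the byte leaving the window
                h = (h - data[i - window] * pow_w * BASE) % MOD
            # Expected chunk size is roughly `mask + 1` bytes (~8 KiB here).
            if i + 1 - start >= window and (h & mask) == 0:
                yield i + 1
                start = i + 1
        if start < len(data):
            yield len(data)

    # Example: identical regions of data produce identical chunks, so an insert
    # near the start of a file only changes the chunks around the edit.
    import os
    print(list(chunk_boundaries(os.urandom(100_000)))[:5])

Every chunk whose bytes are unchanged hashes to the same content address as before, which is what lets both rsync-style transfer and dedupe skip re-sending it.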


One more: do you prefer the CDC technique over using the rowgroups as chunks (ie using knowledge of the file structure)? Is it worth it to build a parquet-specific diff?


I think both are necessary. The CDC technique is file-format independent; the row-group method is what makes Parquet robust to it.


I believe Parquet predates Arrow. That's probably why.


Can you elaborate? As I understand it, Delta Lake provides transactions on top of existing data and effectively stores “diffs” because it knows what each transaction did. But when you only have regular snapshots, it's much harder to figure out the effective diff, and that is where deduplication comes in. (Quite like how git actually stores snapshots of every file version, but very aggressively compressed.)


Hi all! Yucheng (CEO, XetHub) here, happy to answer any technical questions anyone might have. Our current tech is a significant enhancement over the original "Git Is For Data" paper we published last year: https://www.cidrdb.org/cidr2023/papers/p43-low.pdf . Hope to write more about it soon! (Maybe with a follow-up paper or, at minimum, a blog post.)


The dedupe is optimistic and is designed to scale to the 1-10 PB range. The full architecture is more complicated; we are working on a blog post about it. We can dedupe across repositories, but we do not right now, largely for privacy reasons: blocks are not shared across different people, as that can cause information leakage.


Indeed this is unsurprising given how LLMs work. I mean if you ask a human to generate a random number, and then reset the universe and all state of the human and ask again, you will get the same number.

But if I instead ask it to generate 100 samples, it actually works pretty well.

"You are a weighted random choice generator. About 80% of the time please say ‘left’ and about 20% of the time say ‘right’. Generate 100 samples of either "left" or "right". Do not say anything else. "

I got 71 left, and 27 right.

And if I ask for 50%/50%, I get 56 lefts and 44 rights.
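As a quick check of my own (assuming scipy is available), a binomial test says whether 71 "left" out of 98 samples is even consistent with an intended 80%:

    # Sanity check: is 71 successes out of 98 trials consistent with p = 0.8?
    from scipy.stats import binomtest

    result = binomtest(71, 98, p=0.8)
    print(result.pvalue)   # somewhere around 0.05-0.1: a mild deviation, not wildly off 0.8

So it undershoots 80% a bit, but not egregiously.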


> Indeed this is unsurprising given how LLMs work. I mean if you ask a human to generate a random number, and then reset the universe and all state of the human and ask again, you will get the same number.

It actually is surprising, and you should be surprised rather than post hoc justifying it, because the logits should reflect the true random probability and be calibrated in order to minimize the prediction loss. Putting ~100% weights on 'heads' is a terrible prediction!

And the LLM logits are in fact calibrated... before they go through RLHF and RLHF-derived dataset training. (Note that all of the models OP lists are either non-base tuned models like ChatGPT, or trained on data from such models, like Phi.) This was observed qualitatively when the 3.5 models were first released to the Playground, documented by the GPT-4 paper, and the 'flattened logits' phenomenon has been found many times since, not just by OP, and mostly by people totally ignorant of this phenomenon (despite being quite well known).

This is just one of those things, like BPE-related errors, that we're doomed to point out again and again in the Eternal September of LLMs.


> Putting ~100% weights on 'heads' is a terrible prediction!

For a weighted coin, isn't this the optimal strategy in the absence of other information? `p > p^2 + (1 − p)^2`.
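Spelling out the algebra (my addition): `p - (p^2 + (1 - p)^2) = (2p - 1)(1 - p) > 0` whenever `1/2 < p < 1`, so always guessing the majority side beats probability matching for any biased coin. With p = 0.8, always guessing is right 80% of the time, while probability matching is right only 0.64 + 0.04 = 68% of the time.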


No, because you're confusing loss functions: an LLM makes a probabilistic prediction, not a hard decision. That is the optimal strategy only if you have something like a 0-1 loss function†, akin to betting on a coin flip, which is not a proper scoring rule (and not easily differentiable either).

Whereas LLMs are usually trained with a proper scoring rule that incentivizes them to report calibrated predictions, like the log loss (cross-entropy). For that, the optimal prediction is just '50%', perhaps transformed into log-odds, and expressed as whatever the equivalent of '50%' is over the BPE vocabulary.
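To make that concrete, a tiny sketch of my own (assuming a true bias of 0.8): the expected log loss is minimized by reporting the true probability, not by putting ~100% weight on the more likely side.

    # Expected log loss of reporting "heads with probability q" when the true
    # bias is p = 0.8; a proper scoring rule is minimized at q = p, not q = 1.
    import numpy as np

    p = 0.8
    qs = np.linspace(0.01, 0.99, 99)
    expected_loss = -(p * np.log(qs) + (1 - p) * np.log(1 - qs))
    print(qs[np.argmin(expected_loss)])   # ~0.8, up to the grid resolution

That is what a proper scoring rule buys you: the loss-minimizing report is the calibrated one.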

† e.g. if you are betting $1 on whether heads or tails comes up, it is true that you can't do better than always betting $1 on the side with P>50% - and strikingly, this is not what people do in setups like the spinner game (or Twitter polls): they 'probability match', which is optimal in terms of Thompson sampling, as if they were playing an indefinitely-long repeated bandit to minimize regret. I usually take this as an example of System I vs System II: it shows how hard it is to break our real-world-appropriate intuitive behavior in artificial game setups. If you think about it, in the usual spinner game, probability matching is just straightforwardly wrong and it's not like a bandit at all; but you do have to think about it.


(Yes, 71 + 27 != 100, but the fact that LLMs can't count is a whole other issue.)

