Hacker News | mikae1's comments

Hope this will one day be used for auto-tagging all video assets with time codes. The dream is being able to search for "running horse" and find a clip containing a running horse at 4m42s in one of thousands of clips.

This is a solved problem already. Check out https://getjumper.io, where you can do exactly this (search through hundreds of hours) offline and locally.

Disclaimer: co-founder


It’s not difficult to hack this together with CLIP. I did this with about a tenth of my movie collection last week on a GTX 1080, though CLIP lacks temporal understanding, so you have to do the scene analysis yourself.

I'm guessing you're not storing a CLIP embedding for every single frame, but rather one every second or so? Also, are you using cosine similarity? How are you finding the nearest vector?

I split per scene using pyscenedetect and sampled from each scene. Distance is via cosine similarity; I fed the embeddings into Qdrant.
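
In case it helps anyone, the nearest-vector part is just a cosine-similarity query against the vector DB. A minimal sketch of the search side, assuming a Qdrant collection called "movie_frames" created with cosine distance and payloads holding the video path and timestamp (names here are illustrative, not the exact code I ran):

    import torch
    import open_clip
    from qdrant_client import QdrantClient

    # Load the same CLIP model used to embed the frames (ViT-B-32 -> 512-d vectors).
    model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")

    def search_scenes(text_query: str, top_k: int = 5):
        # Encode the text query into the same embedding space as the frames,
        # L2-normalized as is conventional for CLIP.
        with torch.no_grad():
            vec = model.encode_text(tokenizer([text_query]))
            vec = vec / vec.norm(dim=-1, keepdim=True)

        client = QdrantClient("localhost", port=6333)
        # Qdrant does the cosine-similarity ranking itself when the collection
        # was created with Distance.COSINE.
        hits = client.search(
            collection_name="movie_frames",
            query_vector=vec[0].tolist(),
            limit=top_k,
        )
        for hit in hits:
            print(hit.payload.get("video"), hit.payload.get("timestamp"), hit.score)

    search_scenes("a running horse")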

Would you be willing to share more details of what you did?

Sure. I had a lot of help from Claude Opus 4.5, but it was roughly:

- Using pyscenedetect to split each video on a per scene level

- Using the decord library https://github.com/dmlc/decord to pull frames from each scene at a particular sample rate (I don't have the exact rate handy right now, but it was 1-2 frames per scene)

- Aggregating frames in batches of around 256 to be normalized for CLIP embedding on the GPU (I had to rewrite the normalization step for this because the default library does it on the CPU)

- Uploading the embeddings along with metadata (timestamp, etc.) into a vector DB (in my case Qdrant running locally), together with a screencap of the frame itself for debugging.

I'm bottlenecked by GPU compute, so I also started experimenting with using Modal for the embedding work, but then vacation ended :) Might pick it up again in a few weeks. I'd like to have a temporally aware and potentially enriched search so that I can say "Seek to the scene in Oppenheimer where Rami Malek testifies" and get a timestamped clip from the movie.
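
For anyone who wants to try the same thing, here's a condensed sketch of the indexing side under the same assumptions (collection name, sample rate, and payload fields are placeholders; the real version also batches ~256 frames at a time and does the normalization on the GPU):

    import uuid
    import torch
    import open_clip
    from PIL import Image
    from decord import VideoReader, cpu
    from scenedetect import detect, ContentDetector
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
    model = model.to(device).eval()

    client = QdrantClient("localhost", port=6333)
    client.recreate_collection(
        collection_name="movie_frames",
        vectors_config=VectorParams(size=512, distance=Distance.COSINE),
    )

    def index_video(path: str, samples_per_scene: int = 2):
        scenes = detect(path, ContentDetector())   # scene boundaries via pyscenedetect
        vr = VideoReader(path, ctx=cpu(0))         # decord handles frame decoding
        points = []
        for start, end in scenes:
            first, last = start.get_frames(), max(end.get_frames() - 1, start.get_frames())
            step = max((last - first) // samples_per_scene, 1)
            frame_ids = list(range(first, last + 1, step))[:samples_per_scene]
            frames = vr.get_batch(frame_ids).asnumpy()   # (N, H, W, 3) uint8
            batch = torch.stack([preprocess(Image.fromarray(f)) for f in frames]).to(device)
            with torch.no_grad():
                vecs = model.encode_image(batch)
                vecs = vecs / vecs.norm(dim=-1, keepdim=True)
            for fid, vec in zip(frame_ids, vecs):
                points.append(PointStruct(
                    id=str(uuid.uuid4()),
                    vector=vec.cpu().tolist(),
                    payload={"video": path, "timestamp": fid / vr.get_avg_fps()},
                ))
        client.upsert(collection_name="movie_frames", points=points)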


Gemini already does this (and has for a while): https://ai.google.dev/gemini-api/docs/video-understanding

You can do that with Morphik already :)

We use an embedding model that processes videos and allows you to perform RAG on them.


Would it allow me to query my library for every movie that contains dance routine move1-move2-move3 in that order?

RAG as in the content is used to generate an answer, or RAG as in searching for a video?

Are you mistaking William Faulkner's mustache for Hitler's?

Or perhaps a 512GB Mac Studio. 671B Q4 of R1 runs on it.

I wouldn’t say runs. More of a gentle stroll.

I run it all the time, token generation is pretty good. Large contexts are slow, but you can hook up a DGX Spark via the Exo Labs stack and offload token prefill to it. The upcoming M5 Ultra should be faster than the Spark at token prefill as well.

> I run it all the time, token generation is pretty good.

I feel like you're not really giving the whole picture here, because you didn't actually give prompt-processing or generation speeds. What are the prompt-processing tok/s and the generation tok/s actually like?


I addressed both points: I mentioned you can offload token prefill (the slow part, 9 t/s) to a DGX Spark, and token generation is at 6 t/s, which is acceptable.

6 tok/sec might be acceptable for a dense model that doesn't do thinking, but for something like DeepSeek 3.2 that does reason, 6 tok/sec isn't acceptable for anything but async/batched work, sadly. Even a response of just 100 visible tokens takes the better part of a minute once you count the thinking tokens, and for anything except the smallest prompts you'll easily hit 1000+ tokens, which means several minutes of waiting.

Maybe my 6000 Pro spoiled me, but for actual usage, 6 or even 9 tok/sec is too slow for a reasoning/thinking model. To be honest, kind of expected on CPU though. I guess it's cool that it can run on Apple hardware, but it isn't exactly a pleasant experience at least today.
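
To put rough numbers on why reasoning changes the math (6 t/s generation and 9 t/s prefill are the figures from the comment above; the token counts are made-up examples):

    # Back-of-the-envelope wait time for a reasoning model at the quoted rates.
    def wait_seconds(prompt_tokens, thinking_tokens, answer_tokens,
                     prefill_tps=9.0, gen_tps=6.0):
        return prompt_tokens / prefill_tps + (thinking_tokens + answer_tokens) / gen_tps

    # A short question with modest reasoning: ~122 s before the full answer.
    print(wait_seconds(200, 500, 100))
    # A longer coding prompt with heavier reasoning: ~639 s, i.e. over ten minutes.
    print(wait_seconds(2000, 2000, 500))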


Dunno, DeepSeek on a Mac Studio doesn't feel much slower than using it directly on deepseek.com; 6 t/s is still around 24 characters per second, which is faster than many people can read. I also have a 6000 Pro, but you won't fit any large model on it; to run DeepSeek R1/3.1/3.2 671B at Q4 you'd need 5-6 of them, depending on the communication overhead. The Mac Studio is the simplest way to run it locally.

> 6 t/s is still around 24 characters per second, which is faster than many people can read.

But again, not if you're using thinking/reasoning, which you are if you want to use this specific model properly. Then there's a huge delay before the actual response comes through.

> The Mac Studio is the simplest way to run it locally.

Obviously, that's Apple's core value proposition after all :) One does not acquire a state-of-the-art GPU and then expect simple stuff, especially when it's a fairly uncommon and new one. You cannot really be afraid of diving into CUDA code and similar fun rabbit holes. Simply two very different audiences for the two alternatives, and the Apple way is the simpler one, no doubt about it.


6 t/s will have you pulling your hair out with any DeepSeek model.

So, quarter stroll.

With quantization, converting it to an MoE model... it can be a fast walk.

It'd give you Plasma, for one. Even though I'm a proponent of uBlue, I'd consider Mint if there were a Plasma version. It seems like a great distro.

I don’t really care about desktop environments; I use i3. Beyond that, I don’t have much of a preference about how Firefox, the terminal, and Steam are displayed.

I'm an https://getaurora.dev user and I agree uBlue is awesome. I'd like to create a custom image too, but it doesn't seem quite as easy as you say: https://youtube.com/watch?v=IxBl11Zmq5w

I learned about Aurora from an HN comment some weeks ago, and it has been so awesome. I really haven't been this impressed with a distro since the first Ubuntu. It's just a rock-solid base with awesome defaults, and KDE is delightful.

While the video is long, the actual process of setting everything up only took me about 20 minutes. The template they offer is extremely convenient.

I will offer a second positive but more reserved data point. It took me closer to a day to get my custom Bazzite build working.

Switching over to my images using bootc failed because of what I eventually tracked down to a permissions issue that I didn't see mentioned in any of the docs. In short, the packages you publish to GitHub's container registry must be public.

Another wrinkle: the Bazzite container-build process comes pretty close to the limits of the free default GitHub runners. If you add anything semi-large to your custom image, it may fail to build. For example, adding Microsoft's VS Code was enough to break my image builds because of resource limits.

Fortunately, both of these issues can be fixed by improving the docs.


There's also BlueBuild [1], which abstracts the image-building process further away behind YAML configuration.

It takes away a bit of direct control over the process, but it covers the majority of things you'd want to configure.

[1] https://blue-build.org/


The actual process for the image is really just what I said. In the video he also sets up an automatic GitHub Actions build and adds signing with cosign (which are steps you really want to do anyway), but getting custom stuff into your base OS really is as easy as writing a Dockerfile (or should I say Containerfile?).

The final piece of the JPEG XL puzzle!


It's a huge piece for sure, but not the only one. For example, neither Firefox nor Windows supports it out of the box currently: Firefox requires Nightly or an extension, and on Windows you need to download support from the Microsoft Store.


> on Windows you need to download support from the Microsoft Store.

To be really fair, on Windows:

- H.264 is the only guaranteed (modern-ish) video codec (HEVC, VP9, and AV1 aren't built in unless the device manufacturer bothered to add them)

- JPEG, GIF, and PNG are the only guaranteed (widely used) image codecs (HEIF, AVIF, and JXL aren't built in either)

- MP3 and AAC are the only guaranteed (modern-ish) audio codecs (Opus is another module)

... and all of them were already widely used when Windows 7 was released (before the modern codecs), so downloadable modules are probably now the modern Windows Method™ for codecs.

Note on pre-8 HEVC support: the codec (when it's not VLC or other software bundling its own codecs) often comes from that bundled CyberLink Blu-ray player, not a built-in one.


Would PDF 2.0 (which also depends on JPEG XL and Brotli) put pressure on Firefox and Windows to add easier-to-use support?


Brotli? Is it still relevant now that we have Zstandard?

Zstandard is much faster in just about every benchmark; Brotli sometimes has a small edge in compression ratio, but if you go for compression ratio over speed, LZMA2 beats them both.

Both Zstandard (zstd) and LZMA2 (xz) are widely supported, and I think better supported than Brotli outside of HTTP.


Brotli decompresses 3-5x faster than LZMA2, is within 0.6% of its compression density, and is much better for short documents.

Zstandard decompresses ~2x faster than Brotli but is about 5% less dense, and even less dense for short documents or documents where the static dictionary can be used.

Brotli is not slow to decompress; it's generally a little faster than deflate through zlib.

Last time I measured, Brotli had a ~2x smaller binary size than zstd (dec+enc).
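
If you want to sanity-check these tradeoffs on your own data, here's a quick harness (the file name is a placeholder; the third-party brotli and zstandard packages are needed, lzma and zlib are stdlib; timing decompression works the same way with the matching decompress calls):

    import time, lzma, zlib
    import brotli
    import zstandard as zstd

    data = open("sample.pdf", "rb").read()  # any representative file

    candidates = {
        "zlib -9":    lambda d: zlib.compress(d, 9),
        "brotli -11": lambda d: brotli.compress(d, quality=11),
        "zstd -19":   lambda d: zstd.ZstdCompressor(level=19).compress(d),
        "xz -9":      lambda d: lzma.compress(d, preset=9),
    }

    for name, fn in candidates.items():
        t0 = time.perf_counter()
        out = fn(data)
        dt = time.perf_counter() - t0
        print(f"{name:10s} {len(out):>10d} bytes  ratio {len(data) / len(out):5.2f}  {dt:6.2f}s")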


Straight from the horse's mouth!

The thing is that Brotli is clearly optimized for the web (it even has a built-in dictionary), while Zstandard is more generic, being used for tar archives and the like. I wonder how PDF fits in here.


I don't think so: JPEG 2000, as far as I know, isn't generally supported for web use in web browsers, but it is supported in PDF.


> I don't think so: JPEG 2000, as far as I know, isn't generally supported for web use in web browsers, but it is supported in PDF.

Safari had supported JPEG 2000 since 2010 but removed support last year [1].

[1]: https://bugs.webkit.org/show_bug.cgi?id=178758


So Firefox (or others) can't open a PDF with an embedded JPEG 2000/XL image? Or does PDF.js somehow support it?



Apparently I really flubbed my wording for this comment. I'm saying they do support it inside of PDF, just not elsewhere in the web platform.


JPEG XL is recommended as the preferred format for HDR content in PDFs, so it’s more likely to be encountered:

https://www.theregister.com/2025/11/10/another_chance_for_jp...


I'm not convinced HDR PDFs will be a common thing anytime soon, even without this chicken-and-egg problem of support.


What I mean to say is, I believe browsers do support JPEG 2000 in PDF, just not on the web.


The last time I checked, I found that I needed to convert to JPEG to show the image in browsers.


A *PDF* with embedded JPEG 2000 data should, as far as I know, decode in modern browsers' PDF viewers; PDF.js and PDFium both use OpenJPEG. But despite that, browsers don't currently support JPEG 2000 in general.

I'm saying this to explain how JPEG XL support in PDF isn't a silver bullet. Browsers already support image formats in PDF that are not supported outside of PDF.


A large and important piece, but not the final one. If it remains a web-only codec (that is, no Android or iOS support for taking photos in JPEG XL), then web media will still be dominated by JPEGs.



Follow the (subversive) hyperlink (which technically isn't a hyperlink on HN profile pages :D)... https://news.ycombinator.com/user?id=runamuck


https://startpage.com is a Google proxy instead of a Bing proxy (like DDG).


Pretty sure Startpage sold out years ago...


Not a radical idea. The EU is already working on it.

> […] the Commission is pondering how to tweak the rules to include more exceptions or make sure users can set their preferences on cookies once (for example, in their browser settings) instead of every time they visit a website.

https://www.politico.eu/article/europe-cookie-law-messed-up-...


The DNT header already does this: explicit denial of consent. It reaches their servers before everything else, so they have no excuse and zero room for maneuvering.

Now the EU just needs to turn it into an actual liability for corporations. Otherwise it will just remain an additional bit of entropy for tracking.
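
Honoring it server-side is trivial too. A minimal sketch (Flask used purely for illustration; DNT is the legacy header, Sec-GPC its successor):

    from flask import Flask, request

    app = Flask(__name__)

    def user_opted_out() -> bool:
        # Both signals arrive with the very first request, before any consent banner.
        return request.headers.get("DNT") == "1" or request.headers.get("Sec-GPC") == "1"

    @app.route("/")
    def index():
        if user_opted_out():
            # Skip loading analytics/tracking scripts entirely.
            return "Hello, no trackers for you."
        return "Hello, consent banner goes here."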


They can't. The website may very well do the opposite of the preference DNT signals. Meanwhile, proving in a court of law that the tracking still happens will be hard.

Services should be denied the capacity to track and fingerprint, not just told about a preference against it.

DNT will always be an "evil bit", regardless of any law behind it.


> They can't. The website may very well do the opposite of the preference DNT signals. Meanwhile, proving in a court of law that the tracking still happens will be hard.

It's not hard when it comes to any website of note. Large companies can't easily hide what their computers are doing; if they have code that tracks people, it is going to be found.


How do you deny the capacity to fingerprint? That's basically disabling JavaScript.


Essentially the same way uBlock Origin worked: a global list of offenders to block, so that their JavaScript won't be loaded at all.

Asking browsers to implement uBlock Origin natively, though...


Adding a different web page-resident language?


DNT is considered deprecated in favor of GPC, which has legal backing in places with internet privacy laws. Funnily enough, Chrome still supports DNT, but you need an extension to send a GPC header. Almost as if the advertising company wouldn't want people enabling legal privacy protections.


In Germany, DNT is legally binding, but GPC is not.


Sounds like we need browsers to select the correct header based on server IP lol.


GPC compliance is already the law in California. I don’t know why the EU has been so slow at making it legally binding. That said, existing cookie popups that don’t have “Reject All” as prominently placed as “Accept All” are already illegal but widespread, in no small part due to deliberate sabotage by the Irish DPA, so don’t expect GPC compliance to fare any better until consumer rights associations like NOYB.eu are allowed to initiate direct enforcement actions.


Plus, the GPC extensions advertised by the official GPC site all bundle other unsolicited privacy features and freemium models. I ended up building my own extension: https://chromewebstore.google.com/detail/gpc-enabler/ilknagn...


EU law typically has a lead time of at least two years.


The fact that it was turned on by default in Internet Explorer really hurt it as an argument under these laws, because it then turned into a 'well, we don't know whether the user actually selected this' thing. Making it explicitly have the force of law regardless would still be a good thing, though.


No, this is wrong. The law says that by default you can't process personal data unless the user gave consent. That setting matched both the expectation of users and the default as specified by the law.

The story that advertisers don't know what users selected and that this somehow allows them to track the user is disingenuous.


It doesn't allow them to track, but it does allow them to more convincingly argue that they can nag the user about it (I think some regulators in some EU countries have rejected this, but I don't think that's universal). I.e., it makes it ineffective as a means of stopping the annoying pop-ups. Because the companies are basically belligerent about it, there needs to be a clear declaration of 'if this header is set, you may not track _and_ you may not bug the user about it'.


How are they supposed to ask for consent then?


If the user has already indicated that they don't consent by setting the header, you don't ask. If they want to change their mind, make it available as a setting.

(and frankly, the number of users that actively want to consent to this is essentially zero)


What if the user doesn't know they have that setting enabled? Or they enabled it to block some company other than your own?

I always consent to cookie popups, so the number cannot be zero.


Hence my point that the default hurt the initiative. And the header could be set on a per-domain basis, if you wanted that for some reason. I'm curious: why do you consent to such pop-ups?


Because it offers a better experience. The cookies aren't pointless, and you need all of them to get the full experience. The legal definition of which cookies are needed does not match reality.


What parts of the experience do you feel are missing if you don't consent to tracking? I have seen one or two cases of malicious compliance where rejecting tracking results in no state being kept, including the fact that you rejected it. Keep in mind that the legal definition is based on things that would not reasonably be expected to be kept or distributed in order to provide the service the user is getting; you can do basically everything except targeted ads or selling user data under that definition, even if the people who want to do those things try to pretend otherwise.


Targeted ads are part of the experience. They directly affect user satisfaction with the product, and relevant ads can increase user engagement. You may find it strange, but people prefer products with relevant ads.


People prefer products without ads at all. Ads are noise. People's brains literally learn how to filter them out via banner blindness.

People always comment that the internet is "so much nicer" after I install uBlock Origin on their browsers. It's just better; they can't explain why. They don't need to. I know why.

The fact is nobody wants this crap. Ads are nothing but noise in our signal. They're spam. They're content we did not ask for, forced upon us without consent. They do not improve the "experience"; at best their impact is minimized.


Lol no one that doesn't work in ads thinks that way.


I always consent as well. They can show much more relevant ads when you consent to cookies. If I block cookies I get generic ads about stuff I don't care about.


Ah, I can't think of any level of relevance that would make me want to see ads. And in areas where I do want to see something, like recommendation systems, I've found they are better when they're based only on the content I'm currently looking at rather than on a profile built from my whole history.


The popup never lets you choose to see fewer ads. It's a common misconception among lay people that you will see fewer ads if you block cookies, but that's not what happens, of course. So you may as well get relevant ones.


> It's a common misconception among lay people that you will see fewer ads if you block cookies, but that's not what happens, of course.

It absolutely will happen if you install uBlock Origin.


So essentially you prefer the psychological manipulation inflicted on you to be more effective? Yeah, that's not a good idea bro.


Just today I got an ad for a new theater show in town that I'd like to see; I might have missed it if it weren't for the targeted ad. Did they "manipulate" me into seeing it? I guess so. Do I mind? No, I'm capable enough to decide for myself.


It’s not just corporations. Look how much tracking nonsense goes into a recipe blog.


Recipe blogs are mostly "corporations", even if small ones. Most things you find at the top of Google search results aren't just enthusiastic individuals sharing their personal ideas with you but businesses that work hard to make sure you go to their websites rather than to better ones.


The EU is already working on it? You have a strange definition of "already" ;)


> pondering how to tweak the rules to include more exceptions

“Hey what do you think? I dunno, what do you think? How about more tea?!”

Pondering how to tweak, unbelievable.


The alternative is that they tweak the laws without much thought...


Isn’t that the current status quo?


The GDPR has over 100k words, and those words are certainly less than 0.01% of the thought that has gone into this problem.


Agile laws might not be so terrible.


Counteropinion: agile laws would be absolutely terrible. Either people wouldn't take them seriously because they're going to change in a few minutes anyway, or people would take them seriously and be legally bound by the equivalent of late-night untested code that seemed like it should work.


Charitable interpretation of their comment: Law is implemented and then rapidly improved upon.

But yes, I think your take is more realistic, as any measure that allows rapid changes also allows willful politics to rapidly make a mess.


Imagine being charged for something that you didn't yet know was a crime because you didn't watch the morning news.


Very cool! But… GitHub? Do they ever learn? 3, 2, 1… Takedown notice!


A quick inspection of the repo indicates that it doesn’t contain any copyrighted material. They’ve just uploaded the code to perform the decompilation.


Why would they take it down? Decompilations of Nintendo games are on GitHub as well.

