Hope this one day will be used for auto-tagging all video assets with time codes. The dream: being able to search for "running horse" and find a clip containing a running horse at 4m42s in one of thousands of clips.
It's not difficult to hack this together with CLIP. I did it with about a tenth of my movie collection last week on a GTX 1080, though CLIP lacks temporal understanding, so you have to do the scene analysis yourself.
I'm guessing you're not storing a CLIP embedding for every single frame, but rather one every second or so? Also, are you using cosine similarity? How are you finding the nearest vector?
Sure. I had a lot of help from Claude Opus 4.5, but it was roughly the following (a rough code sketch follows the list):
- Using pyscenedetect to split each video on a per scene level
- Using the decord library https://github.com/dmlc/decord to pull frames from each scene at a particular sample rate (I don't have the specific rate handy right now, but it was 1-2 per scene)
- Aggregating frames into batches of around 256 to be normalized for CLIP embedding on the GPU (I had to rewrite the normalization step because the default library does it on the CPU)
- Uploading the embeddings along with metadata (timestamp, etc.) into a vector DB, in my case Qdrant running locally, together with a screenshot of the frame itself for debugging.
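Not my exact code, but a minimal sketch of that indexing pipeline under a few assumptions (the CLIP checkpoint, the collection name, and sampling one mid-scene frame per scene are illustrative; the real run used 1-2 frames per scene and a custom GPU normalization step):

```python
# Sketch: scene detection -> frame sampling -> CLIP embeddings -> Qdrant.
# Assumes: pip install scenedetect decord transformers torch qdrant-client
# Model name, collection name, and one-frame-per-scene sampling are illustrative.
import torch
from scenedetect import detect, ContentDetector
from decord import VideoReader, cpu
from transformers import CLIPModel, CLIPProcessor
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

VIDEO = "movie.mkv"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# 1. Split the video into scenes.
scenes = detect(VIDEO, ContentDetector())  # list of (start, end) FrameTimecodes

# 2. Pull one frame from the middle of each scene with decord.
vr = VideoReader(VIDEO, ctx=cpu(0))
frame_indices = [(s.get_frames() + e.get_frames()) // 2 for s, e in scenes]
frames = [vr[i].asnumpy() for i in frame_indices]              # HxWx3 uint8 arrays
timestamps = [(s.get_seconds() + e.get_seconds()) / 2 for s, e in scenes]

# 3. Embed the frames in batches on the GPU and unit-normalize for cosine search.
embeddings = []
with torch.no_grad():
    for i in range(0, len(frames), 256):
        batch = processor(images=frames[i:i + 256], return_tensors="pt").to(device)
        feats = model.get_image_features(**batch)
        feats = feats / feats.norm(dim=-1, keepdim=True)
        embeddings.extend(feats.cpu().tolist())

# 4. Upsert the vectors plus metadata into a local Qdrant instance.
client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="movie_frames",
    vectors_config=VectorParams(size=len(embeddings[0]), distance=Distance.COSINE),
)
client.upsert(
    collection_name="movie_frames",
    points=[
        PointStruct(id=i, vector=vec, payload={"video": VIDEO, "timestamp_s": ts})
        for i, (vec, ts) in enumerate(zip(embeddings, timestamps))
    ],
)
```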
I'm bottlenecked by GPU compute, so I also started experimenting with Modal for the embedding work, but then vacation ended :) I might pick it up again in a few weeks. I'd like to have a temporal-aware and potentially enriched search, so that I can say "Seek to the scene in Oppenheimer where Rami Malek testifies" and get back a timestamped clip from the movie.
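The search side (answering the cosine-similarity question above) is just embedding the query text with the same CLIP model and letting Qdrant do the nearest-vector lookup with cosine distance. A sketch, reusing the objects from the indexing sketch above:

```python
# Sketch of the query path: embed the text with CLIP, let Qdrant return the
# nearest frames by cosine similarity. Reuses model/processor/client from above.
with torch.no_grad():
    text_inputs = processor(text=["a running horse"], return_tensors="pt",
                            padding=True).to(device)
    text_feat = model.get_text_features(**text_inputs)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

hits = client.search(
    collection_name="movie_frames",
    query_vector=text_feat[0].cpu().tolist(),
    limit=5,
)
for hit in hits:
    print(hit.payload["video"], f'{hit.payload["timestamp_s"]:.1f}s', hit.score)
```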
I run it all the time; token generation is pretty good. Only large contexts are slow, but you can hook up a DGX Spark via the Exo Labs stack and outsource token prefill to it. The upcoming M5 Ultra should be faster than the Spark at token prefill as well.
> I run it all the time; token generation is pretty good.
I feel like you aren't really giving the whole picture here, because you didn't actually talk about prompt-processing speed or tokens/s. What are the prompt-processing and generation rates actually like?
I addressed both points: I mentioned you can offload token prefill (the slow part, 9 t/s) to a DGX Spark. Token generation is at 6 t/s, which is acceptable.
6 tok/sec might be acceptable for a dense model that doesn't do thinking, but for something like DeepSeek 3.2 that does reason, 6 tok/sec isn't acceptable for anything but async/batched use, sadly. Even a response of just 100 tokens takes around 17 seconds to write, and for anything except the smallest of prompts you'll easily hit 1,000 tokens of reasoning plus response (close to three minutes).
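Back-of-the-envelope numbers, purely illustrative, using the 9 t/s prefill and 6 t/s generation figures quoted above (the prompt and output sizes are made up):

```python
# Rough response latency: prefill time plus generation time, at the rates
# quoted in this thread. Prompt/output sizes below are arbitrary examples.
PREFILL_TPS = 9.0   # prompt processing, tokens/s
GEN_TPS = 6.0       # generation, tokens/s

def latency_s(prompt_tokens: int, output_tokens: int) -> float:
    return prompt_tokens / PREFILL_TPS + output_tokens / GEN_TPS

print(latency_s(500, 100))    # ~72 s for a short, reasoning-light reply
print(latency_s(2000, 1000))  # ~389 s (about 6.5 min) once reasoning tokens pile up
```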
Maybe my 6000 Pro spoiled me, but for actual usage, 6 or even 9 tok/sec is too slow for a reasoning/thinking model. To be honest, kind of expected on CPU though. I guess it's cool that it can run on Apple hardware, but it isn't exactly a pleasant experience at least today.
Dunno, DeepSeek on a Mac Studio doesn't feel much slower than using it directly on deepseek.com; 6 t/s is still around 24 characters per second, which is faster than many people can read. I also have a 6000 Pro, but you won't fit any large model on it, and to run DeepSeek R1/3.1/3.2 671B at Q4 you'd need 5-6 of them, depending on the communication overhead. The Mac Studio is the simplest solution for running it locally.
> 6 t/s is still around 24 characters per second, which is faster than many people can read.
But again, not if you're using thinking/reasoning, which, if you want to use this specific model properly, you are. Then there's a huge delay before the actual response comes through.
> The Mac Studio is the simplest solution for running it locally.
Obviously, that's Apple's core value proposition after all :) One does not acquire a state-of-the-art GPU and then expect things to be simple, especially when it's a fairly uncommon and new one. You can't really be afraid of diving into CUDA code and similar fun rabbit holes. These are simply two very different audiences for the two alternatives, and the Apple way is the simpler one, no doubt about it.
I don't really care about the desktop environment. I use i3. Beyond that, I don't really have much of a preference for how Firefox, a terminal, and Steam are displayed.
I learned about Aurora from an HN comment some weeks ago, and it has been so awesome. I really haven't been this impressed with a distro since the first Ubuntu. It's just a rock-solid base, awesome defaults, and KDE being delightful.
I will offer a second positive but more reserved data point. It took me closer to a day to get my custom Bazzite build working.
Switching over to my images using bootc failed because of what turned out to be a permissions issue that I didn't see mentioned in any of the docs. In short, the packages you publish to GitHub's container registry must be public.
Another wrinkle: the Bazzite container-build process comes pretty close to the limits of the free default GitHub runners. If you add anything semi-large to your custom image, it may fail to build. For example, adding Microsoft's VS Code was enough to break my image builds because of resource limits.
Fortunately, both of these issues can be fixed by improving the docs.
The actual process for the image really is just what I said. In the video he also sets up an automatic GitHub Actions build and adds signing with cosign (steps you really do want), but getting custom stuff into your base OS really is as easy as writing a Dockerfile (or should I say Containerfile?).
It's a huge piece for sure, but not the only one. For example, neither Firefox nor Windows supports it out of the box currently: Firefox requires Nightly or an extension, and on Windows you need to download support from the Microsoft Store.
> on Windows you need to download support from the Microsoft Store.
To be really fair, on Windows:
- H.264 is the only guaranteed (modern-ish) video codec (HEVC, VP9, and AV1 are not built in unless the device manufacturer bothered to include them)
- JPEG, GIF, and PNG are the only guaranteed (widely used) image codecs (HEIF, AVIF, and JXL are likewise not built in)
- MP3 and AAC are the only guaranteed (modern-ish) audio codecs (Opus is another separate module)
... and all of them were already in wide use when Windows 7 was released (before the modern codecs), so optional modules are apparently now the modern Windows Method™ for codecs.
A note on pre-Windows 8 HEVC support: the codec (when not provided by VLC or other software bundling its own codecs) often comes from that CyberLink Blu-ray player, not from anything built in.
Brotli? Is it still relevant now that we have Zstandard?
Zstandard is much faster in just about every benchmark; Brotli sometimes has a small edge in compression ratio, but if you're going for compression ratio over speed, LZMA2 beats them both.
Both Zstandard (zstd) and LZMA2 (xz) are widely supported, I think better supported than Brotli outside of HTTP.
Brotli decompresses 3-5x faster than LZMA2, is within 0.6% of its compression density, and does much better on short documents.
Zstandard decompresses ~2x faster than Brotli but is about 5% less dense, and even less dense on short documents or documents where Brotli's static dictionary can be used.
Brotli is not slow to decompress; it's generally a little faster than deflate through zlib.
Last time I measured, Brotli's binary size (decoder + encoder) was about half that of zstd's.
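These ratio and speed claims depend heavily on the corpus and the level settings, so it's worth measuring on your own data. A rough, unscientific sketch (it assumes the third-party `brotli` and `zstandard` packages are installed; `lzma` ships with CPython, and `corpus.bin` stands in for whatever file you care about):

```python
# Quick, unscientific comparison of compression ratio and decompression time.
# Requires: pip install brotli zstandard   (lzma is in the standard library)
import lzma, time, brotli, zstandard

def bench(name, compress, decompress, data):
    blob = compress(data)
    t0 = time.perf_counter()
    decompress(blob)
    dt = time.perf_counter() - t0
    print(f"{name:8s} ratio={len(data)/len(blob):6.2f}  decompress={dt*1000:7.2f} ms")

data = open("corpus.bin", "rb").read()  # substitute any representative file
bench("brotli", lambda d: brotli.compress(d, quality=11), brotli.decompress, data)
bench("zstd", zstandard.ZstdCompressor(level=19).compress,
      zstandard.ZstdDecompressor().decompress, data)
bench("lzma", lambda d: lzma.compress(d, preset=9), lzma.decompress, data)
```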
The thing is that Brotli is clearly optimized for the web (it even has a built-in dictionary), while Zstandard is more generic, being used for tar archives and the like. I wonder how PDF fits in here.
A *PDF* with embedded JPEG 2000 data should, as far as I know, decode in modern browser PDF viewers; PDF.js and PDFium both use OpenJPEG. But despite that, browsers don't currently support JPEG 2000 in general.
I'm saying this to explain why JPEG XL support in PDF isn't a silver bullet: browsers already support image formats in PDF that they don't support outside of PDF.
A large and important piece, but not the final one. If it remains a web-only codec, with no Android or iOS support for taking photos in JPEG XL, then web media will still be dominated by JPEGs.
Not a radical idea. The EU is already working on it.
> […] the Commission is pondering how to tweak the rules to include more exceptions or make sure users can set their preferences on cookies once (for example, in their browser settings) instead of every time they visit a website.
The DNT header already does this. It's an explicit denial of consent, and it reaches their servers before anything else, so they have no excuse and zero room for maneuvering.
Now the EU just needs to turn it into an actual liability for corporations. Otherwise it will remain as an additional bit of entropy for tracking.
They can't. The website may very well do the opposite of the preference DNT signals. Meanwhile, proving in a court of law that the tracking still happens will be hard.
Services should be denied the capacity to track and fingerprint, not just told about a preference against it.
DNT will always be an "evil bit", regardless of any law behind it.
> They can't. The website may very well do the opposite of the preference DNT signals. Meanwhile, proving in a court of law that the tracking still happens will be hard.
It's not hard when it comes to any website of note; large companies can't really hide what their computers are doing, and if they have code that tracks people, it's going to be found.
DNT is considered deprecated in favor of GPC, which has legal backing in places with internet privacy laws. Funnily enough, Chrome still supports DNT, but you need an extension to send a GPC header. Almost as if the advertising company wouldn't want people enabling legal privacy protections.
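For what it's worth, honoring either signal server-side is trivial. A minimal sketch assuming a Flask app (the header names `Sec-GPC` and `DNT` are the real ones; everything else here is made up for illustration):

```python
# Minimal sketch of honoring GPC/DNT on the server side (Flask app is illustrative).
from flask import Flask, request

app = Flask(__name__)

def user_opted_out() -> bool:
    # "Sec-GPC: 1" is the Global Privacy Control signal;
    # "DNT: 1" is the older Do Not Track header.
    return request.headers.get("Sec-GPC") == "1" or request.headers.get("DNT") == "1"

@app.route("/")
def index():
    if user_opted_out():
        return "No tracking, no consent banner."
    return "Page with a consent prompt."
```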
GPC compliance is already the law in California; I don't know why the EU has been so slow to make it legally binding. That said, existing cookie popups that don't place “Reject All” as prominently as “Accept All” are already illegal but widespread, in no small part due to deliberate sabotage by the Irish DPA, so don't expect GPC compliance to fare any better until consumer-rights associations like NOYB.eu are allowed to initiate direct enforcement actions.
The fact that it was turned on by default in Edge really hurt it as an argument under these laws, because it turned into a 'well, we don't know the user actually selected this' thing. Making it explicitly carry the force of law regardless would still be a good thing, though.
No, this is wrong. The law says that by default you can't process personal data unless the user has given consent. That default setting matched both users' expectations and the default specified by the law.
The story that advertisers don't know what users selected, and that this somehow allows them to track the user, is disingenuous.
It doesn't allow them to track, but it does let them argue more convincingly that they can nag users about it (I think some regulators in some EU countries have rejected this argument, but I don't think that's universal). In other words, it makes the header ineffective as a means of stopping the annoying pop-ups. Because the companies are basically belligerent about it, there needs to be a clear declaration of 'if this header is set, you may not track _and_ you may not bug the user about it'.
If the user has already indicated that they don't consent by setting the header, you don't ask. If they want to change, make it available as a setting.
(and frankly, the number of users that actively want to consent to this is essentially zero)
Hence why I think the default hurt the initiative. And the header could be set on a per-domain basis, if you wanted that for some reason. I'm curious, why do you consent on such pop-ups?
Because it offers a better experience. The cookies are not pointless to the experience, and you need all of them to get the full experience. The legal definition of which cookies are needed does not match reality.
What parts of the experience do you feel are missing if you do not consent to tracking? I have seen one or two cases of malicious compliance where rejecting tracking results in no state being kept at all, including the rejection itself. Keep in mind that the legal definition is based on what would not reasonably be expected to be kept or distributed in order to provide the service the user is getting; under that definition you can do basically everything except targeted ads or selling user data, even if the people who want to do those things try to pretend otherwise.
Targeted ads are part of the experience. They directly affect user satisfaction with the product, and relevant ads can increase user engagement. You may find it strange, but people prefer products with relevant ads.
People prefer products without ads at all. Ads are noise; people's brains literally learn to filter them out via banner blindness.
People always comment that the internet is "so much nicer" after I install uBlock Origin on their browsers. It's just better, they can't explain why. They don't need to. I know why.
The fact is, nobody wants this crap. Ads are nothing but noise in our signal. They're spam. They're content we did not ask for, forced upon us without consent. They do not improve the “experience”; at best their impact is minimized.
I always consent as well. They can show much more relevant ads when you consent to cookies. If I block cookies I get generic ads about stuff I don't care about.
Ah, I can't think of any level of relevance that would make me want to see ads. And in the areas where I do want suggestions, like recommendation systems, I've found they are better when based only on the content I'm currently looking at rather than on a profile built from my whole history.
The popup never lets you choose to see fewer ads. It's a common misconception among lay people that you will see fewer ads if you block cookies, but of course that's not what happens. So you may as well get relevant ones.
Just today I got an ad for a new theater show in town that I'd like to see; I might have missed it if it weren't for the targeted ad. Did they "manipulate" me into seeing it? I guess so. Do I mind? No, I'm capable of deciding for myself.
Recipe blogs are mostly "corporations" even if small ones. Most things you find at the top of Google search results aren't just enthusiastic individuals sharing their personal ideas with you but businesses who work hard to make sure you go to their websites rather than better ones.
Counteropinion: agile laws would be absolutely terrible. Either people wouldn't take them seriously because they're going to change in a few minutes anyway, or people would take them seriously and be legally bound by the equivalent of late-night untested code that seemed like it should work.
A quick inspection of the repo indicates that it doesn’t contain any copyrighted material. They’ve just uploaded the code to perform the decompilation.