
I've also heard that RFK Jr. has been nominated for the NFL Prize in Medicine.

If you want to see something rather amusing - instead of using the LLM aspect of Gemini 3.0 Pro, feed a five-legged dog directly into Nano Banana Pro and give it an editing task that requires an intrinsic understanding of the unusual anatomy.

  Place sneakers on all of its legs.
It'll get this correct a surprising number of times (tested with BFL Flux2 Pro, and NB Pro).

https://imgur.com/a/wXQskhL


Does this still work if you give it a pre-existing many-legged animal image, instead of first prompting it to add an extra leg and then prompting it to put the sneakers on all the legs?

I'm wondering if it may only expect the additional leg because you literally just told it to add said additional leg. It would just need to remember your previous instruction and its previous action, rather than to correctly identify the number of legs directly from the image.

I'll also note that photos of dogs with shoes on are definitely something it has been trained on, albeit presumably more often dog booties than human sneakers.

Can you make it place the sneakers incorrectly-on-purpose? "Place the sneakers on all the dog's knees?"


My example was unclear. Each of those images on Imgur was generated using independent API calls which means there was no "rolling context/memory".

In other words:

1. Took a personal image of my dog Lily

2. Had NB Pro add a fifth leg using the Gemini API

3. Downloaded image

4. Sent image to BFL Flux2 Pro via the BFL API with the prompt "Place sneakers on all the legs of this animal".

5. Sent image to NB Pro via Gemini API with the prompt "Place sneakers on all the legs of this animal".

So not only was there zero "continual context", it was two entirely different models as well to cover my bases.

EDIT: Added images to the Imgur for the following prompts:

- Place red Dixie solo cups on the ends of every foot on the animal

- Draw a red circle around all the feet on the animal


i imagine the real answer is that the edits are local because that's how diffusion works; it's not like it's turning the input into "five-legged dog" and then generating a five-legged dog in shoes from scratch

Sounds like they used GenAI to make them. The "Editor" models (Seedream, Nano-Banana) can easily integrate a fifth limb to create the "dog with awkward walking animation".

https://imgur.com/a/wXQskhL


I just re-ran that image through Gemini 3.0 Pro via AI Studio and it reported:

  I've moved on to the right hand, meticulously tagging each finger. After completing the initial count of five digits, I noticed a sixth! There appears to be an extra digit on the far right. This is an unexpected finding, and I have counted it as well. That makes a total of eleven fingers in the image.
This right HERE is the issue. It's not nearly deterministic enough to rely on.

Thanks for that. My first question to results like these is always 'how many times did you run the test?'. N=1 tells us nothing. N=2 tells us something.
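
To put a rough number on how little a single run says, here is a minimal sketch that computes a Wilson score interval for a pass rate. The function name `wilson_interval` and the sample counts are illustrative assumptions, not anything measured in this thread.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a pass rate (z=1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the extremes.
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts: one pass in one run vs. five passes in twenty runs.
for passed, runs in ((1, 1), (5, 20)):
    low, high = wilson_interval(passed, runs)
    print(f"{passed}/{runs}: {low:.2f} to {high:.2f}")
```

With a single passing run the interval still spans most of the range, which is the point of asking how many times the test was run.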

Anything that needs to overcome concepts which are disproportionately represented in the training data is going to give these models a hard time.

Try generating:

- A spider missing one leg

- A 9-pointed star

- A 5-leaf clover

- A man with six fingers on his left hand and four fingers on his right

You'll be lucky to get a 25% success rate.

The last one is particularly ironic given how much work went into FIXING the old SD 1.5 issues with hand anatomy... to the point where I'm seriously considering incorporating it as a new test scenario on GenAI Showdown.


https://gemini.google.com/share/8cef4b408a0a

Surprisingly, it got all of them right


Some good examples there. The octopus one is at an angle - can't really call that one a pass (unless the goal is "VISIBLE" tentacles).

Other than the five-leaf clover, most of the images (dog, spider, person's hands) all required a human in the loop to invoke the "Image-to-Image" capabilities of NB Pro after it got them wrong. That's a bit different since you're actively correcting them.


It mostly depends on "how" the models work. Multi-modal unified text/image sequence to sequence models can do this pretty well, diffusion doesn't.

Multimodal certainly helps but "pretty well" is a stretch. I'd be curious to know what multimodal model in particular you've tried that could consistently handle generative prompts of the above nature (without human-in-the-loop corrections).

For example, to my knowledge ChatGPT is unified and I can guarantee it can't handle something like a 7-legged spider.


You didn't correct the hands being backwards? It gave you a man with six fingers on his right hand, and four fingers on his left.

In fact, one of the tests I use as part of GenAI Showdown involves both parts of the puzzle: draw a maze with a clearly defined entrance and exit, along with a dashed line indicating the solution to the maze.

Only one model (gpt-image-1) out of the 18 tested managed to pass the test successfully. Gemini 3.0 Pro got VERY close.

https://genai-showdown.specr.net/#the-labyrinth


super cool! Interesting note about Seedream 4 - do you think awareness of A* actually could improve the outcome? Like I said, I'm no AI expert, so my intuitions are pretty bad, but I'd suspect that image analysis + algorithmic pathfinding don't have much crossover in terms of training capabilities. But I could be wrong!

Great question. I do wish we had a bit more insight into the exact background "thinking" that was happening on systems like Seedream.

When you think about posing the "solve a visual image of a maze" to something like ChatGPT, there's a good chance it'll try to throw a python VM at it, threshold it with something like OpenCV, and use a shortest-path style algorithm to try and solve it.
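
As a rough illustration of the "shortest-path style algorithm" step, here is a minimal breadth-first search over a maze that has already been reduced to a grid. The grid encoding ('#' for walls, 'S' for the entrance, 'E' for the exit) and the name `solve_maze` are assumptions for this sketch; the image thresholding step is left out, and this is not a claim about what ChatGPT or Seedream actually run.

```python
from collections import deque

def solve_maze(grid):
    """Return the shortest S-to-E path as a list of (row, col), or None."""
    start = end = None
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == "S":
                start = (r, c)
            elif ch == "E":
                end = (r, c)
    if start is None or end is None:
        return None

    prev = {start: None}  # maps each visited cell to the cell it was reached from
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == end:
            # Walk the predecessor links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
            if inside and grid[nr][nc] != "#" and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

# Hypothetical maze standing in for a thresholded image.
MAZE = [
    "#######",
    "#S..#.#",
    "#.#.#.#",
    "#.#...#",
    "#.###E#",
    "#######",
]
print(solve_maze(MAZE))
```

Breadth-first search is used here because every step costs the same, so the first time the exit is reached the path is already a shortest one.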


I too have been throwing messages in bottles into a silent sea for a pretty long time, but I think I'm okay with that. It doesn't help if you also have difficulty adhering to quintessential blog SEO best practices.

1. Consistent theme - A diverse set of interests and a lethal dose of ADD make this virtually impossible

2. Consistent updates - My articles tend to be rather unusual, and I'll often combine them with customized interactive layouts. Even a monthly post would be pretty ambitious for me.

On a slightly related note, I'm hoping that zines [1] see a resurgence in popularity as I could see it being a good point of entry towards possibly gaining readership for those whose sites are inadvertently running in stealth mode.

[1] - Such as Paged Out (https://pagedout.institute)


Here is vunderba's site, for anyone who got curious too:

https://mordenstar.com/

I just skimmed but looks like exactly the content I love. It's a shame search engines are so narrow minded and such content so hard to discover.


Wow, thanks for link to Paged Out. Any ideas how to discover any more tech focused zines? We have a zine culture in UK but afaik it’s more culture / music focused (v happy to be wrong here)


Anyone know what happened to NODE? They seemed to have had a good momentum and then went radio silent.

Reminds me of an excerpt from Tom Wolfe’s book The Right Stuff in which fighter pilots perceived doctors as the enemy, and heaven forfend you saw a psychiatrist!

  A man could go for a routine physical one fine day, feeling like a million dollars, and be grounded for fallen arches. It happened!—just like that! (And try raising them.) Or for breaking his wrist and losing only part of its mobility. Or for a minor deterioration of eyesight, or for any of hundreds of reasons that would make no difference to a man in an ordinary occupation. As a result all fighter jocks began looking upon doctors as their natural enemies. Going to see a flight surgeon was a no-gain proposition.

This reminds me of when a friend became a cop. One day I saw him or I thought I saw him from far away but I couldn’t tell him that I wasn’t sure it was him because I couldn’t recognise him because of my myopia and, since I sometimes drive without my glasses on - what if one day he caught me?

> since I sometimes drive without my glasses on

It always baffles me how blasé people are about driving safety. The rules for driving aren't even that hard to follow. Yet people just seem constitutionally unable to do so.


Why are you driving around without your glasses?

My prescription is very low and I can drive fine without glasses but I can’t recognise people at a long distance.

A lot of people drive fine when they are drunk.

From the article:

> I wanted her take on Wanderfugl, the AI-powered map I've been building full-time.

I can at least give you one piece of advice. Before you decide on a company or product name, take the time to speak it out loud so you can get a sense of how it sounds.


I grew up in Norway and there's this idea in Europe of someone who breaks from corporate culture and hikes and camps a lot (called wandervogel in german). I also liked how when pronounced in Norwegian or Swedish it sounds like wander full. I like the idea of someone who is full of wander.

In Swedish the G wouldn't be silent so it wouldn't really be all that much like "wonderful"; "vanderfugel" is the closest thing I could come up with for how I'd pronounce it with some leniency.

Same in Danish FWIW.

In English, I’d pronounce it very similar to “wonderful”.


If OP dropped the g, it would be a MUCH better product name.

Solid advice. Seeing how many here would pronounce it differently, I totally agree hahah

this would make it even closer to the dangerously similar travel planning app "wanderlog"

I actually own wanderfull.ai

Drop an l would be better I think

Are you a native speaker? Because said quickly in typical English it sounds like "wonderfukl" which isn't great.

Not a native English speaker, no.

The weird thing is that half of the uses of the name on that landing page spell it as "Wanderfull". All of the mock-up screencaps use it, and at the bottom with "Be one of the first people shaping Wanderfull" etc.

So even the creator can't decide what to call it!


AI probably generated all of that and the OP didn't even review its output.

I think the more pressing advice here is, limit yourself to one name (https://wanderfugl.com/images/guides.png)

this must be one of the incredible AI innovations the folks in Seattle are missing out on


Also, do it assuming different linguistic backgrounds. It could sound dramatically different when spoken by people who speak English as a second language, who are going to be a whole lot of your users, even if the application is in English.

If there is a g in there I will pronounce a g there. I have some standards and that is one. Pronouncing every single letter.

> Pronouncing every single letter.

Now I want to know how you pronounce words like: through, bivouac, and queue.


You don’t pronounce all the letters?

no. ever heard of silent letters?

I'm a native speaker of English, northern California dialect. I pronounce every one of those letters, to varying degrees. Some just affect the mouth shape by subtle amounts, but it is there.

> I pronounce every one of those letters, to varying degrees

That must be fun any time you talk about Worcestershire (the sauce or the place).


I was only talking about the examples given.

That's a gnarly standard you have there.

obviously not a native French speaker

It's pronounced wanderfull in Norwegian

And how many of your users are going to have Nordic backgrounds?

I personally thought it was wander _fughel_ or something.

Let alone how difficult it is to remember how to spell it and look it up on Google.


Just FYI, I would read it out loud in English as “wander fuggle”. I would assume most Americans would pronounce the ‘g’.

I thought ‘wanderfugl’ was a throwback to ~15 years ago when it was fashionable to use a word but leave out vowels for no reason, like Flickr/Tumblr/Scribd/Blendr.


The one current paying user of the app I've seen in this discussion called it "Wanderlog". FYI on the stickiness of the current name.

wanderlog is a separate web service

https://wanderlog.com/


"Wanderful" would be a better name.

And if you manage to say it out loud, say it to someone else and ask them to spell it. If they can’t spell it, they can’t type it into the URL bar.

Maybe that's why they didn't go with the English cognate i.e. Wanderfowl, since being foul isn't great branding

What's wrong with wahn-der-fyoo-gull?

What? You don't want travel tips from an itinerant swinger? Or for itinerant swingers?

The only link on the site to a source repository is a 404 Github repository.

https://github.com/safekeylab

EDIT: Manually searching Github leads to this https://github.com/sukincornell/safekeylab (assuming that is the correct one)


Thanks for flagging. We're not open-source — the GitHub link shouldn't have been on the site. Removing it now. We offer a private SDK for customers. If you want to test it, you can go to the website and create your account or ping me at sukin@safekeylab.com
