What surprised me in your story is that you have actually written those half-finished books and songs. You've spent your time, you've tried it out, and you have an actual understanding of how this process works.
Because I've heard dozens of stories like yours, of people regretting not following their artistic pursuits. Practically all of those people didn't write anything at all and never tried to put pen to paper.
For those people the path to artistry is an eternity. But you are already there. You've taken the hardest steps, and now you can only get better.
After confirming a Ukrainian phone number, every page gives the same error - "OpenAI's API is not available in your country." The same goes for GPT-3. They just banned our country as a whole.
As a Ukrainian, this is admittedly not the biggest problem in my life right now. Still, I don't see a reason for OpenAI to restrict our access.
I love that corpspeak exists. It makes it possible for me to make a living.
I am not a native English speaker. I am doing OK, but often I have to google pre-made sentences when I feel that I sound off but can't figure out the problem by myself.
I also lack the natural talent for talking with people online. I am a graphic designer, so my primary skills are in visual art, and speaking and project management are secondary skills at best. But I couldn't work as a freelancer without them.
Without corpspeak culture I would constantly sound like either a robot or a dummy, and it would hurt my career a lot. Thank god everyone in the corporate world already sounds like that. I can just follow a simple set of tropes and pre-made sentences, and it all works out well.
I'm not sure about OpenAI's internal policies, but I've worked for several SaaS companies that complied with Export Administration Regulations (EAR) to limit access to "Embargoed and Sanctioned Countries", which has meant CRIMEA - REGION OF UKRAINE, CUBA, IRAN, NORTH KOREA, and SYRIA. ITAR (defense related) and OFAC (finance related) both have further restrictions. For us, it was easier to block all of Ukraine rather than limit access to Crimea in particular.
Trade compliance requirements may require embargoing occupied Ukrainian territories; however, it isn't necessarily possible to distinguish between occupied and unoccupied regions. Export controls might be applied based on geoIP, with granularity at the country level. Unfortunately, the safest position from a compliance perspective is to block the entire country rather than risk getting it wrong.
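For the curious, the country-level geoIP gate being described is only a few lines in practice. A minimal sketch, assuming MaxMind's geoip2 library and a GeoLite2 country database; the blocklist below is illustrative only, not any company's actual policy:

    # Country-level geoIP gate (sketch). Blocklist contents are
    # illustrative; real compliance lists come from legal counsel.
    import geoip2.database

    BLOCKED = {"CU", "IR", "KP", "SY", "UA"}  # ISO 3166-1 alpha-2 codes

    reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

    def is_blocked(ip: str) -> bool:
        """True if the request's source country is on the embargo list."""
        country = reader.country(ip).country.iso_code
        return country in BLOCKED

Note that the occupied/unoccupied distinction discussed above isn't something an IP-to-country database can reliably give you, which is part of why whole-country blocks end up being the "safe" choice.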
I am already doing exactly that, and am getting paid for it.
I am a logo artist and I sell pre-made logo designs. Before the current AI services I had to come up with visual ideas by myself, like a caveman. Now I use the AI to generate a bunch of sketches and blurry ideas, and then use my graphic design experience to polish them up to a usable level. Here's how it looks.
https://imgur.com/a/DKTsKdC
I am absolutely sure that a lot of people are doing the same right now, just keeping quiet about it.
Thank you for that perspective. The linked work is clearly the work of a skilled professional.
I am intrigued by the use of AI as a form of creativity assist. As someone without any talent for this, I find the left pictures useless, as I don't know how to turn them into something like the pictures on the right. The point of a sketch is to show it to a customer, but if you showed these sketches to me, I wouldn't know which one would turn out great and which one wouldn't.
Given that, do you feel that the generated sketches are useful as a base sketch? I mean, you could probably have used any of the existing NFL team logos as inspiration, instead of letting the software remix them for you.
Using existing logos for inspiration (like the NFL ones) is a tightrope, because when you get inspired too much you get into legal problems. No one wants a plagiarised or copied logo. Every small borrowed idea, color, composition or detail is a potential problem. And if you borrow too many of them, you have an unusable logo.
No such problem with AI logos. You can be as inspired by them as you want, up to a straight copy if you find one good enough.
Yes, Dall-e 2 is definitely much better at it than other current AI services. But most people still don't have access to it. Like me, for example. I've been on the waitlist from the start, got a cool portfolio - still didn't get in.
Maybe I'm just unlucky, or maybe the problem is that I'm in Ukraine. For some reason the OpenAI GPT-3 Playground is restricted in our country, so I expect that other OpenAI products might be closed to me for the same reason.
I've seen many examples of Dalle-2 logos similar to yours. It seems it has at least reached the level of "good enough to be usable, but with quality on the cheaper side". Which is super impressive and already puts a lot of designers out of work. I sell cheap stock logos on the side, so I would feel that loss of income.
But right at this moment there's definitely still a quality ceiling for it, and the AI hasn't put me out of a job completely. If you want a good and expensive-looking logo, you still need a professional human.
I won't be surprised when that changes too and the AIs get to my level. They are already so much closer than I could ever have imagined. But at this moment they aren't there.
Outside of my regular job, I am an indie folk music artist, trying to rise in the local music scene here in Dnipro, Ukraine. Even though learning to be a musician from scratch in my 20s was a hard process that took years and years, by far the hardest part of it was trying to establish a passive income, so I could have enough free time for practicing, performing and writing music.
A lot of my talented peers are so much better than me at all of the music and performing stuff, but can't find enough time for it around their regular boring work. Woody Allen said that 80% of success is just showing up, and it seems true. But now I see how a lot of talented people simply can't afford to show up. They are missing open mics and performing opportunities because they can't skip another shift as a barista; they can't find time for rehearsal because of a soul-killing, low-paying bank job. I keep thinking about all the beautiful songs that will be left unwritten.
I guess the life of artists was always like that - either you are struggling, or you have a source of passive income that carries you through the development years. And I do think that this moment in history is as full of opportunity as it ever was. Still, it was a surprising discovery for me. I really thought that at least at the starting level it would be mostly about who plays their chords better, and it surprisingly isn't.
So it's good for some fields other than performers?
I've met plenty of stock brokers and engineers that skate by on credentials - I try to avoid them professionally because they tend to produce workplaces with exceedingly high demands and low compensation due to the drain they introduce on the system... but they continue to exist.
Heck, HN has many times had discussions on C-level folks who basically revolving-door their way from failure to failure and still get huge golden parachutes when they sign on with a new company, even though their performance history is trash.
Even fields that require performance can have strong networking requirements. Academia, software, traditional engineering, etc. One's reputation is rarely purely or even strongly performance-based.
Yeah, it's weird. The music scene is as alive as it ever was here, along with all the other parts of normal city life.
It's not really a new situation for us. This war has been going on for 8 years, and all this time the front was about 200 miles from us. During this new phase of the invasion the frontline got a little closer to Dnipro, but not that much.
Regular life here stopped in winter-spring, when we didn't know which cities would withstand this phase of the invasion. Tragically, Kherson, Mariupol and many others are lost as of now. But we in Dnipro were lucky enough, and life kind of continues here.
There are changes, of course. Practically no artist in Ukraine gets paid now at any level. Every single concert is for charity, gathering funds for arms or refugees. And, as with all other parts of life, there are constant interruptions from air raid sirens.
Other than that, the music scene lives on as usual. People still go to concerts and artists still perform. Predictably, a lot of sad, sad songs get written now, but honestly no one really wants to hear them - everyone here gets enough negativity from everywhere else. The best bet is to stick to the happy and hopeful stuff.
I really do wish we had "basic income" in the US. Besides helping out the lowest socioeconomic class, it really seems like this would benefit artists, too.
I'm quite shocked that someone on HN would overlook the contributions of the Nordic and Scandinavian countries to the open source ecosystem, which are absurdly disproportionate to their populations.
And let's take a closer look at the UK, for example. Can you name a few people who were on the "dole"? I can start with J. K. Rowling and Noel Gallagher.
I have heard about only a handful of Finnish artists and entrepreneurs. Nothing remarkable stands out there compared to other countries without UBI or a generous safety net, as is the context of this whole thread ("UBI would benefit artists").
Have you got any data on how Finland stands out in artistic output vs other countries? Or just random anecdotes, which can be found even in countries without any welfare?
Inflation also hit the service sector (e.g. restaurants jacking up prices due to the cost of labor). Before implementing UBI, let's have an alternative system for jobs like janitors, restaurant workers, mall/retail workers, etc.
> I guess the life of artists was always like that - either you are struggling, or you have a source of passive income that carries you through the development years
Wealthy parents, wealthy partner.
I wouldn't really mind if I married a girl and she decided, after we had kids, that she just wanted to be a stay-at-home mom.
And she like paints or something. But that notion also devalues domestic housework, which is just as valuable as typing all day to make ad revenue go up.
But I would still like to think a stay-at-home parent could carve out more time for their hobbies. Particularly once the kids are in school.
Being a music person myself, I would totally love to have a classical pianist as a wife and just come home every day to the sound of classical piano.
Stable Diffusion is mind-blowingly good at some things. If you are looking for modern artistic illustrations (like the stuff that you would find on the front page of ArtStation) - it's state of the art, better in my opinion than Dalle-2 and Midjourney.
But the interesting thing is that while it is so good at producing detailed artworks and matching the styles of popular artists, it's surprisingly weak at other things, like interpreting complex original prompts. We've all seen the meme pictures made in Craiyon (previously Dalle-mini) of photoshop-collage-like visual jokes. Stable Diffusion, with all its sophistication, is much worse at those and struggles to interpret a lot of prompts that the free and public Craiyon is great with. The compositions are worse, it misses a lot of requested objects, or it even misses the idea entirely.
Also, as good as it is at complex artistic illustrations, it is just as bad at minimalistic and simple ones, like logos and icons. I am a logo designer and I am already using AI a lot to produce sketches and ideas for commercial logos, and right now the free and publicly available Craiyon is head and shoulders better at that than Stable Diffusion.
Maybe in the future we will have a universal winner AI that is the best at any style of picture you can imagine. But right now we have an interesting competition in which different AIs have surprising strengths and weaknesses, and there's a lot of reason to try them all.
For those unaware, this is a catchphrase of Dr. Károly Zsolnai-Fehér from the absolutely wonderful YouTube channel "Two Minute Papers", which focuses on advances in computer graphics and AI.
Random rant: it feels like over time Two Minute Papers has started to lean more and more into its catchphrases and gimmicks, while the density of interesting content keeps decreasing.
The whole "we're all fellow scholars here" bit feels like I'm watching a kid's show about science vulgarization, patting me on the head for being here.
"Look how smart you are, we're doing science!"
I dunno. I like the channel for what it is (a popularization newsletter for cool ML developments) but sometimes the author feels really patronizing / full of himself.
I agree; I like it for what it is - something more along the lines of Popular Science or Wired than Scientific American, if you want to compare it to magazines. However, the content, while surface level, is always accurate - something that can't be said for other content creators in the field.
I agree that it can be a lot at times, especially if you watch several in a row, but I dunno, I kind of love that he's keeping that enthusiasm (real or not). I think the world is a brighter place because of it. Just a tiny bit, but still.
I think the biggest benefit is the curation aspect. After all, how much can you actually learn in two minutes? Once I see something interesting, I go and read through the actual paper. Having said that, you're lucky if you can find a paper with enough details to actually reproduce the work.
It's surprisingly weak at interpreting complex original prompts because the model is really small; the text encoder is just 183M parameters. Craiyon's is much larger.
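If you want to sanity-check the size yourself, Stable Diffusion v1's text encoder is the frozen CLIP ViT-L/14 text model, which you can load and count with Hugging Face transformers (a quick sketch, assuming transformers and torch are installed):

    # Count the parameters of Stable Diffusion's CLIP text encoder.
    # This loads just the text half of the standard CLIP ViT-L/14
    # checkpoint; the unused vision weights only produce a warning.
    from transformers import CLIPTextModel

    encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
    n_params = sum(p.numel() for p in encoder.parameters())
    print(f"text encoder: {n_params / 1e6:.0f}M parameters")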
I have a penchant for wanting to make technically "bad" or heavily stylized photos - and Stable Diffusion is pretty poor at those. There's very little good bokeh or tilt shift stuff and CCTV/Trailcam doesn't come out too well.
In fact Dall-E isn't as impressive for some styles as "older" models (Jax/Latent Diffusion etc)
> 515k steps at resolution 512x512 on "laion-improved-aesthetics" (a subset of laion2B-en, filtered to images with an original size >= 512x512, estimated aesthetics score > 5.0)
That aesthetic predictor was apparently trained on only 4000 images. If my thinking is correct, imagine the impact those 4000 ratings have had on all of the output of this model.
You can see samples (some NSFW) of different images from the original training set in different rating buckets here, to get an idea of what was included or not in those training steps. http://3080.rom1504.fr/aesthetic/aesthetic_viz.html
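For intuition about how little machinery sits behind that filter: an aesthetic predictor of this kind is typically just a tiny regression head on top of frozen CLIP image embeddings, fit to the few thousand human ratings and then used to threshold billions of candidate images. A minimal sketch; the single linear head with random weights is illustrative, not LAION's actual predictor:

    # Sketch of a CLIP-embedding aesthetic filter (illustrative weights).
    import torch
    import torch.nn as nn

    CLIP_DIM = 768  # CLIP ViT-L/14 image embedding size

    # The real predictor is a small trained MLP; one linear layer here.
    aesthetic_head = nn.Linear(CLIP_DIM, 1)

    def keep_image(clip_embedding: torch.Tensor, threshold: float = 5.0) -> bool:
        """True if the predicted aesthetic score clears the dataset cutoff."""
        score = aesthetic_head(clip_embedding).item()
        return score > threshold

    # Usage: embed an image with CLIP, then filter.
    example = torch.randn(CLIP_DIM)  # stand-in for a real CLIP embedding
    print(keep_image(example))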
You can run craiyon / dalle-mini on a card with 8GB of VRAM if you decrease batch size to 1 and skip the CLIP step. Takes about 7 sec to generate an image on a 3070.
Craiyon is free, whereas Midjourney is not. If you want MJ level quality, check out Disco Diffusion or go straight to Visions of Chaos, which runs just about every AI diffusion script in existence. The dev is very active and adds new features every couple days, such as recently the ability to train your own diffusion models, which I've been doing the last 3 days nonstop on my little 3060 Ti (8GB VRAM, which is barely sufficient to run at mostly default settings).
MidJourney does give you 25 minutes of free compute time though, which is enough to try it at least ~40 times.
I've checked out Disco Diffusion but hadn't heard of Visions of Chaos, thanks. The biggest shortcoming of DD is that there's simply not yet a sufficiently trained model to produce stuff at the level of MidJourney or Craiyon.
Are they giving out trials without an invite now? I was invited by someone who was paying, became addicted by the end of my trial, subscribed for a month, then gave out invites to friends, some of whom ended up also paying - not a bad business model!

It was bad timing though: the day after I joined, I discovered Disco Diffusion and haven't stopped rendering since (roughly 10k images rendered, mostly for animations). It takes longer, and the results are often less realistic compared to Midjourney, Dall-E (1/2) or Stable Diffusion (which I've been toying with for a few weeks), but it's somehow much more satisfying having to wait xx minutes for a render to complete, running on your own local PC, not having to use bots in a channel with 1000 other people spamming their prompts, and having TONS of parameters to play around with. I have a Google Drive full of docs from my own studies, comparing parameter values, models, prompts, etc.

I'm really looking forward to Stable Diffusion releasing their models; I know VoC will add them as soon as they are available. On top of that, VoC has been adding support for custom diffusion models (of which I'm training my own), and there are new ones added constantly as more and more people build models for e.g. pixel art, medieval style, monochromatic, etc.
Also the results vary drastically if you have enough VRAM to load more models e.g. a 3090 (24GB) or an A6000 (48GB). I've been saving money and waiting impatiently for 4090s to drop. Check out the Disco Diffusion or VoC Discord - people post their works in there and often you will see results that make you wonder if they're cheating ;)
I'd start with ViTL14 and add one or more RN types. I personally like to use multiple ViT and RN models just to fill up VRAM. What exactly will fit depends on your output resolution and requires a lot of trial and error; I always have Task Manager open to monitor VRAM usage. In general, ViT = more realistic, RN = more artistic. It can take a lot of experimentation to find what exactly tickles your fancy, and I constantly change which models I'm using depending on what I'm going for. This redditor did a nice comparison[0], and there are many more "studies" on which models to use - you can google around, there are new articles/studies being posted daily.
You can also try disabling use_checkpoints if you have extra VRAM, since it will render a bit faster. (Checkpointing saves VRAM by not keeping intermediate activations around, so turning it off trades memory for speed.)
When/if you get bored, try disabling use_secondary_models, which will use a lot more VRAM but can deliver results on a completely different level. You will likely struggle for a few days figuring out which parameters to tweak to get good results (e.g. tv_scale, sat_scale, etc., which are otherwise AFAIK ignored).
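To make those toggles concrete, here's roughly what that part of a Disco Diffusion-style settings cell looks like; the exact variable names vary between notebook versions, so treat this as a sketch:

    # CLIP guidance model selection (Disco Diffusion style). Rule of
    # thumb from above: ViT = more realistic, RN = more artistic;
    # enable as many as your VRAM allows.
    ViTB32 = True
    ViTB16 = True
    ViTL14 = True    # the heavy hitter; a big chunk of VRAM on its own
    RN50 = True
    RN101 = False
    RN50x4 = False   # larger RN variants eat VRAM quickly

    # Speed/memory trade-offs discussed above:
    use_checkpoint = True       # gradient checkpointing: slower, saves VRAM
    use_secondary_model = True  # disabling uses far more VRAM, different look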
In any case, I recommend reading A Traveler's Guide to the Latent Space, which I call "The Bible" since it covers so many topics, links to various studies, and will keep you busy reading for months ;)
Also check out the Discord for Disco Diffusion and Visions of Chaos, as you can read endless tips and tricks to getting amazing results.
I am a working digital artist, and my personal feed is filled with takes like this. I understand the crisis that artists are going through right now, and I understand their emotions. I've personally made peace with it already, but it will take time for everyone, and there will be a lot of concern, rage and bargaining in the process.
All of us artists woke up in a world where the thing we'd been learning to produce all our lives became 100x cheaper. I'd been kind of waiting for this to happen to truck drivers or translators, and was honestly surprised that it got to artists first. But there's nothing to be done now.
There will be a lot of takes trying different ways to deny this reality - as if, once we agree that this thing is immoral or anti-cultural, it will go away and our potential customers will forget about it. I think that's a waste of time.
The industry is not completely gone, but it will go through a massive transformation. The basic hierarchy of artists will probably stay the same - people who were good in the pre-AI era will still be good working with AI. If you've learned your basic skills right, they will still work.
I've personally already changed my workflow to use AI art for sketches, ideas and references, and then polish them to commercial level using my old experience and skills. It's funny how this thing is so new, yet I've gotten so used to it that it's already hard for me to go back to my old set of tools. And I do recommend that all other artists do the same and try to find their place in the new reality - it really seems like there's no other way.
I believe that it already did, honestly.
The output and range of Dalle-2 and Stable Diffusion are unmatched by any human artist. I am not only talking about quantity and range - the best illustrations generated by Stable Diffusion are honestly unbelievable and could totally make the front page of ArtStation on their own. The digital paintings, the 3D renders, landscapes, portraits, concept art, character designs - I've seen too many world-class examples of those generated by AI. The best human artists can usually be world-class at a single style or genre, while Stable Diffusion is world-class at hundreds of them.
There's of course no way to judge art objectively. And there are at the moment a lot of gaps where the AI is not at its best. Still, if there were a single human artist who could do the things that Stable Diffusion can do, they would be the best artist who ever lived, no doubt.
I feel that the illustrations are still at least somewhat in the uncanny valley. I don't think the quality is there yet. What you see is only a subset; many creations are really bad, creepy and unbalanced.
I think this will change, at least to the point that the majority of people won't care.
Wow, interesting. I'm an ML researcher creating models for music generation, and I can say with certainty that we are pretty far from generating human-level quality music.
Depends on what your benchmark is. As the AI stuff gets better, things like MIDI and Autotune lower the bar to where it's understandable that some might see the convergence coming sooner rather than later.
The bar is the same as with human music - if people pick it on Spotify without realizing it’s AI generated with zero human involvement then it’s good enough.
I think the bigger point is that while it might not be better, it could be "good enough" and 1000x faster/cheaper at the same time.
Instead of hiring 100 artists, you could have this one model generating images, and then 10 artists touching up the AI's results to take them from 90% to 99% quality.
I personally couldn’t care less about 99% quality. I want to see genius level work - of the same impact as Monet, Matisse, Picasso, Dali - something unlike anything else in the dataset.
I've tried a couple of prompts from the post in Stable Diffusion, and as expected the results were much weaker. It drew some alpacas and basketballs with little relation between the objects.
I've been playing with Stable Diffusion a lot, and in my experience its results are much weaker than what's shown in this post. The artistic pictures it generates are beautiful, often more beautiful than the Dalle-2 ones. But it has a real problem understanding the basic concepts of anything beyond the simplest task, like "draw a character in this or that style". And explaining the situation in detail doesn't help - the AI just stumbles over basic requests.
Seems like Stable Diffusion has a much shallower understanding of what it draws and can only produce good results for things very similar to the images it learned from.
For example, it could generate really good Dutch still-life paintings for me - with fruits, bottles and all the regular objects expected in this genre of painting. But when I asked it to add some unusual objects to the painting (like a Nintendo Switch, or a laptop), it couldn't grasp the concept and just added more garbled fruit. Even though the system definitely knows what a Switch looks like.
The results in the post are much more impressive. I doubt that Dalle-2 saw a lot of similar images in training, but in all of the styles and examples it definitely understood how a llama would interact with a basketball, what their relative sizes are, and stuff like that. On the surface, results from different engines might look similar, but to me this is an enormous difference in quality and sophistication.
Stable Diffusion has a smaller text encoder than Dalle 2 and other models (Imagen, Parti, Craiyon) so that it can fit into consumer GPUs. I believe StabilityAI will train models based on a larger text encoder; since the text encoder is frozen and does not require training, scaling it up is quite cheap.
For now this is the biggest bottleneck with Stable Diffusion; the generator is really good, and the image quality alone is incredible (managing to outperform Dalle 2 most of the time).
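To see where that frozen encoder sits, here's a minimal sketch using Hugging Face's diffusers library (the model id is the CompVis release; you may need to accept the model license first):

    # The text encoder in Stable Diffusion is a frozen CLIP model;
    # only the U-Net is trained to condition on its output.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

    encoder = pipe.text_encoder  # a transformers CLIPTextModel
    n_params = sum(p.numel() for p in encoder.parameters())
    print(f"{type(encoder).__name__}: {n_params / 1e6:.0f}M parameters")

Since the encoder stays frozen, swapping in a larger pretrained one mainly means retraining the U-Net to condition on the new embeddings, not training a text model from scratch.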