I've been testing it for several weeks. It can produce results that are truly epic, but it's still a case of rerolling the prompt a dozen times to get an image you can use. It's not God. It's definitely an enormous step though, and totally SOTA.
I work in Photoshop all day, and I 100% agree. Also, I just retried a task that wouldn't work last night on nano-banana and it worked first time on the released model, so I'm wondering if there were some changes to the released version?
We had an exhibition some time back where I used AI to generate the posters for our product. This is a side project and not something we do seriously, but the results were outstanding - better than what the majority of much bigger exhibitors had.
It took me a LOT of time to get things right, but if I had hired an actual studio to make those images, it would have cost me thousands of dollars.
Yeah, played around with it. It created an amazing poster for the Starfinder TTRPG (something like D&D) with species that looked really good. Usually stuff like this fails hard, since there isn't much training data of unique fantasy creatures.
The app looks interesting, but I think it needs some documentation. I think I generated something? Maybe? I saw a spinny thing for a while, but then nothing.
I couldn't get the 3D thing to do much. I had assets in the scene, but I couldn't for the life of me figure out how to use the move, rotate, or scale tools. And the people just had their arms pointing outward. Are you supposed to pose them somehow? Maybe I'm supposed to ask the AI to pose them?
Inpainting I couldn't figure out either... It's for drawing things into an existing image (I think?), but it doesn't seem to do anything other than show a spinny thing for a while...
I didn't test the video tool because I don't have a midjourney account.
I think much like coding, the top of the game is all the old stuff and a bunch of new stuff that is impossible to master without some real math or at least outlier mathematical intuition.
The old top of the game is available to more people (though mid-level people trying to level up now face a headwind: easily read signals and true taste have decoupled further, which makes the old way of developing good taste harder).
This stuff takes people who were already "master rate" (and who are, at minimum, nontrivially sophisticated machine learning hobbyists) and pushes their peak and frontier out, while driving break-even collaboration overhead down.
It's always been possible to DIY code or graphic design, it's always been possible to tell the efforts of dabblers and pros apart, and unlike many commodities, there is rarely a "good enough". In software that's because compute is finite and getting more out of it pays huge, uneven returns; in graphic design it's because extreme-quality work is both aesthetically pleasing and a mark of quality (imperfect, but a statement that someone will commit resources).
And it's just hard to see it being different in any field. Lawyers? Opposing counsel has the best AI, your lawyer better have it too. Doctors? No amount of health is "enough" (in general).
I really think HN in particular but to some extent all CNBC-adjacent news (CEO OnlyFans stuff of all categories) completely misses the forest (the gap between intermediate and advanced just skyrocketed) for the trees (space-filling commodity knowledge work just plummeted in price).
But "commodity knowledge work" was always kind of an oxymoron; David Graeber called such work "bullshit jobs". You kinda need it to run a massive deficit in an over-the-hill neoliberal society, it's part of the "shift from production to consumption" shell game. But it's a very recent, very brief thing that's already looking more than wobbly. Outside of that? Apprentices, journeymen, masters is the model that built the world.
AI enables a new, even more extreme form of mastery, blurs the line between journeyman and dabbler, and makes taking on apprentices a much longer-term investment (one of many reasons the PRC seems poised to enjoy a brief hegemony before demographics do in the Middle Kingdom for good: in China, all the GPUs run Opus, none run GPT-5 or LLaMA Behemoth).
The thing I really don't get is why CEOs are so excited about this and I really begin to suspect they haven't as a group thought it through (Zuckerberg maybe has, he's offering Tulloch a billion): the kind of CEO that manages a big pile of "bullshit jobs"?
AI can do most of their job today. Claude Opus 4.1? It sounds like a mid-range CEO who has been exhaustively researched and made gaffe-immune. Ditto career machine politicians. AI non-practitioner prognosticators. That crowd.
But the top graphic communications people and CUDA kernel authors? Now they have to master ComfyUI or whatever and the color theory to get anything from it that stands out.
This is not a democratizing thing. And I cannot see it accruing to the Zuckerberg side of the labor/capital divvy-up without a truly durable police state. Zuck offering my old chums nation-state salaries is an extreme and likely transitory thing, but we know exactly how software professional economics work when it buckets as "sorcery" and "don't bother": that's 1950 to whenever we mark the start of the nepo-hacker Altman Era, call it 2015. In that world good hackers can do whatever they want, whenever they want, and the money guys grit their teeth. The non-sorcery bucket has papier-mâché hack-magnet hackathon projects in it at a fraction of the old price. So disruption, wow.
Whether that's good or bad is a value judgement I'll save for another blog post (thank you for attending my TED Talk).
Sure, now the client wants 130 edits without losing coherency with the original. What does a vibe designer do? Just keep re-prompting and re-generating until it works? Sounds hard to me.
Why would you compare it to Photoshop? If you compare it to other tools in the same category of image generation, you will find models like Flux and Qwen do much better.
Is it because the model is not good enough at following the prompt, or because the prompt is unclear?
Something similar has been the case with text models. People write vague instructions and are dissatisfied when the model does not correctly guess their intentions. With image models it's even harder for the model to guess right without enough detail.
Remember in image editing, the source image itself is a huge part of the prompt, and that's often the source of the ambiguity. The model may clearly understand your prompt to change the color of a shirt, but struggle to understand the boundaries of the shirt. I was just struggling to use AI to edit an image where the model really wanted the hat in the image to be the hair of the person wearing it. My guess for that bias is that it had just been trained on more faces without hats than with them on.
Before AI, people complained that Google was taking world class engineering talent and using it for little more than selling people ads.
But look at that example. With this new frontier of AI, that world class engineering talent can finally be put to use…for product placement. We’ve come so far.
Did you think that Google would just casually allow their business to be disrupted without using the technology to improve the business and also protecting their revenue?
Both Meta and Google have indicated that they see generative AI as a way to vertically integrate within the ad space, disrupting marketing teams, copywriters, and other roles that monitor or improve ad performance.
Also FWIW, I would suspect that the majority of Google engineers don't work on an ad system, and probably don't even work on a profitable product line.
Another nitpick - the pink puffer jacket that got edited into the picture is not the same as the one in the reference image. It's very similar, but if I were to use this model for product placement, or cared about these sorts of details, I'd definitely have issues with this.
Even in the just-photoshop-not-ai days product photos had become pretty unreliable as a means of understanding what you're buying. Of course it's much worse now.
Note: Please understand that monitor may color different. If image does not match product received then kindly your monitor calibration. Seller not responsible. /ebay&amazon
look at the bottom of the sleeves, they don't match.
the bottom of the jacket doesn't match either.
I didn't see it at first sight but it certainly is not the same jacket. If you use that as an advertisement, people can sue you for lying about the product.
I noticed the AI pattern on the sunglasses first. I guess all of the source images are AI-generated? In a sense, that makes the result slightly less impressive -- is it going to be as faithful to the original image when the input isn't already a highly likely output for an AI model? Were the input images generated with the same model that's being used to manipulate them?
It doesn't seem to matter: people have posted tons of examples on social media of non-AI base images that it was equally able to hold steady while making edits.
It seems like every combination of "nano banana" is registered as a domain with its own unique UI for image generation... are these all middlemen playing credit arbitrage on a popular model name?
I'd assume they are just fake, take your money and use a different model under the hood. Because they already existed before the public release. I doubt that their backend rolled the dice on LMArena until nano-banana popped up. And that was the only way to use it until today.
Agreed, I didn't mean to imply that they were even attempting to run the actual nano banana, even through LMArena.
There is a whole spectrum of potential sketchiness to explore with these, since I see a few "sign in with Google" buttons that remind me of phishing landing pages.
Completely agree - I make logos for my GitHub projects for fun, and the last time I tried SOTA image generation for logos, it was consistently ignoring instructions and not doing anything close to what I was asking for. Google's new release today did it near flawlessly, exactly how I wanted it, in a single prompt. A couple more prompts for tweaking (centering it, rotating it slightly) got it perfect. This is awesome.
Regardless, it seems Google is on the frontier of every type of model, plus robotics (cars). It's nutty how we forget what an intellectual juggernaut they are.
I wonder what the creative workflow looks like when these kinds of models are natively integrated into digital image tools. Imagine fine-grained controls on each layer and its composition, with semantic understanding of the full picture.
Before a model is announced, they use codenames on the arenas. If you look online, you can see people posting about new secret models and people trying to guess whose model it is.
“Nano banana” is probably good, given its score on the leaderboard, but the examples you show don't seem particularly impressive, it looks like what Flux Kontext or Qwen Image do well already.
I'd say it's more like comparing Sonnet 3.5 to Sonnet 4. GPT-4 was a rather fundamental improvement: it jumped to professional applications, compared to the only casual use you could put ChatGPT 3.5 to.
I've tested it on Google AI Studio since it's available to me (which is just a few hours so take it with a grain of salt). The prompt comprehension is uncannily good.
My test is to go to https://unsplash.com/s/photos/random, pick two random images, and send them both with "integrate the subject from the second image into the first image" as the prompt. I think Gemini 2.5 is doing far better than ChatGPT (admittedly ChatGPT was the trailblazer on this path). Flux Kontext seems unable to do that at all. Not sure if I was using it wrong, but it always only considers one image at a time for me.
Edit: Honestly it might not be the "GPT-4 moment." It's better at combining multiple images, but I don't think it's better at understanding elaborate text prompts than ChatGPT.
Flux Kontext is an editing model, but the set of things it can do is incredibly limited. The style of prompting is very bare bones. Qwen (Alibaba) and SeedEdit (ByteDance) are a little better, but they themselves are nowhere near as smart as Gemini 2.5 Flash or gpt-image-1.
Gemini 2.5 Flash and gpt-image-1 are in a class of their own. Very powerful instructive image editing with the ability to understand multiple reference images.
> Edit: Honestly it might not be the "GPT-4 moment." It's better at combining multiple images, but I don't think it's better at understanding elaborate text prompts than ChatGPT.
Both gpt-image-1 and Gemini 2.5 Flash feel like "Comfy UI in a prompt", but they're still nascent capabilities that get a lot wrong.
When we get a gpt-image-1 with Midjourney aesthetics, better adherence and latency, then we'll have our "GPT 4" moment. It's coming, but we're not there yet.
I'm confused as well. I thought gpt-image could already do most of these things, but I guess the key difference is that gpt-image is not good for single-point edits. In terms of "wow" factor it doesn't feel as big as GPT 3->4 though, since it sure _felt_ like models could already do this.
Just search nano banana on Twitter to see the crazy results. An example. https://x.com/D_studioproject/status/1958019251178267111