
Why and How Gen AI Chose Visuals in the 0.1-Sec-Positioning-or-Scrolling Era
Advantages related to our cortex and to tech maturity attracted bets on the human sense of instantaneity, sight, which is able to hook audiences in a tenth of a second
The question in the title may seem too obvious, although apparently not obvious enough for many companies and professionals. I say “too obvious” referring to the inescapable central role of visuals in this era, which we can call AImmersive or, from a more operational perspective, the 0.1-Sec-Positioning-or-Scrolling era.
“Inescapable centrality”? Well, what else besides visuals can position a brand on the screen in the 0.1 seconds (actually, 1.6) that a Missouri Univ eye-tracking test identified as the window a brand has online to generate a first impression, position itself and hook the audience, or be scrolled past?
Fact: the two areas that open or proprietary Gen AI platforms chose first are visuals and text. Let’s take a look at the evidence for this inescapability of visuals, which ranges from our brain physiology to the level of technological and business development, and at what it demands of our brands, in a nerdbusiness dive. (This topic will still yield many articles here.)
Our species is visual
Starting with the basics: visuals are decisive for our species. (I first gathered this evidence kit in the blog’s two initial articles; in fact, about 80% of it comes from there, and this article adds three more items.)
> Our eyes concentrate about 70% of all sensors in the body;
> 40% of the cortex is involved in processing/understanding visual information;
> 80% of all information reaches us through the eyes;
> We react first to visuals — and retain them; 93% of communication is non-verbal;
> Images go straight to long-term memory; words go to short-term memory, which only retains about 7 bits;
> And now, three more pieces of data: our brain deciphers the elements of an image simultaneously; with text, you guessed it, it is word after word (and, I note, speed reading makes no difference: the sequentiality/linearity remains); the same goes for audio/sounds, also absorbed phoneme after phoneme, note after note;
> Cognitively, visuals accelerate and increase understanding, recall, retention and decoding of text;
> Emotionally, they stimulate other areas of the brain, which generates deeper and more accurate understanding — and decisions.
Tech maturity: computer vision
The second reason for Gen AI to choose visuals is that Computer Vision (CV) is, according to experts, one of the most mature areas of AI. Let’s understand its basics here: this helps in choosing the best platform for each case (as Coca-Cola, Vogue, National Geo and others do; a new article on this is coming), and it is interesting even for non-nerds.
CV is a subfield of AI: the ability of machines to “see” and understand the visual world in a human-like way, part of the same quest for natural interaction. It uses complex algorithms, Machine Learning and, ta-da!, Convolutional Neural Networks to identify objects, patterns and even emotions in facial expressions.
Convolutional Neural Networks (CNNs) replicate the structure of the animal visual cortex in layers of artificial neurons. These act together to extract features from images, such as edges, textures and shapes. They transform pixels into numbers, learn from training data and generate predictions. Thus:
1. Convolutional Layers: apply filters (each one detects a pattern, such as horizontal or vertical edges) to an input image to create feature maps;
2. Pooling Layers: reduce the dimensionality of these feature maps, discarding less relevant details. This reduces computational complexity and prevents overfitting;
3. Fully Connected Layers: connect all neurons from one layer to the next, allowing the network to combine the extracted features to make predictions or classifications (a minimal code sketch follows this list).
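To make these three layer types concrete, here is a minimal sketch in PyTorch. The article names no library, so this is an assumption on my part, and the layer sizes, input shape and class count are invented purely for illustration:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Toy CNN with the three layer types described above."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # 1. Convolutional layer: filters slide over the image and build feature maps
        self.conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        # 2. Pooling layer: halves each spatial dimension, discarding fine detail
        self.pool = nn.MaxPool2d(kernel_size=2)
        # 3. Fully connected layer: combines the extracted features into class scores
        self.fc = nn.Linear(16 * 16 * 16, num_classes)  # assumes a 32x32 RGB input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(x))  # pixels -> feature maps
        x = self.pool(x)              # 32x32 -> 16x16
        x = x.flatten(start_dim=1)    # feature maps -> one vector per image
        return self.fc(x)             # vector -> class predictions

# Usage: one random 32x32 RGB "image" in, ten class scores out
logits = TinyCNN()(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 10])
```

In a real system, many such convolution/pooling stages are stacked and the filters are learned from training data, exactly as described above.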
Business reasons: four top Gen AI platforms
The sum of such radical evidence, the kind that defines our species, and of such mature tools generates a very, very profitable perspective. Let’s conclude this article by summarizing how these ideas convolved (lol) and, after all, elected visuals. (This compilation is from Copilot; the “internal debates” of the platforms mentioned below are deductions it made; the prompt, follow-up questions and curation with fact-checking are mine.)
DALL-E
> Timeline: OpenAI started in 2015;
> Notable contributors: Ilya Sutskever, Greg Brockman and Sam Altman;
> Investment: OpenAI has secured US$ 675M in Series B funding at a $2.6B valuation. Investors include Microsoft, OpenAI Startup Fund, NVIDIA, Bezos Expeditions, Parkway Venture Capital, Intel Capital, Align Ventures, and ARK Invest;
> Strategy: Focused on DALL-E, a generative model that creates unique images from textual prompts (a minimal API sketch follows this block);
> Internal debates likely revolved around the balance between artistic freedom, interpretability, and ethical considerations.
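For readers who want to see what “creating images from textual prompts” looks like in practice, here is a minimal sketch using OpenAI’s Python SDK. Assumptions on my part: the `openai` package (v1+) is installed, an `OPENAI_API_KEY` is set, and the model name and prompt are illustrative only; check OpenAI’s current docs before relying on it.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask DALL-E to turn a textual brand brief into an image
response = client.images.generate(
    model="dall-e-3",
    prompt="A hero image for a sneaker brand: bold red sneaker, minimalist white set, soft daylight",
    size="1024x1024",
    n=1,
)

print(response.data[0].url)  # URL of the generated image
```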
Midjourney
> Timeline: Midjourney emerged around 2020, gaining prominence in the AI art community;
> Key stakeholders: Midjourney’s founders, including artists and developers;
> Investment: Operating as an open-source project, it has had modest investment, with an emphasis on collaborative development;
> Annual recurring revenue (ARR): Reportedly reached US$ 200M in its first year;
> Potential valuation: A 50x multiple on this ARR could result in a valuation of US$ 10B or more;
> Strategy: Democratize AI art by introducing features such as consistent styles (V6) to maintain character continuity;
> Internal debates likely centered on usability, personalization, and accessibility.
Stable Diffusion
> Timeline: Stability AI was founded in 2019; amid turmoil, its flagship Stable Diffusion gained prominence;
> Key players: Researchers, engineers, and artists;
> Investments: Raised US$ 101M in a round led by Coatue and Lightspeed, plus O’Shaughnessy and others, reaching a US$ 1B post-money valuation;
> Strategy: Iteratively refine images through diffusion via its complex algorithmic framework (see the sketch after this list);
> Internal debates likely revolved around balancing power, ease of use, and training models on diverse data.
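To illustrate, in spirit, what “iteratively refining images through diffusion” means, here is a deliberately toy sketch of a reverse-diffusion loop in Python. It is not Stable Diffusion’s actual pipeline (no text conditioning, no latent space, no trained network); the `toy_denoiser` is a placeholder I invented for illustration:

```python
import torch

def toy_denoiser(x: torch.Tensor, step: int) -> torch.Tensor:
    # Placeholder for a trained noise-prediction network (pure assumption).
    return 0.1 * x

def generate(shape=(1, 3, 64, 64), steps: int = 50) -> torch.Tensor:
    x = torch.randn(shape)            # start from pure Gaussian noise
    for t in reversed(range(steps)):
        predicted_noise = toy_denoiser(x, t)
        x = x - predicted_noise       # each step removes a little noise,
                                      # gradually "revealing" an image
    return x

image = generate()
print(image.shape)  # torch.Size([1, 3, 64, 64])
```

The real model learns from training data how to predict the noise at each step, which is what lets the loop converge to a coherent image instead of a blur.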
Bing/Copilot Designer (Image Creator)
> Timeline: Copilot Designer evolved from the original Copilot AI, launched in 2021;
> Key players: Microsoft AI research teams, including engineers and designers;
> Investments: Undisclosed amounts;
> Strategy: increase productivity and creativity across all domains, with an emphasis on visual content;
> Internal debates probably focused on seamless integration and maintaining clarity in the generated visuals.
OK? What do you think? Send us your comments!
At Immersera, based on my verbal-visual synergy (I was Comms Head for three Government Ministers and a Comms consultant for UN Women), on my certifications, including in AI (USP and Exame), and on our network’s expertise, we provide consultancy/mentoring on 0.1-Sec-Positioning, positioned content, SEO, branding, websites, cases, and crisis management.