Curiosity Phase of Generative AI, Incumbent vs New Pro Users & more

also Microsoft has entered the chat, make AI-generated music & unlimited product photos, a prompt diagramming tool, tech stack for ML founders ...

Oct 25, 2022

Welcome! CRW RIDE (🔊curve-ride) is your navigator to the world of Generative AI. There’s so much happening out there; I make sense of the chaos 🧙‍♂️.

Whether you’re an Artist, Engineer, Founder, Investor, Content Creator or an ML Enthusiast, there’s something for everyone! The latest Research Papers, Tutorials, Open-source projects, Prompt tips, Industry Stats, Insights, Art Inspiration, Memes. The focus is on insights over news.

Alright, let’s begin this week’s ride!

Btw, the 1st issue went out to ~10 subs. Now we’re at ~80 subs! Thanks everyone!

💡PRO TIP: Click on the title above and read this in your browser as these long emails get cut off in mail apps. Also, there’s a Substack mobile app.

🍵 Appetizer

🛠 Your ML Tech Stack: Building in the Generative AI space? Bookmark this :)

Erik Dunteman @erikdoingthings

If I were an ML founder starting today, here's my stack: MVP: - Backend: ML APIs @OpenAI @CohereAI - Frontend: Next.js on @vercel @Zeet_Co @Railway Beyond: - Finding Code: @huggingface - Training: @GoogleColab @roboflow - GPU Dev Envs: @brevdev - Backend: @BananaDev_

💰 Stability.ai raised $101M. Their model training bill? $50M!

One crucial detail about them: it might seem like Stability.ai is the inventor of Stable Diffusion, but it’s not. Researchers at RunwayML and LMU Munich released the prior work. Stability.ai provided compute to the researchers for retraining, or something like that. As usually happens, most folks won’t ever know these details. You do 😎. (my source)

Stanislas Polu @spolu

We are living the age of the « Big Bootstrap ». Like you bootstrap a programming language, generative models are being bootstrapped from accumulated human creativity and moving forward it’ll be forever unclear who created what.

Basically, it’s a collaborative effort. There’s no single inventor. This is what a true open-source, hacker culture looks like! For example, I remember using some generative models (VQGAN+CLIP etc.) in August of last year; the makers of those models now work at Midjourney, StabilityAI etc. There are many other models too.

shashank @shacrw_

StabilityAI raised $101M. Its cloud bill exceeds $50M. This reminds me of the pre-2020 stat: 40-50% of all VC money going to FB + Google Ads. For some Generative AI startups, that % goes to GPU providers, LLM APIs etc ?

Matthew Lynley @mattlynley

Stability AI has confirmed its funding round at a $1B valuation. Some notes from our sources: its cloud bill for Stable Diffusion training etc exceeds $50M and it is looking at a "holding company"-like model. https://t.co/vlwr9EjxXR

btw, I didn’t publish anything last week. More on that later.

🍱 Main Course

(TIL: there’s a bento box emoji)

Incumbent vs New Pro Users

Before talking about Pro users, let’s understand the Curiosity Phase of a new technology.

TLDR: new tech becomes widely available → wide base of users can try it out now → but value prop + core product primitives not obvious → tons of opportunities, room for experiments → lots of hype around use cases & value creation.

i.e. a dangerous time for those building long-lasting products. 👇

shashank @shacrw_

SOURCE OF FALSE SIGNALS (in this phase): 🔹 Technical Inflection Point invites new market participants → "wow, so much TAM! → increased competition 🔹 low value, long-tail of users have more free time → willing to play with your toy. PRO users experiment, switch less often

shashank @shacrw_

OPTIMIZING GTM & CHOOSING YOUR CUSTOMER ◾ important to understand your customer profile ◾ everytime you breathe, a new AI model get released. so, it might feel like you're always trying to catch-up so don't cave in to endless feature requests; focus on long-term value

If you’re on twitter these days you have likely seen a wave of videos that utilize things like Stable Diffusion for fun and novel product concepts/demos …This reminds me a lot of prior concept-heavy phases of AR and VR … fun the first time you watched them and lost luster over the subsequent viewings as we all litigated how much we really would use a given use-case…

- Michael Dempsey

Thus, it’s important to understand your customer profile. For AI-first companies, targeting a specific type (Incumbent or New) of “professional user” is one way to think about this.

The Incumbent Professional users will see the existing “professional” workflows upended by AI due to:

An efficiency increase
A quality of work increase
A collapse of features from disparate products into a single product
A previously impossible feature that is important enough to expand budget or move budget towards the AI-first product

The New Professional users will become paying customers and benefit from:

The democratization of a skillset leading to a specific job of their industry’s stack being made obsolete from AI
AI enabling a new worker-type, leading to the worker doing a job despite having different qualifications from the prior person who did the job without AI
A lowered barrier of entry to a part of the industry that brings consumers or prosumers to professional level or removes the middleman

shashank @shacrw_

If "Incumbent PROs" are your FOCUS, don't get SOTA anxiety. focus on owning high value users which'll lead to $$$ → scale team → build flywheels for moats For "New PROs": prioritize roadmap to ship MVPs of high-usage features that target one of the bullet profiles above but 👇

I am gonna explain these through some examples below.

(Credits: This section was inspired by a brilliant post by Michael Dempsey. One of the most impactful blogposts I’ve read recently. I’ve quoted some lines verbatim.)

Unlimited Product Photography 📸

First impressions matter a lot. For D2C companies, great product photos are important; however, producing them can be expensive. You’d have to hire editors and photographers, or use a creative agency that does this work. But what if you could just snap pictures on your phone and get these amazing photos?

Russ Maschmeyer @StrangeNative

AI unlocks unlimited product photography. 🤯 Recent advances have made it possible for merchants to turn mobile photos into high quality product images + ad creative. AI can now learn and reproduce a product on demand. More…👇[1/10] #ai #aiart #stablediffusion #dreambooth

How were these custom photos generated? 👇

Russ Maschmeyer @StrangeNative

All the nerdy details… We used #dreambooth to train models for each product. 200-250 regularization images from the same product category, 20-25 training images of a specific product, 4000 iterations (~1–1.5hrs on an A100). ✌️🙏[10/10]

We covered Dreambooth in the previous issue. This is a good real world use case for it.
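For the builders reading: here is a rough back-of-the-envelope sketch of what the numbers in that thread imply. The figures (image counts, 4000 iterations, ~1–1.5 hrs on an A100) come from the tweet; the class name, field names and the GPU price you pass in are my own assumptions for illustration, not anything the authors published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DreamboothRun:
    """Settings quoted in the thread; names here are hypothetical."""
    regularization_images: tuple[int, int] = (200, 250)  # same product category
    training_images: tuple[int, int] = (20, 25)          # the specific product
    iterations: int = 4000
    hours_on_a100: tuple[float, float] = (1.0, 1.5)      # (fastest, slowest)

    def iterations_per_second(self) -> tuple[float, float]:
        """Return the (slowest, fastest) training throughput implied above."""
        fast_hours, slow_hours = self.hours_on_a100
        return (self.iterations / (slow_hours * 3600),
                self.iterations / (fast_hours * 3600))

    def cost_per_product(self, usd_per_gpu_hour: float) -> tuple[float, float]:
        """GPU cost range for one product model at a price you supply."""
        low_hours, high_hours = self.hours_on_a100
        return (low_hours * usd_per_gpu_hour, high_hours * usd_per_gpu_hour)


run = DreamboothRun()
slow, fast = run.iterations_per_second()
print(f"~{slow:.2f} to {fast:.2f} iterations/sec")
```

Plug your own cloud price into `cost_per_product` to get a feel for what one model per product would cost. Since the thread trains a separate model for each product, that per-product figure is the one that matters for a catalogue.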

The above is an example of a use case targeting New Professional users. How? Remember the 3rd bullet point under the New Professional users section above? 👇

A lowered barrier of entry to a part of the industry that brings consumers or prosumers to professional level or removes the middleman

In this case, marketers/designers can create product photos themselves. No professional photography or editing needed!

Btw, this concept of new vs incumbent user isn’t exclusive to AI. Once you get it, you’ll start seeing it everywhere! e.g. the no-code website builder Webflow enabled marketers and designers to own the landing page. Less to-and-fro with devs. New PRO users!

Reading tons of Research Papers? Save time using AI

What if while reading a paper you could just highlight long paras and get a concise explanation? Explainpaper made by Aman & Jade can do this!

Or say you want to save even more time! Elicit uses LLMs for automating research workflows. Here’s their latest feature.

Andreas Stuhlmüller @stuhlmueller

New beta feature in @elicitorg: Synthesize the top papers into a summary answer. Updates when you remove irrelevant papers

Now if you refer back to the section on Pro users, I think this satisfies both categories:

Incumbent PRO: Researchers who were already reading papers get an increase in efficiency and quality of work, as their existing “professional” workflows are upended by AI.

New PRO: However, since these tools might be able to explain papers in simple words, they’ll also be useful to non-researcher demographics interested in learning about a topic from Research Papers directly. e.g. health-conscious folks wanting to read about the benefit of a certain health intervention, or journalists who want to understand a topic better (instead of just talking to 5-10 experts, they could synthesize answers from the Top 50-100 papers).

Using Stable Diffusion (SD) to generate synthetic data for STEM use cases by finetuning

Although we mostly see SD as a way to generate artistic images, it’s basically a general image generator model, and there are many scientific applications where image datasets are needed. e.g. Radiology!

Use of ML for Radiology has progressed a ton in the past 5 years but this is something out of this world! Finetuning Stable Diffusion to generate medical images. Great for increasing the diversity of your dataset, which can mean better models.

I was gonna make a guess (whether this is for Incumbent or New Pro users) but I don’t really know a lot about medical data labelling, ML workflows 🤷‍♂️.

Christian Bluethgen @cxbln

🎉 #StableDiffusion can be fine-tuned to generate medical images, and the outputs can be controlled using natural language text prompts! In our latest work, we use SD to create synthetic chest xrays and insert pathologies like pleural effusions. 🧵 #Radiology #AI #StanfordAIMI

Original and refined synthetic CXR for the prompt "A photo of a lung xray with a visible pleural effusion".

I think we’re gonna see Stable Diffusion finetuning being applied to more Computer Vision tasks soon. One I am particularly excited for: Geospatial Imagery (satellite data). Why? Story Time:

2 years ago, I was working on a geospatial project. It was something to do with roof detection in satellite images to determine the best sites for solar panels (in Singapore). The satellite image dataset was small & blurry so I took up the task of converting low res images into high res ones. Now we were under certain time & resource constraints. To increase the dataset size, I gathered images of 3 other cities. But those cities looked very different from Singapore. I am leaving out a lot of detail but TLDR: I didn’t get satisfactory results.

If only I had Stable Diffusion back then…

A Twitter Discussion b/w Prompt Magicians

This is a snippet of a Twitter discussion b/w some AI artists. Although making impressive stuff is quite easy now with all these tools, the top 10% of artists do a ton of tweaking to make their work stand apart. Art will always grow; AI models are just infinitely better multi-dimensional assistants/tools.

Generative art tools help make incumbent users (Artists) more productive but they also result in the birth of new artists!

If you want to improve your AI art, follow these discussions on twitter. I use a private Twitter List of Generative AI artists to follow these. I’ll be making that list public soon.

What’s New in Research 🧪

text2motion: MotionDiffuse, the first diffusion model-based text-driven motion generation framework. wondering what the output would be for “person sending out a newsletter 1 week late” 👻

text2light: generate panoramic scenes from text. more examples in the link.

Generate images using fewer sampling steps: Right now, 28-50 steps are used for image creation; this new research can generate high quality samples in just 1-4 steps. Massive improvement! I think once this gets productized, it’ll unleash a ton of realtime applications. (link)
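To put the step counts in perspective, here is a tiny sketch of the arithmetic. It only uses the ranges quoted above (28-50 steps today, 1-4 with the new research). The per-step time in the usage note is a made-up placeholder, and the latency estimate assumes each sampling step costs about the same, which may not hold exactly for the new method.

```python
def step_reduction(current_steps: int, new_steps: int) -> float:
    """How many times fewer sampling steps the new approach needs."""
    if current_steps <= 0 or new_steps <= 0:
        raise ValueError("step counts must be positive")
    return current_steps / new_steps


def sampling_seconds(steps: int, seconds_per_step: float) -> float:
    """Rough latency, assuming every step takes the same (hypothetical) time."""
    return steps * seconds_per_step


# Ranges quoted above: 28-50 steps today, 1-4 steps with the new research.
worst_case = step_reduction(28, 4)  # fewest current steps vs most new steps
best_case = step_reduction(50, 1)   # most current steps vs fewest new steps
print(f"{worst_case:.0f}x to {best_case:.0f}x fewer steps")
```

If a step took, say, 0.1 seconds on your hardware (a placeholder, not a benchmark), `sampling_seconds(50, 0.1)` vs `sampling_seconds(4, 0.1)` is the difference between waiting seconds per image and something that starts to feel realtime.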

Progress in improving reasoning ability of LLMs 👇

Ruibo Liu @RuiboLiu

Simulation is All You Need for Grounded Reasoning!🔥 Mind's Eye enables LLM to *do experiments*🔬 and then *reason* over the observations🧑‍🔬, which is how we humans explore the unknown for decades.🧑‍🦯🚶🏌 Work done @GoogleAI Brain Team this summer!

Aran Komatsuzaki @arankomatsuzaki

Mind's Eye: Grounded Language Model Reasoning through Simulation Improves reasoning ability using MuJoCo simulations by a large margin (+27.9/46.0% zero/few-shot absolute acc. on average). LMs + Mind's Eye performs on par with 100x larger models. https://t.co/FoLFFvQByc https://t.co/TCrt17q4nZ

Shunyu Yao @ShunyuYao12

Large Language Models (LLM) are 🔥in 2 ways: 1.🧠Reason via internal thoughts (explain jokes, math reasoning..) 2.💪Act in external worlds (SayCan, ADEPT ACT-1, WebGPT..) But so far 🧠and💪 remain distinct methods/tasks... Why not 🧠+💪? In our new work ReAct, we show 1+1>>2!

AK @_akhaliq

ReAct: Synergizing Reasoning and Acting in Language Models abs: https://t.co/DbcLZyXJIt https://t.co/RHNmcLTsYX

🍨 Time for Dessart

Small Request: If you click on these tweets, do come back, there’s more 🤓! There’s also a poll 📊.

Sergei Galkin @sergeyglkn

Everything is changing. Playing with combinations of #AI and #AR again. This time I am using the Target tracker in SparkAR, Stable Diffusion for generating images and Frame Interpolation ML for making animation.

ScottieFox @ScottieFoxTTV

Stable Diffusion VR Real-time immersive latent space. 🔥 Small clips are sent from the engine to be diffused. Once ready, they're queued back into the projection. Tools used: deforum.github.io derivative.ca #aiart #vr #stablediffusionart #touchdesigner #deforum

Dalle2 Pics @Dalle2Pictures

I trained an AI to (accurately) paint modern day rappers in the style of Renaissance artists using DreamBooth. (not #dalle2) (1/?) “A Renaissance portrait painting of Drake by Raffaello Sanzio, masterpiece” in #stablediffusion

Make AI Music: What good is seeing all this art if you don’t create some of your own, right? Try this YouTube tutorial; it’s pretty simple. If you’ve used Colab notebooks before, you can go directly to the notebook (colab link).

Why try now? Because this is in beta right now, so it’s free. (made by Mubert)

This performs fine for the examples shown in the video, but for advanced prompts you might not get good results. BUT text2music is just picking up; my guess is that by Dec-Jan this will be 5-10 times better.

AI-generated music clip for “car driving into sunset, synthwave”

🍬 Candies

👌 How do Diffusion Models work (explainer for both tech/non-tech folks)

👌 The Open Source movement is eating AI (thread)

👌 State of AI Report 2022: good summary of AI in 2022, not just Generative AI.

Dubverse.ai : generate English subtitles for videos in any language. built using OpenAI Whisper.

AI-generated podcast b/w Joe Rogan & Steve Jobs.

DreamSpace: a prompt diagramming tool, generate hundreds of images, manage prompt dependencies, try variations, find nearby concepts, and fine-tune params in real-time.

thought2text project by Samarth

awesome-diffusion-models: github repo of resources on diffusion models

Microsoft 🤝 DALL-E :

Create designs using DALLE : It’s basically a cheaper version of Canva🙃. Vertical Integration is the name of the game. btw, Canva has a Stable Diffusion integration too. I LOVE Canva. Fun Fact: CRW RIDE’s logo (temporary) and my twitter dp were also made/edited there.
Bing integration: Suppose you search for an image and don’t find one; then just create it! This was pretty obvious, but I expected Google to integrate Imagen first and totally forgot that MSFT and OpenAI have a partnership. OpenAI is also looking to raise more $$$ from MSFT.

And now an update on CRW RIDE. scroll to the tweets section if you’re not interested.

The Main Course was text heavy today. The 1st issue was mostly news based. But as you saw above, this one had a central theme (Curiosity Phase, Incumbent vs New Pro Users) so it took a bit of time. I was also busy with some other stuff last week.

I am going to prioritize actionable Insights»News.

There’s so much news out there that it’s impractical to keep up with all of it. I’ll be adding a new, optional section to CRW RIDE (Substack allows writers to make multiple newsletter sections).

I’m still debating b/w 2 ideas, but I want to do a section dedicated to Builders and Artists. That one would have more technical, prompt engineering related stuff, while this one would be insight+news based. wdyt?

If you have any feedback / suggestions, feel free to reply to this mail, leave a comment, or reach out via Twitter (DMs open).

btw, if anyone is aware of some grant etc which supports work like CRW RIDE, lemme know 🙏.

👌 Tweets

Merzmensch Kosmopol @Merzmensch

Exploring creative AI, I have a feeling to discover hidden topoi and themes of subcultures, hidden deep between words and images. AI can not only be predictive analytics. It can detect cultural anomalies we humans cannot see with naked eye.

shashank @shacrw_

"The next big thing will start out looking like a toy" - @cdixon True. But ALL web demos (i.e. toys) are NOT the NEXT big thing. LLMs are getting optimized & productized at such a fast rate that something which is scarce/novel last week, gets commoditized this week 🤷‍♂️.

Will Manidis @WillManidis

I've never seen VCs make a bigger mistake than generative AI. They are funding many projects that are destined to be abject failures. Even worse, they are totally missing the places where ML will actually have impact and generate enterprise value. Let me explain:

David Chalmers @davidchalmers42

so, what's the best reason to think large language models are not sentient? more precisely: what's the best candidate for X such that LLMs clearly lack X and X is required for sentience?

Benedict Evans @benedictevans

Automation: things that are hard for people to do but easy to describe to machines or computers ML: things that were easy to people to do but hard to describe to computers Generative AI: things that are easy for people to imagine, but not necessarily easy to describe to computers

Suhail @Suhail

Expert level prompt engineering going on here: https://t.co/ZnKgLUN8NZ

Thanks for reading till the end! You’ve got top notch attention span💯🙌. If you liked what you read, do share this newsletter with your friends and on your socials :)

Latent Garage
