If you’ve been following AI news this week, you may feel like a kid at Christmas—and like filing a petition for OpenAI to hire a model namer. With o3, GPT‑4.1, and o4‑mini all dropping at once, even AI-savvy teams are asking: Wait, which one are we supposed to use?
We’ve spent the last few days running tests, switching between models, breaking a few prompts, and seeing what sticks. Here’s the gist:
- o3 is OpenAI’s most deliberate thinker and newest flagship model: Built for self-directed complex reasoning and tool use.
- GPT‑4.1 is a structured, API-only workhorse built for developers: Great at tight instruction following and long context memory.
- o4-mini is the efficiency engine: Fast, affordable, and remarkably strong at math, visual reasoning, and cost-sensitive development work. It won’t steal the spotlight—it’s not OpenAI’s flagship model or the benchmark champ. But its efficacy means it might quietly run half your stack.
Let’s dive into what’s new, what each model does, and what the team at Every thinks after trying them out on our workflows.
o3: OpenAI’s most powerful reasoning model
o3 is the first model Every CEO Dan Shipper has been this excited about since GPT‑4, which came out in 2023. It doesn’t just use tools, like GPT-4o, or see images—it thinks with them.
What it’s great at:
- Tool use: o3 knows how to use tools, how to string different tools together, and how to pivot. Say you upload a chart of monthly sales. It might extract the data using optical character recognition (OCR), write Python to calculate your year-over-year growth, and search for industry benchmarks to contextualize the results—all in one go. It can make up to 600 tool calls in a single response, self-improve along the way, and pivot if something breaks. It’s your self-directed analyst with a Swiss Army knife—and the judgment to know which blade to use.
- Visual reasoning: It interrogates images with real context. While other models might say, “This is a painting of a woman,” o3 zooms in on the corner, reads the artist’s signature, searches for the museum in which it hangs, and gives you the history of the art movement it’s from.
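To make the sales-chart example concrete, the arithmetic step in that chain might look something like the sketch below. The sales figures and the helper name `yoy_growth` are invented for illustration; this is not output from o3, just a guess at the kind of Python it could write once the numbers are extracted.

```python
def yoy_growth(current: float, previous: float) -> float:
    """Return year-over-year growth as a percentage."""
    if previous == 0:
        raise ValueError("previous-year value must be non-zero")
    return (current - previous) / previous * 100.0


# Hypothetical monthly sales (illustrative numbers only).
sales_2023 = {"Jan": 100.0, "Feb": 120.0, "Mar": 150.0}
sales_2024 = {"Jan": 110.0, "Feb": 150.0, "Mar": 135.0}

# Compare each month with the same month a year earlier.
growth_by_month = {
    month: round(yoy_growth(sales_2024[month], sales_2023[month]), 1)
    for month in sales_2023
}
print(growth_by_month)
```

The OCR and benchmark-search steps are left out here; only the calculation is shown.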
GPT-4.1: Built for precision, not vibes
4.1 is currently available only to developers through the API, and it’s designed to follow detailed instructions with ruthless precision. It’s less dreamy than predecessors like 4.5, but more structured, reliable, and consistent. Think of it as OpenAI’s workhorse for targeted developer tasks, not creative exploration.
What it’s great at:
- Follows complex instructions: GPT-4.1 handles instructions like a seasoned navigator. Say you’re coding a recipe maker. In a single prompt, you might ask to format the response in Markdown, avoid certain topics, output cooking steps in a particular order, and always include a key metric like sodium content. Where past models might fumble or skip steps, 4.1 sticks to your map—even when the path is long, winding, and filled with tricky turns.
- It won’t lose your map: With a context window of 1 million tokens, up from 128,000 in older models, you can set the tone or structure once, and it’ll follow through across multiple replies. You don’t need to start from scratch every time.
- Thrives on structure: GPT-4.1 is like that friend on a road trip who’s fun to have around—as long as there’s a plan. Give it a clear itinerary, and it executes with clarity and precision. But hand it a “just vibes” prompt like, “Can you make this recipe app feel more like stepping into a cozy speakeasy?” and it might want to go home. The clearer the map, the smoother the ride.
o4-mini: Small, sharp, and surprisingly capable
o4-mini is the latest addition to OpenAI’s “o-series,” its line of reasoning models that are trained to think longer before responding. It’s optimized for both quantity and quality: consumers get a daily cap of 150 messages, compared with o3’s weekly cap of 50, along with near o3-level performance (especially in math, coding, and visual-heavy tasks), faster and at a fraction of the cost. While o3 is OpenAI’s most powerful reasoning model, o4-mini is your go-to when you want most of o3’s smarts at roughly one-ninth of the bill. That’s not a mini difference.
Source: o4-mini/Vivian Meng.
What it’s great at:
- Packs a punch for its size: Need to analyze tons of transcript data or summarize messy research tables? o4-mini handles high-volume requests like a pro: filtering for insights, writing SQL, searching for data, and plotting results on an interactive graph. Where o3 might fire off a dozen reasoning steps (and rack up the token bill), o4-mini cuts to the chase with a clean, usable answer that’s still well-reasoned.
- Same tools, lighter lift: o4-mini gives you o3’s complete toolkit, including Python, web browsing, image analysis and generation, and more. It’s especially handy for tasks like generating a weekly analytics summary: fetching a CSV, running Python to clean and chart the data, searching the web for bird’s-eye-view industry data to contextualize, and producing a markdown report. It does this all in one go, and without o3’s extra compute overhead.
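As a rough picture of the weekly analytics summary described above, here is a minimal sketch of the clean-and-report portion. Everything in it is an assumption made for illustration: the inline CSV, the column names `day` and `visits`, and the helper names `load_visits` and `markdown_report`. Charting and web search are omitted.

```python
import csv
import io
from statistics import mean

# Hypothetical CSV export; in practice this would be fetched from a file or URL.
RAW_CSV = """day,visits
Mon,120
Tue,
Wed,150
Thu,90
"""


def load_visits(text: str) -> dict[str, int]:
    """Parse the CSV and drop rows with a missing visit count."""
    rows = csv.DictReader(io.StringIO(text))
    return {row["day"]: int(row["visits"]) for row in rows if row["visits"].strip()}


def markdown_report(visits: dict[str, int]) -> str:
    """Render a small Markdown table followed by a total and an average."""
    lines = ["| Day | Visits |", "| --- | --- |"]
    lines += [f"| {day} | {count} |" for day, count in visits.items()]
    lines.append("")
    lines.append(f"Total: {sum(visits.values())}, average: {mean(visits.values()):.1f}")
    return "\n".join(lines)


visits = load_visits(RAW_CSV)
report = markdown_report(visits)
print(report)
```

The row with the missing count is dropped rather than treated as zero, so it does not drag the average down; a real pipeline might prefer to flag it instead.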
What everyone at Every thinks…
… about o3
o3 thinks like a prompt engineer
“o3 has been a great companion for working on AI stuff. It seems to know a lot about how LLMs work and the different tools and techniques that are out there right now. Other models tend to respond with traditional natural language processing techniques—o3 responds with techniques you'd actually use with LLMs.”—Danny Aziz, general manager of Spiral
o3 is the best teacher model yet
Comments
You are correct about o4 mini and visual reasoning. I was feeding o3 questions from British TV show the 1%. The only one it struggled with was a rebus with an 8 in a box - incubate. It took it 3 goes and it used several minutes each time. I just tried it on o4 mini and it got it correct in 4 seconds (unless it can leverage memory of my chats 🤔). You have a typo on gpt 3.5 - should be Nov 22.
Thanks for sharing this thoughtfully written piece.
@gundamexpressaus Thank you for reading!
Love the structure of the article and that it's based on your own experience. Overview with the right level of depth. I wonder if a follow up after a few months would be good as some things will emerge as we all learn to use the models better. Thoughts?