Beyond seeing: ChatGPT now reasons visually
Yes, earlier ChatGPT models could already see and read images, but this is different. This week, OpenAI introduced o3 and o4‑mini, new ChatGPT models that pause to reason more deeply and can even “see” and work with images on their own.
o3 and o4‑mini don’t just “see” images; they think with them, transforming raw visuals into multi‑step insights such as converting a hand‑drawn flowchart into executable pseudocode, extracting key metrics from a sales chart to build forecasts, or turning a sketched floor plan into step‑by‑step navigation instructions.
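To make the flowchart example concrete, here is the kind of runnable code such a conversion might yield. This is a hypothetical illustration, not actual model output; the order‑processing flowchart and every name in the sketch are invented:

```python
# Hypothetical output for a sketched flowchart that reads:
# start -> next order -> "in stock?" -> yes: ship / no: backorder -> repeat
def process_orders(orders, inventory):
    """Walk each order through the drawn decision flow."""
    for order in orders:                                    # loop-back arrow
        if inventory.get(order["sku"], 0) >= order["qty"]:  # "in stock?" diamond
            inventory[order["sku"]] -= order["qty"]         # "ship" box
            order["status"] = "shipped"
        else:
            order["status"] = "backordered"                 # "backorder" box
    return orders

# Example: one order that ships, one that cannot.
print(process_orders(
    [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
    {"A1": 5},
))
```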
- Deeper thinking across tasks: o3 tackles complex coding, math, science, and real‑world problems by reasoning longer before replying. In expert tests, it makes about 20 percent fewer major mistakes than earlier models on tough challenges.
- Think with images: Upload a photo of a whiteboard, diagram, or sketch, and the model factors that visual into its reasoning. It can rotate, zoom, or enhance blurry or reversed images to understand exactly what’s shown.
- Automatic tool use: Without extra instructions, these models decide when to search the web, run Python code, analyze files, or generate images, chaining all those steps into one seamless answer in under a minute (see the API sketch after this list).
- Speed and affordability: o4‑mini is a leaner model tuned for faster replies and lower cost. It excels at math, coding, and visual tasks, scoring 99.5 percent on the 2025 AIME when allowed to run Python, and it supports higher usage limits.
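For developers, that same autonomy is reachable outside the chat UI. The sketch below uses the OpenAI Python SDK’s Responses API to hand the model a web‑search tool that it may invoke at its own discretion; the model id and tool type reflect the SDK at the time of writing, so treat them as assumptions rather than a guaranteed interface:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Give the model a web-search tool; it decides on its own whether,
# when, and how often to call it while reasoning.
response = client.responses.create(
    model="o3",  # assumption: model id as exposed through the API
    tools=[{"type": "web_search_preview"}],
    input=(
        "What changed in the latest stable Python release, "
        "and does it affect type hints? Cite your sources."
    ),
)

print(response.output_text)  # SDK convenience accessor for the final text
```

Because tool choice is the model’s call, the same request may or may not trigger a live search, which is exactly the “decide when” behavior described above.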
These capabilities have already sparked a new trend online. By giving ChatGPT true “vision” and autonomy, o3 and o4‑mini can turn almost any photo into a reverse location hunt: they crop, rotate, or zoom in on subtle visual hints such as building façades, street signs, or skyline profiles, and then seamlessly trigger web searches to match those clues against online maps and landmark databases.
Prompt to try in ChatGPT with o3:
“Here’s a photo of our Q1 sales dashboard from the whiteboard. Can you extract the key trends, highlight any anomalies, and suggest 2 actions we should consider for Q2?”
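To run that same prompt programmatically rather than in the chat UI, a minimal sketch, again assuming the OpenAI Python SDK’s Responses API and a hypothetical local file name, might look like this:

```python
import base64

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical local snapshot of the whiteboard dashboard.
with open("q1_dashboard.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.responses.create(
    model="o3",  # assumption: model id as exposed through the API
    input=[{
        "role": "user",
        "content": [
            {
                "type": "input_text",
                "text": (
                    "Here's a photo of our Q1 sales dashboard from the whiteboard. "
                    "Can you extract the key trends, highlight any anomalies, and "
                    "suggest 2 actions we should consider for Q2?"
                ),
            },
            {
                "type": "input_image",
                "image_url": f"data:image/jpeg;base64,{image_b64}",  # inline image payload
            },
        ],
    }],
)

print(response.output_text)
```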