OpenAI Introduces Advanced Reasoning AI Models: o3 and o4-mini
OpenAI unveiled two new artificial intelligence models, o3 and o4-mini, on Wednesday, both engineered to pause and reason through queries before generating a response.
The company positions o3 as its most sophisticated reasoning model to date, outperforming its predecessors on benchmarks that evaluate mathematical ability, code generation, logical reasoning, scientific understanding, and visual comprehension. Meanwhile, o4-mini is pitched as a balance of cost, speed, and capability, the trade-offs developers weigh when selecting a model to build their applications on.
Distinct from prior reasoning models, both o3 and o4-mini can leverage integrated tools within ChatGPT, such as performing web searches, executing Python code, processing images, and even generating visuals. Effective immediately, these models, along with a variant named “o4-mini-high” which dedicates more computational effort to enhance answer dependability, are accessible to subscribers of OpenAI’s Pro, Plus, and Team tiers.
The new releases reflect OpenAI’s push to stay ahead of rivals such as Google, Meta, xAI, Anthropic, and DeepSeek in the fiercely contested global AI race. OpenAI was first to market with an AI reasoning model, o1, but competitors quickly shipped alternatives that match or surpass its performance. Reasoning models have become a focal point as AI labs try to extract more performance from their systems.
The deployment of o3 within ChatGPT was not initially certain. OpenAI CEO Sam Altman had suggested in February that the company might prioritize resources towards a different, sophisticated system incorporating o3’s underlying technology. However, competitive pressures appear to have prompted a shift in strategy, leading to the current release.
OpenAI reports that o3 achieves state-of-the-art results on SWE-bench Verified (without custom scaffolding), a standard test of programming ability, scoring 69.1%. The o4-mini model is close behind at 68.1%. For comparison, OpenAI’s previous best, o3-mini, scored 49.3%, while Anthropic’s Claude 3.7 Sonnet achieved 62.3% on the same benchmark.
A significant advancement is that o3 and o4-mini are OpenAI’s first models able to incorporate visual information directly into their reasoning process, effectively “thinking with images.” Users can provide images, such as whiteboard sketches or diagrams from PDFs, via ChatGPT, and the models analyze them during their internal “chain-of-thought” stage before formulating an answer. This allows the models to interpret unclear or low-resolution images and to perform operations such as zooming or rotating an image as part of their reasoning sequence.
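For developers (API access is detailed further below), supplying an image alongside a question would use the multimodal message format that OpenAI’s Chat Completions endpoint already accepts for its vision-capable models. The Python sketch below is a minimal illustration under that assumption; the model identifier and the image URL are placeholders to verify against the current API reference.

```python
# Minimal sketch: asking o3 to reason over an image via the Chat Completions
# API, using OpenAI's official Python SDK (pip install openai). Assumes o3
# accepts the standard multimodal "image_url" content part; the URL below is
# a placeholder. Requires OPENAI_API_KEY to be set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3",  # assumed public model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What algorithm does this whiteboard sketch describe?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/whiteboard.png"},  # placeholder
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```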
Beyond visual intelligence, o3 and o4-mini can run Python code snippets directly within the user’s browser through ChatGPT’s Canvas feature and consult the web for information on current topics when prompted.
In addition to integration within ChatGPT, all three new models — o3, o4-mini, and o4-mini-high — will be accessible via OpenAI’s developer APIs, specifically the Chat Completions API and the Responses API. This allows engineers to incorporate these models into their own applications on a pay-per-use basis.
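A request to the new models through the Chat Completions API should look like any other chat call. The Python sketch below assumes the public model identifier is the literal string "o4-mini" and that the reasoning_effort parameter OpenAI introduced for its earlier o-series models is the API-side analogue of “o4-mini-high”; both are assumptions to check against the current API reference.

```python
# A minimal sketch of calling o4-mini via the Chat Completions API with
# OpenAI's official Python SDK (pip install openai). The model ID and the
# reasoning_effort knob are assumptions based on earlier o-series models.
# Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",          # assumed model identifier; "o3" would be called the same way
    reasoning_effort="high",  # assumed analogue of ChatGPT's "o4-mini-high" variant
    messages=[
        {"role": "user", "content": "How many prime numbers are there below 100?"},
    ],
)

print(response.choices[0].message.content)
```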
Considering its enhanced capabilities, OpenAI is offering o3 access to developers at a competitive rate: $10 per million input tokens (approximately 750,000 words) and $40 per million output tokens. For o4-mini, the pricing mirrors that of its predecessor, o3-mini, at $1.10 per million input tokens and $4.40 per million output tokens.
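At those per-million-token rates, estimating a request’s cost is simple arithmetic: token count divided by one million, multiplied by the price. The sketch below uses the figures quoted above; the token counts in the example are hypothetical.

```python
# Back-of-the-envelope cost estimate at the published per-million-token rates.
# Prices come from the article; the token counts below are made-up examples.
PRICES = {
    "o3":      {"input": 10.00, "output": 40.00},  # USD per 1M tokens
    "o4-mini": {"input": 1.10,  "output": 4.40},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a single request."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]

# Example: a 5,000-token prompt that produces a 2,000-token answer.
print(f"o3:      ${estimate_cost('o3', 5_000, 2_000):.4f}")       # $0.1300
print(f"o4-mini: ${estimate_cost('o4-mini', 5_000, 2_000):.4f}")  # $0.0143
```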
OpenAI has also announced plans to introduce o3-pro in the upcoming weeks. This version of o3 will utilize greater computational resources for generating responses and will be exclusively available to ChatGPT Pro subscribers.
CEO Sam Altman has hinted that o3 and o4-mini might represent the final distinct AI reasoning models released within ChatGPT before the anticipated arrival of GPT-5. The company has previously stated that GPT-5 aims to unify conventional large language models, like GPT-4.1, with its specialized reasoning engines into a single, comprehensive architecture.