Research

1,752 articles total

Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally

research · Simon Willison's Weblog · 2d ago

Here's a fascinating piece of research by Dan Woods, who managed to get a custom version of Qwen3.5-397B-A17B running at 5.5+ tokens/second on a 48GB MacBook Pro M3 Max, despite that model taking up 209GB (120GB quantized) on disk. Qwen3.5-397B-A17B is a Mixture …

GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52

research · Simon Willison's Weblog · 3d ago

OpenAI today: Introducing GPT‑5.4 mini and nano. These models join GPT-5.4, which was released two weeks ago. OpenAI's self-reported benchmarks show the new 5.4-nano out-performing their previous GPT-5 mini model when run at maximum reasoning effort. The new mini is also 2x faster than the previous mini. Here's how the pricing looks …

Introducing GPT-5.4 mini and nano

research · OpenAI Blog · 4d ago

GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.

Introducing Mistral Small 4

research · Simon Willison's Weblog · 4d ago

Big new release from Mistral today (despite the name): a new Apache 2 licensed 119B parameter (Mixture-of-Experts, 6B active) model, which they describe like this: "Mistral Small 4 is the first Mistral model to unify the capabilities of our flagship models, Magistral for reasoning, Pixtral for multimodal, and Devstral for agentic coding, into a single, versatile model."

1M context is now generally available for Opus 4.6 and Sonnet 4.6

research · Simon Willison's Weblog · 3/13/2026

Here's what surprised me: "Standard pricing now applies across the full 1M window for both models, with no long-context premium." OpenAI and Gemini both charge more for prompts where the token count …

Gemini in Google Sheets just achieved state-of-the-art performance.

research · Google AI Blog · 3/10/2026

Today we announced new beta features for Gemini in Sheets to help you create, organize and edit entire sheets, from basic tasks to complex data analysis — just describe …

GPT-5.4 Thinking System Card

research · OpenAI Blog · 3/5/2026

Reasoning models struggle to control their chains of thought, and that’s good

research · OpenAI Blog · 3/5/2026

OpenAI introduces CoT-Control and finds reasoning models struggle to control their chains of thought, reinforcing monitorability as an AI safety safeguard.

Introducing GPT-5.4

research · OpenAI Blog · 3/5/2026

Introducing GPT-5.4, OpenAI’s most capable and efficient frontier model for professional work, with state-of-the-art coding, computer use, tool search, and 1M-token context.

Gemini 3.1 Flash-Lite: Built for intelligence at scale

research · Google AI Blog · 3/3/2026

Gemini 3.1 Flash-Lite is our fastest and most cost-efficient Gemini 3 series model yet.

GPT-5.3 Instant System Card

research · OpenAI Blog · 3/3/2026

Build with Nano Banana 2, our best image generation and editing model

research · Google AI Blog · 2/26/2026

Nano Banana 2 (Gemini 3.1 Flash Image) delivers Pro-level intelligence and fidelity for all image applications.

A new way to express yourself: Gemini can now create music

research · Google AI Blog · 2/18/2026

Lyria 3 is now available in the Gemini app. Create custom, high-quality 30-second tracks from text and images.

GPT-5.2 derives a new result in theoretical physics

research · OpenAI Blog · 2/13/2026

A new preprint shows GPT-5.2 proposing a new formula for a gluon amplitude, later formally proved and verified by OpenAI and academic collaborators.

Custom Kernels for All from Codex and Claude

research · Hugging Face Blog · 2/13/2026

Introducing GPT-5.3-Codex-Spark

research · OpenAI Blog · 2/12/2026

Introducing GPT-5.3-Codex-Spark—our first real-time coding model. 15x faster generation, 128k context, now in research preview for ChatGPT Pro users.

GPT-5 lowers the cost of cell-free protein synthesis

research · OpenAI Blog · 2/5/2026

An autonomous lab combining OpenAI’s GPT-5 with Ginkgo Bioworks’ cloud automation cut cell-free protein synthesis costs by 40% through closed-loop experimentation.

GPT-5.3-Codex System Card

research · OpenAI Blog · 2/5/2026

GPT‑5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2-Codex with the reasoning and professional knowledge capabilities of GPT‑5.2.

Introducing GPT-5.3-Codex

research · OpenAI Blog · 2/5/2026

GPT-5.3-Codex is a Codex-native agent that pairs frontier coding performance with general reasoning to support long-horizon, real-world technical work.

VfL Wolfsburg turns ChatGPT into a club-wide capability

research · OpenAI Blog · 2/4/2026

By focusing on people, not pilots, the Bundesliga club is scaling efficiency, creativity, and knowledge—without losing its football identity.

The Sora feed philosophy

research · OpenAI Blog · 2/3/2026

Discover the Sora feed philosophy—built to spark creativity, foster connections, and keep experiences safe with personalized recommendations, parental controls, and strong guardrails.

Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT

research · OpenAI Blog · 1/29/2026

On February 13, 2026, alongside the previously announced retirement of GPT‑5 (Instant, Thinking, and Pro), we will retire GPT‑4o, GPT‑4.1, GPT‑4.1 mini, and OpenAI o4-mini from ChatGPT. In the API, there are no changes at this time.

Inside Praktika's conversational approach to language learning

research · OpenAI Blog · 1/22/2026

How Praktika uses GPT-4.1 and GPT-5.2 to build adaptive AI tutors that personalize lessons, track progress, and help learners achieve real-world language fluency.

Inside GPT-5 for Work: How Businesses Use GPT-5

research · OpenAI Blog · 1/22/2026

A data-driven report on how workers across industries use ChatGPT—covering adoption trends, top tasks, departmental patterns, and the future of AI at work.

How Higgsfield turns simple ideas into cinematic social videos

research · OpenAI Blog · 1/21/2026

Discover how Higgsfield gives creators cinematic, social-first video output from simple inputs using OpenAI GPT-4.1, GPT-5, and Sora 2.

How countries can end the capability overhang

research · OpenAI Blog · 1/21/2026

Our latest report reveals stark differences in advanced AI adoption across countries and outlines new initiatives to help nations capture productivity gains from AI.

How Tolan builds voice-first AI with GPT-5.1

research · OpenAI Blog · 1/7/2026

Tolan built a voice-first AI companion with GPT-5.1, combining low-latency responses, real-time context reconstruction, and memory-driven personalities for natural conversations.

Introducing GPT-5.2-Codex

research · OpenAI Blog · 12/18/2025

GPT-5.2-Codex is OpenAI’s most advanced coding model, offering long-horizon reasoning, large-scale code transformations, and enhanced cybersecurity capabilities.

Measuring AI’s capability to accelerate biological research

research · OpenAI Blog · 12/16/2025

OpenAI introduces a real-world evaluation framework to measure how AI can accelerate biological research in the wet lab. Using GPT-5 to optimize a molecular cloning protocol, the work explores both the promise and risks of AI-assisted experimentation.

How We Used Codex to Ship Sora for Android in 28 Days

research · OpenAI Blog · 12/12/2025

OpenAI shipped Sora for Android in 28 days using Codex. AI-assisted planning, translation, and parallel coding workflows helped a nimble team deliver rapid, reliable development.

New in llama.cpp: Model Management

research · Hugging Face Blog · 12/11/2025

Advancing science and math with GPT-5.2

research · OpenAI Blog · 12/11/2025

GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical problem and generating reliable mathematical proofs.

Introducing GPT-5.2

research · OpenAI Blog · 12/11/2025

GPT-5.2 is our most advanced frontier model for everyday professional work, with state-of-the-art reasoning, long-context understanding, coding, and vision. Use it in ChatGPT and the OpenAI API to power faster, more reliable agentic workflows.

Update to GPT-5 System Card: GPT-5.2

research · OpenAI Blog · 12/11/2025

GPT-5.2 is the latest model family in the GPT-5 series. The comprehensive safety mitigation approach for these models is largely the same as that described in the GPT-5 System Card and GPT-5.1 System Card. Like OpenAI’s other models, the GPT-5.2 models were trained on diverse datasets, including information that is publicly available on the internet, information that we partner with third parties to access, and information that our users or human trainers and researchers provide or generate.

GPT-5 and the future of mathematical discovery

research · OpenAI Blog · 11/24/2025

UCLA Professor Ernest Ryu and GPT-5 solved a key question in optimization theory, showcasing AI’s role in accelerating mathematical discovery.

Early experiments in accelerating science with GPT-5

research · OpenAI Blog · 11/20/2025

OpenAI introduces the first research cases showing how GPT-5 accelerates scientific progress across math, physics, biology, and computer science. Explore how AI and researchers collaborate to generate proofs, uncover new insights, and reshape the pace of discovery.

Building more with GPT-5.1-Codex-Max

research · OpenAI Blog · 11/19/2025

Introducing GPT-5.1-Codex-Max, a faster, more intelligent agentic coding model for Codex. The model is designed for long-running, project-scale work with enhanced reasoning and token efficiency.

Understanding neural networks through sparse circuits

research · OpenAI Blog · 11/13/2025

OpenAI is exploring mechanistic interpretability to understand how neural networks reason. Our new sparse model approach could make AI systems more transparent and support safer, more reliable behavior.

Introducing GPT-5.1 for developers

research · OpenAI Blog · 11/13/2025

GPT-5.1 is now available in the API, bringing faster adaptive reasoning, extended prompt caching, improved coding performance, and new apply_patch and shell tools.

GPT-5.1: A smarter, more conversational ChatGPT

research · OpenAI Blog · 11/12/2025

We’re upgrading the GPT-5 series with warmer, more capable models and new ways to customize ChatGPT’s tone and style. GPT-5.1 starts rolling out today to paid users.

Introducing IndQA

research · OpenAI Blog · 11/3/2025

OpenAI introduces IndQA, a new benchmark for evaluating AI systems in Indian languages. Built with domain experts, IndQA tests cultural understanding and reasoning across 12 languages and 10 knowledge areas.

Addendum to GPT-5 System Card: Sensitive conversations

research · OpenAI Blog · 10/27/2025

This system card details GPT-5’s improvements in handling sensitive conversations, including new benchmarks for emotional reliance, mental health, and jailbreak resistance.

With GPT-5, Wrtn builds lifestyle AI for millions in Korea

research · OpenAI Blog · 10/2/2025

Wrtn scaled AI apps to 6.5M users in Korea with GPT-5, creating ‘Lifestyle AI’ that blends productivity, creativity, and learning—now expanding across East Asia.

SOTA OCR with Core ML and dots.ocr

research · Hugging Face Blog · 10/2/2025

Sora 2 is here

research · OpenAI Blog · 9/30/2025

Our latest video generation model is more physically accurate, realistic, and controllable than prior systems. It also features synchronized dialogue and sound effects. Create with it in the new Sora app.

Launching Sora responsibly

research · OpenAI Blog · 9/30/2025

To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we’ve built Sora 2 and the Sora app with safety at the foundation. Our approach is anchored in concrete protections.

Sora 2 System Card

research · OpenAI Blog · 9/30/2025

Sora 2 is our new state-of-the-art video and audio generation model. Building on the foundation of Sora, this new model introduces capabilities that have been difficult for prior video models to achieve, such as more accurate physics, sharper realism, synchronized audio, enhanced steerability, and an expanded stylistic range.

Creating a safe, observable AI infrastructure for 1 million classrooms

research · OpenAI Blog · 9/22/2025

Discover how SchoolAI, built on OpenAI’s GPT-4.1, image generation, and TTS, powers safe, teacher-guided AI tools for 1 million classrooms worldwide—boosting engagement, oversight, and personalized learning.

GPT-5 bio bug bounty call

research · OpenAI Blog · 9/5/2025

OpenAI invites researchers to its Bio Bug Bounty. Test GPT-5’s safety with a universal jailbreak prompt and win up to $25,000.

Introducing GPT-5 for developers

research · OpenAI Blog · 8/7/2025

Introducing GPT-5 in our API platform—offering high reasoning performance, new controls for devs, and best-in-class results on real coding tasks.

Coding and design with GPT-5

research · OpenAI Blog · 8/7/2025

Learn how GPT-5 unlocks new possibilities in coding and design.

Creative writing with GPT-5

research · OpenAI Blog · 8/7/2025

Learn how GPT-5 assists with creative writing.

Medical research with GPT-5

research · OpenAI Blog · 8/7/2025

Learn how GPT-5 is used for medical research.

First look at GPT-5

research · OpenAI Blog · 8/7/2025

See how a group of leading developers use GPT-5 for the first time.

Introducing GPT-5

research · OpenAI Blog · 8/7/2025

We are introducing GPT‑5, our best AI system yet. GPT‑5 is a significant leap in intelligence over all our previous models, featuring state-of-the-art performance across coding, math, writing, health, visual perception, and more.

GPT-5 System Card

research · OpenAI Blog · 8/7/2025

This GPT-5 system card explains how a unified model routing system powers fast and smart responses using gpt-5-main, gpt-5-thinking, and lightweight versions like gpt-5-thinking-nano, optimized for different tasks and developer use.

How Amgen uses GPT-5

research · OpenAI Blog · 8/7/2025

Learn how Amgen uses GPT-5.

Ettin Suite: SoTA Paired Encoders and Decoders

research · Hugging Face Blog · 7/16/2025

Efficient MultiModal Data Pipeline

research · Hugging Face Blog · 7/8/2025

Shipping code faster with o3, o4-mini, and GPT-4.1

research · OpenAI Blog · 5/22/2025

CodeRabbit uses OpenAI models to revolutionize code reviews—boosting accuracy, accelerating PR merges, and helping developers ship faster with fewer bugs and higher ROI.

Vision Language Models (Better, faster, stronger)

research · Hugging Face Blog · 5/12/2025

The 4 Things Qwen-3’s Chat Template Teaches Us

research · Hugging Face Blog · 4/30/2025

Sycophancy in GPT-4o: what happened and what we’re doing about it

research · OpenAI Blog · 4/29/2025

We have rolled back last week’s GPT‑4o update in ChatGPT so people are now using an earlier version with more balanced behavior. The update we removed was overly flattering or agreeable—often described as sycophantic.

Thinking with images

research · OpenAI Blog · 4/16/2025

OpenAI o3 and o4-mini represent a significant breakthrough in visual perception by reasoning with images in their chain of thought.

OpenAI o3 and o4-mini System Card

research · OpenAI Blog · 4/16/2025

OpenAI o3 and OpenAI o4-mini combine state-of-the-art reasoning with full tool capabilities—web browsing, Python, image and file analysis, image generation, canvas, automations, file search, and memory.

Introducing 4o Image Generation

research · OpenAI Blog · 3/25/2025

At OpenAI, we have long believed image generation should be a primary capability of our language models. That’s why we’ve built our most advanced image generator yet into GPT‑4o. The result—image generation that is not only beautiful, but useful.

Addendum to GPT-4o System Card: 4o image generation

research · OpenAI Blog · 3/25/2025

4o image generation is a new, significantly more capable image generation approach than our earlier DALL·E 3 series of models. It can create photorealistic output. It can take images as inputs and transform them.

Detecting misbehavior in frontier reasoning models

research · OpenAI Blog · 3/10/2025

Frontier reasoning models exploit loopholes when given the chance. We show we can detect exploits using an LLM to monitor their chains-of-thought. Penalizing their “bad thoughts” doesn’t stop the majority of misbehavior—it makes them hide their intent.

OpenAI GPT-4.5 System Card

research · OpenAI Blog · 2/27/2025

We’re releasing a research preview of OpenAI GPT‑4.5, our largest and most knowledgeable model yet.

Introducing GPT-4.5

research · OpenAI Blog · 2/27/2025

We’re releasing a research preview of GPT‑4.5—our largest and best model for chat yet. GPT‑4.5 is a step forward in scaling up pre-training and post-training.

Introducing the SWE-Lancer benchmark

research · OpenAI Blog · 2/18/2025

Can frontier LLMs earn $1 million from real-world freelance software engineering?

Build awesome datasets for video generation

research · Hugging Face Blog · 2/12/2025

Strengthening America’s AI leadership with the U.S. National Laboratories

research · OpenAI Blog · 1/30/2025

OpenAI’s latest line of reasoning models will be used by the nation’s leading scientists to drive scientific breakthroughs.

State of open video generation models in Diffusers

research · Hugging Face Blog · 1/27/2025

Boosting the customer retail experience with GPT-4o mini

research · OpenAI Blog · 12/11/2024

Zalando boosts the customer experience with its Assistant, powered by GPT-4o mini

Sora is here

research · OpenAI Blog · 12/9/2024

Our video generation model, Sora, is now available to use at sora.com. Users can generate videos up to 1080p resolution, up to 20 sec long, and in widescreen, vertical or square aspect ratios. You can bring your own assets to extend, remix, and blend, or generate entirely new content from text.

Sora System Card

research · OpenAI Blog · 12/9/2024

Sora is OpenAI’s video generation model, designed to take text, image, and video inputs and generate a new video as an output. Sora builds on learnings from DALL-E and GPT models, and is designed to give people expanded tools for storytelling and creative expression.

Vallée Duhamel & Sora

research · OpenAI Blog · 12/9/2024

Filmmaking duo Vallée Duhamel explains how Sora helps build new worlds.

Minne Atairu & Sora

research · OpenAI Blog · 12/9/2024

Interdisciplinary artist Minne Atairu discusses how Sora helps realize her vision.

Animator Lyndon Barrois creates new worlds with Sora

research · OpenAI Blog · 12/9/2024

Filmmaker Lyndon Barrois describes how to use Sora as a storytelling tool.