AI in Production

1,752 articles total

WordPress.com now lets AI agents write and publish posts, and more

production · TechCrunch AI · 22h ago

New AI agents on WordPress.com could lower barriers to publishing while increasing machine-generated content across the web.

How we monitor internal coding agents for misalignment

production · OpenAI Blog · 2d ago

How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents—analyzing real-world deployments to detect risks and strengthen AI safety safeguards.

Meta is having trouble with rogue AI agents

production · TechCrunch AI · 2d ago

A rogue AI agent inadvertently exposed Meta company and user data to engineers who didn't have permission to see it.

Nothing CEO Carl Pei says smartphone apps will disappear as AI agents take their place

production · TechCrunch AI · 2d ago

Nothing CEO Carl Pei says AI agents will eventually replace apps, shifting smartphones toward systems that understand intent and act on a user's behalf.

Holotron-12B - High Throughput Computer Use Agent

production · Hugging Face Blog · 4d ago

Subagents

production · Simon Willison's Weblog · 4d ago

From the Agentic Engineering Patterns series. LLMs are restricted by their context limit: how many tokens they can fit in their working memory at any given time. These values have not increased much over the past two years even as the LLMs themselves have seen dramatic improvements in their abilities; they generally top out at around 1,000,000, and benchmarks frequently report better quality results…
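The context-limit pressure described above is the core motivation for subagents: a parent agent can delegate a token-heavy task into a fresh context and keep only a short summary in its own window. A minimal sketch of that pattern, where `run_llm` is a hypothetical stand-in for a real model call:

```python
# Sketch of the subagent pattern: the parent delegates token-heavy work
# to a child agent and keeps only the child's compact result in its own
# context. `run_llm` is a hypothetical stub, not a real model API.

def run_llm(prompt: str) -> str:
    """Placeholder model call; a real harness would hit an LLM API here."""
    return f"summary({len(prompt)} chars)"

def subagent(task: str, big_input: str) -> str:
    """Run the task in a fresh context; return only a compact summary."""
    return run_llm(f"{task}\n\n{big_input}")

def parent_agent(tasks: list[tuple[str, str]]) -> list[str]:
    """Parent context holds one summary line per task, not the raw inputs."""
    context: list[str] = []
    for task, big_input in tasks:
        context.append(subagent(task, big_input))
    return context

results = parent_agent([("summarize", "x" * 50_000)])
```

The parent's context grows by one short line per delegated task, regardless of how large each task's input was.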

OpenAI Japan announces Japan Teen Safety Blueprint to put teen safety first

production · OpenAI Blog · 4d ago

OpenAI Japan announces the Japan Teen Safety Blueprint, introducing stronger age protections, parental controls, and well-being safeguards for teens using generative AI.

Use subagents and custom agents in Codex

production · Simon Willison's Weblog · 4d ago

Subagents were announced in general availability today for OpenAI Codex, after several weeks of preview behind a feature flag. They're very similar to the Claude Code implementation, with default subagents for "explorer", "worker" and "default". It's unclear to me what the difference between "worker" and "default" is…

Quoting A member of Anthropic’s alignment-science team

production · Simon Willison's Weblog · 4d ago

From a member of Anthropic's alignment-science team, quoted in The New Yorker: "The point of the blackmail exercise was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually salient in practice for people who had never thought about it…"

Coding agents for data analysis

production · Simon Willison's Weblog · 4d ago

Here's the handout I prepared for my NICAR 2026 workshop "Coding agents for data analysis": a three hour session aimed at data journalists demonstrating ways that tools like Claude Code and OpenAI Codex can be used to explore, analyze and clean data. It includes a full table of contents…

How coding agents work

production · Simon Willison's Weblog · 5d ago

From the Agentic Engineering Patterns series. As with any tool, understanding how coding agents work under the hood can help you make better decisions about how to apply them. A coding agent is a piece of software that acts as a harness for an LLM, extending that LLM with additional capabilities…

What is agentic engineering?

production · Simon Willison's Weblog · 5d ago

From the Agentic Engineering Patterns series. I use the term agentic engineering to describe the practice of developing software with the assistance of coding agents. What are coding agents? They're agents that can both write and execute code. Popular examples include Claude Code, OpenAI Codex, and…

My fireside chat about agentic engineering at the Pragmatic Summit

production · Simon Willison's Weblog · 6d ago

I was a speaker last month at the Pragmatic Summit in San Francisco, where I participated in a fireside chat session about Agentic Engineering hosted by Eric Lui from Statsig. The video is available on YouTube; here are my highlights from the conversation.

Designing AI agents to resist prompt injection

production · OpenAI Blog · 3/11/2026

How ChatGPT defends against prompt injection and social engineering by constraining risky actions and protecting sensitive data in agent workflows.

AI should help us produce better code

production · Simon Willison's Weblog · 3/10/2026

From the Agentic Engineering Patterns series. Many developers worry that outsourcing their code to AI tools will result in a drop in quality, producing bad code that's churned out fast enough that decision makers are willing to overlook its flaws. If adopting coding agents demonstrably reduces the quality of the code and features you are producing, you should address that problem directly: figure out which…

Production query plans without production data

production · Simon Willison's Weblog · 3/9/2026

Radim Marek describes the new pg_restore_relation_stats() and pg_restore_attribute_stats() functions that were introduced in PostgreSQL 18 in September 2025…

LeRobot v0.5.0: Scaling Every Dimension

production · Hugging Face Blog · 3/9/2026

How Balyasny Asset Management built an AI research engine for investing

production · OpenAI Blog · 3/6/2026

See how Balyasny built an AI research system with GPT-5.4, rigorous model evaluation, and agent workflows to transform investment analysis at scale.

Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock

production · OpenAI Blog · 2/27/2026

Stateful Runtime for Agents in Amazon Bedrock brings persistent orchestration, memory, and secure execution to multi-step AI workflows powered by OpenAI.

Nano Banana 2: Combining Pro capabilities with lightning-fast speed

production · Google AI Blog · 2/26/2026

Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.

Pacific Northwest National Laboratory and OpenAI partner to accelerate federal permitting

production · OpenAI Blog · 2/26/2026

OpenAI and Pacific Northwest National Laboratory introduce DraftNEPABench, a new benchmark evaluating how AI coding agents can accelerate federal permitting—showing potential to reduce NEPA drafting time by up to 15% and modernize infrastructure reviews.

Why we no longer evaluate SWE-bench Verified

production · OpenAI Blog · 2/23/2026

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.

OpenAI announces Frontier Alliance Partners

production · OpenAI Blog · 2/23/2026

OpenAI announces Frontier Alliance Partners to help enterprises move from AI pilots to production with secure, scalable agent deployments.

Advancing independent research on AI alignment

production · OpenAI Blog · 2/19/2026

OpenAI commits $7.5M to The Alignment Project to fund independent AI alignment research, strengthening global efforts to address AGI safety and security risks.

Introducing EVMbench

production · OpenAI Blog · 2/18/2026

OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.

Beyond rate limits: scaling access to Codex and Sora

production · OpenAI Blog · 2/13/2026

How OpenAI built a real-time access system combining rate limits, usage tracking, and credits to power continuous access to Sora and Codex.

Scaling social science research

production · OpenAI Blog · 2/13/2026

GABRIEL is a new open-source toolkit from OpenAI that uses GPT to turn qualitative text and images into quantitative data, helping social scientists analyze research at scale.

Harness engineering: leveraging Codex in an agent-first world

production · OpenAI Blog · 2/11/2026

By Ryan Lopopolo, Member of the Technical Staff

Bringing ChatGPT to GenAI.mil

production · OpenAI Blog · 2/9/2026

OpenAI for Government announces the deployment of a custom ChatGPT on GenAI.mil, bringing secure, safety-forward AI to U.S. defense teams.

Unlocking the Codex harness: how we built the App Server

production · OpenAI Blog · 2/4/2026

Learn how to embed the Codex agent using the Codex App Server, a bidirectional JSON-RPC API powering streaming progress, tool use, approvals, and diffs.
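The blurb above mentions a bidirectional JSON-RPC API with streaming progress. As a rough illustration of what that framing looks like (the method names here, "turn/start" and "turn/progress", are invented for this sketch and are not the actual App Server schema):

```python
import json

# Hypothetical JSON-RPC 2.0 framing of the kind an embedded agent server
# might use. Requests carry an "id" and expect a matching response;
# notifications omit the "id" and are fire-and-forget, which suits
# streaming progress updates.

def rpc_request(req_id: int, method: str, params: dict) -> str:
    """A request: the caller will match the eventual response by `id`."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params})

def rpc_notification(method: str, params: dict) -> str:
    """A notification: no `id`, no response expected."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params})

start = rpc_request(1, "turn/start", {"prompt": "run the tests"})
progress = rpc_notification("turn/progress", {"message": "running pytest"})
```

The request/notification split is what makes the protocol bidirectional: either side can push notifications while a request is still in flight.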

Inside OpenAI’s in-house data agent

production · OpenAI Blog · 1/29/2026

How OpenAI built an in-house AI data agent that uses GPT-5, Codex, and memory to reason over massive datasets and deliver reliable insights in minutes.

Keeping your data safe when an AI agent clicks a link

production · OpenAI Blog · 1/28/2026

Learn how OpenAI protects user data when AI agents open links, preventing URL-based data exfiltration and prompt injection with built-in safeguards.
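One generic defense against URL-based exfiltration is to gate which links an agent may fetch and to strip the URL components where smuggled data usually rides. A toy sketch of that idea (the allowlist and policy here are invented for illustration, not OpenAI's actual safeguard):

```python
from urllib.parse import urlparse

# Toy link-safety gate: only fetch HTTPS URLs on an allowlist, and drop
# query strings/fragments so a prompt-injected link can't carry user data
# out in its parameters. Hosts and policy are invented for this sketch.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}

def safe_to_open(url: str) -> bool:
    parts = urlparse(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS

def sanitized(url: str) -> str:
    """Drop query and fragment, where exfiltrated data usually rides."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.hostname}{parts.path}"
```

A real system layers more on top (redirect handling, per-session approval, content inspection), but the allowlist-plus-sanitize shape is the common starting point.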

Powering tax donations with AI-powered personalized recommendations

production · OpenAI Blog · 1/27/2026

TRUSTBANK partnered with Recursive to build Choice AI using OpenAI models, delivering personalized, conversational recommendations that simplify Furusato Nozei gift discovery. A multi-agent system helps donors navigate thousands of options and find gifts that match their preferences.

Unrolling the Codex agent loop

production · OpenAI Blog · 1/23/2026

A technical deep dive into the Codex agent loop, explaining how Codex CLI orchestrates models, tools, prompts, and performance using the Responses API.
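In generic form, an agent loop like the one that deep dive describes is a cycle of model call, tool dispatch, and feeding results back until the model produces a final answer. A simplified sketch (the scripted stub model and toy tool are invented for illustration; the real Codex loop is built on the Responses API):

```python
# Generic agent-loop sketch: call the model, execute any tool call it
# requests, append the result to the history, repeat until the model
# replies with plain text. The "model" below is scripted for illustration.

def stub_model(history):
    """Stand-in for an LLM: first request a tool, then give a final answer."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "list_files", "args": {}}
    return {"text": "done: found 2 files"}

TOOLS = {"list_files": lambda: ["main.py", "test.py"]}

def agent_loop(prompt, model, max_turns=5):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = model(history)
        if "tool" in reply:  # model requested a tool call
            result = TOOLS[reply["tool"]](**reply["args"])
            history.append({"role": "tool", "content": repr(result)})
        else:  # plain text means the turn is finished
            return reply["text"]
    raise RuntimeError("agent did not finish within max_turns")

answer = agent_loop("what files are here?", stub_model)
```

The `max_turns` cap is the usual guard against a model that keeps requesting tools forever.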

Scaling PostgreSQL to power 800 million ChatGPT users

production · OpenAI Blog · 1/22/2026

An inside look at how OpenAI scaled PostgreSQL to millions of queries per second using replicas, caching, rate limiting, and workload isolation.
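The replica and workload-isolation techniques mentioned above can be sketched generically: writes go to the primary, reads fan out across replicas. A toy router (connection names are placeholders, not a real configuration; production systems would use real DSNs and a pooler):

```python
import itertools

# Toy read/write splitting router: the pattern behind spreading read-heavy
# traffic across replicas while all writes hit the primary. Connection
# names are invented placeholders for this sketch.

class QueryRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin reads

    def route(self, sql: str) -> str:
        """Send writes to the primary, reads to the next replica."""
        first_word = sql.lstrip().split(None, 1)[0].upper()
        is_write = first_word in {"INSERT", "UPDATE", "DELETE",
                                  "CREATE", "ALTER", "DROP"}
        return self.primary if is_write else next(self._replicas)

router = QueryRouter("primary-db", ["replica-1", "replica-2"])
```

Real routing also has to account for replication lag (read-your-writes), which is why the article's caching and workload-isolation layers matter alongside the split itself.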

Netomi’s lessons for scaling agentic systems into the enterprise

production · OpenAI Blog · 1/8/2026

How Netomi scales enterprise AI agents using GPT-4.1 and GPT-5.2—combining concurrency, governance, and multi-step reasoning for reliable production workflows.

Continuously hardening ChatGPT Atlas against prompt injection

production · OpenAI Blog · 12/22/2025

OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive discover-and-patch loop helps identify novel exploits early and harden the browser agent’s defenses as AI becomes more agentic.

Evaluating chain-of-thought monitorability

production · OpenAI Blog · 12/18/2025

OpenAI introduces a new framework and evaluation suite for chain-of-thought monitorability, covering 13 evaluations across 24 environments. Our findings show that monitoring a model’s internal reasoning is far more effective than monitoring outputs alone, offering a promising path toward scalable control as AI systems grow more capable.

Updating our Model Spec with teen protections

production · OpenAI Blog · 12/18/2025

OpenAI is updating its Model Spec with new Under-18 Principles that define how ChatGPT should support teens with safe, age-appropriate guidance grounded in developmental science. The update strengthens guardrails, clarifies expected model behavior in higher-risk situations, and builds on our broader work to improve teen safety across ChatGPT.

Addendum to GPT-5.2 System Card: GPT-5.2-Codex

production · OpenAI Blog · 12/18/2025

This system card outlines the comprehensive safety measures implemented for GPT‑5.2-Codex. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access.

Evaluating AI’s ability to perform scientific research tasks

production · OpenAI Blog · 12/16/2025

OpenAI introduces FrontierScience, a benchmark testing AI reasoning in physics, chemistry, and biology to measure progress toward real scientific research.

OpenAI co-founds Agentic AI Foundation, donates AGENTS.md

production · OpenAI Blog · 12/9/2025

OpenAI co-founds the Agentic AI Foundation under the Linux Foundation and donates AGENTS.md to support open, interoperable standards for safe agentic AI.

Inside Mirakl's agentic commerce vision

production · OpenAI Blog · 12/1/2025

Mirakl is redefining commerce through AI agents and ChatGPT Enterprise—achieving faster documentation, smarter customer support, and building toward agent-native commerce with Mirakl Nexus.

Strengthening our safety ecosystem with external testing

production · OpenAI Blog · 11/19/2025

OpenAI works with independent experts to evaluate frontier AI systems. Third-party testing strengthens safety, validates safeguards, and increases transparency in how we assess model capabilities and risks.

How evals drive the next chapter in AI for businesses

production · OpenAI Blog · 11/19/2025

Learn how evals help businesses define, measure, and improve AI performance—reducing risk, boosting productivity, and driving strategic advantage.

GPT-5.1-Codex-Max System Card

production · OpenAI Blog · 11/19/2025

This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access.

How Scania is accelerating work with AI across its global workforce

production · OpenAI Blog · 11/19/2025

Global manufacturer Scania is scaling AI with ChatGPT Enterprise. With team-based onboarding and strong guardrails, AI is boosting productivity, quality, and innovation.

How Philips is scaling AI literacy across 70,000 employees

production · OpenAI Blog · 11/13/2025

Philips is scaling AI literacy with ChatGPT Enterprise, training 70,000 employees to use AI responsibly and improve healthcare outcomes worldwide.

GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum

production · OpenAI Blog · 11/12/2025

This GPT-5 system card addendum provides updated safety metrics for GPT-5.1 Instant and Thinking, including new evaluations for mental health and emotional reliance.

Notion’s rebuild for agentic AI: How GPT‑5 helped unlock autonomous workflows

production · OpenAI Blog · 11/7/2025

Discover how Notion rebuilt its AI architecture with GPT-5 to create autonomous agents that reason, act, and adapt across workflows. Learn how this shift unlocked smarter, faster, and more flexible productivity in Notion 3.0.

From Pilot to Practice: How BBVA Is Scaling AI Across the Organization

production · OpenAI Blog · 11/6/2025

BBVA is reimagining how employees work with ChatGPT Enterprise, embedding AI into everyday operations. The bank has saved hours per week per employee, created 20,000+ Custom GPTs, and achieved up to 80% efficiency gains.

Introducing the Teen Safety Blueprint

production · OpenAI Blog · 11/6/2025

Discover OpenAI’s Teen Safety Blueprint—a roadmap for building AI responsibly with safeguards, age-appropriate design, and collaboration to protect and empower young people online.

Introducing Aardvark: OpenAI’s agentic security researcher

production · OpenAI Blog · 10/30/2025

OpenAI introduces Aardvark, an AI-powered security researcher that autonomously finds, validates, and helps fix software vulnerabilities at scale. The system is in private beta—sign up to join early testing.

How we built OWL, the new architecture behind our ChatGPT-based browser, Atlas

production · OpenAI Blog · 10/30/2025

A deep dive into OWL, the new architecture powering ChatGPT Atlas—decoupling Chromium, enabling fast startup, rich UI, and agentic browsing with ChatGPT.

gpt-oss-safeguard technical report

production · OpenAI Blog · 10/29/2025

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe gpt-oss-safeguard's capabilities and provide our baseline safety evaluations on the gpt-oss-safeguard models, using the underlying gpt-oss models as a baseline. For more information about the development and architecture of the underlying gpt-oss models, see…

Defining and evaluating political bias in LLMs

production · OpenAI Blog · 10/9/2025

Learn how OpenAI evaluates political bias in ChatGPT through new real-world testing methods that improve objectivity and reduce bias.

Introducing AgentKit, new Evals, and RFT for agents

production · OpenAI Blog · 10/6/2025

Today, we’re releasing new tools to help developers go from prototype to production faster: AgentKit, expanded evals capabilities, and reinforcement fine-tuning for agents.

Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol

production · OpenAI Blog · 9/29/2025

We’re taking first steps toward agentic commerce in ChatGPT with new ways for people, AI agents, and businesses to shop together.

Measuring the performance of our models on real-world tasks

production · OpenAI Blog · 9/25/2025

OpenAI introduces GDPval, a new evaluation that measures model performance on real-world economically valuable tasks across 44 occupations.

Democratizing AI Safety with RiskRubric.ai

production · Hugging Face Blog · 9/18/2025

Detecting and reducing scheming in AI models

production · OpenAI Blog · 9/17/2025

Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.

Teen safety, freedom, and privacy

production · OpenAI Blog · 9/16/2025

Explore OpenAI’s approach to balancing teen safety, freedom, and privacy in AI use.

Addendum to GPT-5 system card: GPT-5-Codex

production · OpenAI Blog · 9/15/2025

This addendum to the GPT-5 system card shares a new model: GPT-5-Codex, a version of GPT-5 further optimized for agentic coding in Codex. GPT-5-Codex adjusts its thinking effort more dynamically based on task complexity, responding quickly to simple conversational queries or small tasks, while independently working for longer on more complex tasks.

Jupyter Agents: training LLMs to reason with notebooks

production · Hugging Face Blog · 9/10/2025

Shipping smarter agents with every new model

production · OpenAI Blog · 9/9/2025

Discover how SafetyKit leverages OpenAI GPT-5 to enhance content moderation, enforce compliance, and outpace legacy safety systems with greater accuracy.

Why language models hallucinate

production · OpenAI Blog · 9/5/2025

OpenAI’s new research explains why language models hallucinate. The findings show how improved evaluations can enhance AI reliability, honesty, and safety.

Collective alignment: public input on our Model Spec

production · OpenAI Blog · 8/27/2025

OpenAI surveyed over 1,000 people worldwide on how AI should behave and compared their views to our Model Spec. Learn how collective alignment is shaping AI defaults to better reflect diverse human values and perspectives.

OpenAI and Anthropic share findings from a joint safety evaluation

production · OpenAI Blog · 8/27/2025

OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.

Scaling domain expertise in complex, regulated domains

production · OpenAI Blog · 8/21/2025

Discover how Blue J is transforming tax research with AI-powered tools built on GPT-4.1. By combining domain expertise with Retrieval-Augmented Generation, Blue J delivers fast, accurate, and fully-cited tax answers—trusted by professionals across the US, Canada, and the UK.

Scaling accounting capacity with OpenAI

production · OpenAI Blog · 8/12/2025

Built with OpenAI o3, o3-Pro, GPT-4.1, and GPT-5, Basis’ AI agents help accounting firms save up to 30% of their time and expand capacity for advisory and growth.

From hard refusals to safe-completions: toward output-centric safety training

production · OpenAI Blog · 8/7/2025

Discover how OpenAI's new safe-completions approach in GPT-5 improves both safety and helpfulness in AI responses—moving beyond hard refusals to nuanced, output-centric safety training for handling dual-use prompts.

Vision Language Model Alignment in TRL ⚡️

production · Hugging Face Blog · 8/7/2025

Introducing gpt-oss

production · OpenAI Blog · 8/5/2025

We’re releasing gpt-oss-120b and gpt-oss-20b—two state-of-the-art open-weight language models that deliver strong real-world performance at low cost. Available under the flexible Apache 2.0 license, these models outperform similarly sized open models on reasoning tasks, demonstrate strong tool use capabilities, and are optimized for efficient deployment on consumer hardware.

Three lessons for creating a sustainable AI advantage

production · OpenAI Blog · 7/30/2025

Discover how Intercom built a scalable AI platform with 3 key lessons—from evaluations to architecture—to lead the future of customer support.

Model ML is helping financial firms rebuild with AI from the ground up

production · OpenAI Blog · 7/23/2025

As part of our Executive Function series, Model ML CEO Chaz Englander discusses how AI-native infrastructure and autonomous agents are transforming financial services workflows.

Introducing ChatGPT agent

production · OpenAI Blog · 7/17/2025

Introducing ChatGPT agent: it thinks and acts, using tools to complete tasks like research, bookings, and slideshows—all with your guidance.

ChatGPT agent System Card

production · OpenAI Blog · 7/17/2025

ChatGPT agent System Card: OpenAI’s agentic model unites research, browser automation, and code tools with safeguards under the Preparedness Framework.

Agent bio bug bounty call

production · OpenAI Blog · 7/17/2025

OpenAI invites researchers to its Bio Bug Bounty. Test the ChatGPT agent’s safety with a universal jailbreak prompt and win up to $25,000.

ScreenEnv: Deploy your full stack Desktop Agent

production · Hugging Face Blog · 7/10/2025

Toward understanding and preventing misalignment generalization

production · OpenAI Blog · 6/18/2025

We study how training on incorrect responses can cause broader misalignment in language models and identify an internal feature driving this behavior—one that can be reversed with minimal fine-tuning.