The Open/Closed Problem in AI

At MLSys I watched the field celebrate efficiency: better kernels, inference ASICs, training ASICs. That work is not neutral. It is hardware hardening around open-loop learning, and every layer of specialization makes closed-loop learning harder to even attempt.

Maxim Khailo

23 May 2026 • 4 min read

I went to the ninth MLSys conference in Seattle. This is a conference of people in research and industry building ML systems. The vast majority of work that I saw is building systems that train and use LLMs. The biggest focus was on efficiency. How do you train LLMs more efficiently? How do you deploy and use them more efficiently? When I was trying to understand the themes and messages I witnessed, the Open/Closed problem occurred to me.

To understand what the Open/Closed problem is, we first need to understand a little bit of history.

When 3D computer graphics were exploding in the 90s, they were first being rendered by a CPU. A CPU is a generic computing device where you can do everything. So naturally 3D graphics varied wildly, including some games using voxels instead of polygons. There was a great amount of creativity. Eventually we started to get 3D acceleration via graphics cards. These cards had fixed pipelines. So while they greatly accelerated polygon rendering and certain effects, they limited the creativity in how you do graphics and you lost variety. Eventually GPU makers like Nvidia invented pixel and vertex shaders. This added flexibility back into graphics, allowing more creative games. On top of this programmability, CUDA was born. CUDA was so flexible that the AI community figured out how to train neural nets on GPUs, which allowed them to try bigger models. AlexNet was the inflection point and why we are even talking about AI and LLMs today. What I noticed at MLSys is that the companies building GPUs are now constraining them to be more efficient for inference vs training. Now you have ASICs designed to do just inference, while others are optimized for training.

In other words, we started with an open system (CPU), went to a closed system (fixed-pipeline GPU), back to an open system (programmable GPU), and back to a more closed system (specialized ASICs). This is part of what I mean by the Open/Closed problem. But this also coincides with another Open/Closed problem in a different sense.

The AIs we are deploying are trained using an open loop. What I mean is that the models themselves don't learn. You need an outside system, outside the model's circuit, to train them. You gather data, come up with a loss function, and do SGD to train them via backpropagation. Then you deploy them. After the model is deployed it does not learn. Its memory, stored in its parameters, doesn't change. People are hacking around this fact using external memory via Agents. Agents use an LLM (which doesn't learn) to update an external memory source (like markdown files, a database, etc.) using tool calls. So Agents learn, but they learn in a very inefficient way.

Our brains use a closed loop to learn. Our brains have a model of the outside world; they make predictions on what our senses should sense, and then check our senses to see how far off the prediction is. If the prediction is wrong, the brain is surprised and updates the model to make a more accurate prediction. In other words, there is no outside process for our brains to accumulate knowledge. It's done all inside our brain, a closed loop.

This is the other Open/Closed problem I noticed at MLSys. It seemed everyone is working to make open-loop learning better and more efficient, either by changing model architectures or the way you train them, optimizing GPU kernels, etc. I didn't see anyone working on closed-loop learning, where the model itself, without outside intervention, updates itself when it accumulates knowledge. These two Open/Closed problems are the same problem.

So here is the uncomfortable claim. The efficiency work the field is celebrating (better kernels, inference ASICs, training ASICs) is not just neutral progress. It is hardware hardening around open-loop learning, and every layer of specialization makes closed-loop learning harder to even attempt. We are optimizing our way into a paradigm and calling it advancement. Fixed GPU pipelines didn't just speed up graphics; they quietly killed the wild experiments for a decade until programmability came back. The same thing is happening now, and almost no one at MLSys seemed to notice.

And the mechanism isn't vague. An inference ASIC physically bakes in the open-loop assumption. The weights are frozen, so parameter memory is built to be read, not rewritten. Compute and memory sit in separate places because that is efficient when the model never changes. Everything is shaped around big batched matmuls because that is what serving a static model looks like. None of this is an oversight. It is the chip doing exactly its job. But a model that learns in a closed loop needs the opposite of all of it: weights that change constantly, updates at fine grain, memory and compute fused so a parameter can rewrite itself in place. A chip optimized for inference doesn't just fail to help with that. It assumes it away in silicon. Every generation of specialization pours more concrete over the road not taken.

Eric Kandel won a Nobel Prize for showing that memory isn't stored by some separate system. A single neuron both computes and physically rewrites itself as it learns. The breakthrough we need is a model that updates itself, with no outside process, no separate training run, memory and compute fused at fine grain the way they are in a neuron. That requires a substrate to experiment on: something like an FPGA, but bigger, faster, and built for this. Nobody is building it, because everybody is busy optimizing the thing we already have.

So I'll put it plainly. If you are working on open-loop efficiency, you are not working on the breakthrough. You are working on the thing that will make the breakthrough harder to find. The hardware is hardening around the wrong paradigm while the field congratulates itself on speed. The window to experiment with closed-loop learning is open right now, and it is closing with every ASIC that ships. Someone should build the substrate before it does.

Sign up for more like this.