Douwe Osinga - 2 posts | goose | Your open source AI agent

Moving to issues as the new PRs

July 30, 2026 · 4 min read

Software Engineer

The goose GitHub repository showing 184 open pull requests, with the list fading into a blur

In the olden days, contributing your first PR to an open source project was rather involved. Even with great instructions, getting the project to build, the app to run and the tests to pass took real work. And even if you knew exactly which bug to fix, you needed a rough understanding of the project’s architecture and more detailed knowledge of the code you were changing.

Coding agents have changed all this. The problem is not code quality. Complaints about AI slop are all over the internet, but code written by agents is often better than what you would get from a first-time contributor. The problem is that agents have changed the economics of open source.

Self-Improving Agents Still Need Humans

June 17, 2026 · 5 min read

Douwe Osinga

Software Engineer

A human engineer reviews an AI agent feedback loop across benchmark dashboards and terminal logs

Goodhart's law is the benchmarker's curse: when a measure becomes a target, it stops being a good measure. Coding-agent benchmarks are almost designed to trigger it. The tasks are public, the result is one number, and the leaderboard inevitably fills up with harnesses that are overfit to the benchmark, often without anyone meaning them to be.

That does not make Terminal-bench, the go-to standard, useless, but it does change how the goose team uses it. The leaderboard score is a noisy measure of general agent ability. The real signal is the pattern of failures: places where goose keeps getting stuck, or where goose fails and another harness succeeds.

That is also why we usually benchmark with Sonnet rather than the strongest model available. We are not trying to get the largest possible number. We want enough failures left on the table to see what support the agent is missing.
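As a rough illustration of reading a benchmark for failure patterns rather than for a single score, here is a minimal sketch. It assumes per-task pass/fail outcomes are available for several runs of each harness. The data layout, task ids, harness names, and thresholds are all hypothetical and are not taken from Terminal-bench or from goose's tooling.

```python
# Hypothetical per-task outcomes: harness -> task id -> pass/fail per run.
# None of these names or numbers come from a real benchmark run.
RESULTS = {
    "goose": {
        "task-a": [True, True, True],
        "task-b": [False, False, False],
        "task-c": [False, True, False],
    },
    "other-harness": {
        "task-a": [True, True, True],
        "task-b": [True, True, False],
        "task-c": [False, False, False],
    },
}


def pass_rate(runs):
    """Fraction of runs that passed (0.0 when there are no runs)."""
    return sum(runs) / len(runs) if runs else 0.0


def stuck_tasks(results, harness, max_rate=0.0):
    """Tasks where the harness's pass rate is at or below max_rate."""
    return sorted(
        task
        for task, runs in results[harness].items()
        if pass_rate(runs) <= max_rate
    )


def gap_tasks(results, ours, theirs, min_gap=0.5):
    """Tasks where `theirs` passes notably more often than `ours`."""
    gaps = {}
    for task, runs in results[ours].items():
        other_runs = results[theirs].get(task)
        if other_runs is None:
            continue  # only compare tasks both harnesses attempted
        gap = pass_rate(other_runs) - pass_rate(runs)
        if gap >= min_gap:
            gaps[task] = round(gap, 2)
    return gaps


if __name__ == "__main__":
    print("stuck:", stuck_tasks(RESULTS, "goose"))
    print("gaps:", gap_tasks(RESULTS, "goose", "other-harness"))
```

The two lists answer different questions. Tasks that fail on every run point at something the agent consistently cannot do, while tasks with a large gap to another harness point at support that harness has and this one lacks. Keeping several runs per task matters because a single pass or fail can be noise.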