Posts
Pieces I've written on Substack.
-
If Nothing Off The Shelf Works, Consider Just Building It.
Actionable steps for operators and org-builders, given the need for scale-up org infrastructure in 2026.
-
How Come Math Olympiads Maxxing LLMs Aren't Holding Down An Easy Job?
Long-term coherence issues still limit their deployment without human oversight, so non-stationary benchmarks could be a new frontier for evals.