David Paquet Pitts/Radar

On My Radar – Week of Jun 8, 2026

A curated collection of interesting finds for this week.

Tools

4 finds

karpathy/autoresearch
Karpathy's experiment in automating research loops with LLMs. Worth a look for harness-design ideas.
Watch Later
lm-evaluation-harness
EleutherAI's de facto standard harness for benchmarking LLMs across hundreds of tasks. The reference tool when you need reproducible eval numbers.
To Test
GEPA (reference implementation)
The official MIT-licensed Python implementation of the GEPA optimizer: evolve prompts, code, and arbitrary text artifacts against your own eval.
To Test
DeepEval
Popular open-source LLM eval framework with pytest-style assertions, RAG/agent metrics, and built-in GEPA-style prompt optimization.
To Test

Libs

3 finds

Research

1 find

GEPA: Reflective Prompt Evolution Can Outperform RL
ICLR 2026 Oral (Stanford/Berkeley/UT Austin). Treats execution traces as readable signal, evolving prompts via natural-language reflection plus Pareto selection, with up to 35× fewer rollouts than RL.
Wow
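
For a rough feel of what "Pareto selection" means here, a minimal sketch follows. It is not the paper's or the reference implementation's code: the `pareto_front` helper, the prompt names, and the scores are all hypothetical, and it only shows the generic idea of keeping every candidate that no other candidate beats across all tasks.

```python
from typing import Dict, List


def dominates(a: List[float], b: List[float]) -> bool:
    """True if `a` is at least as good as `b` on every task and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, List[float]]) -> List[str]:
    """Return the candidates that no other candidate dominates (insertion order kept)."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]


# Hypothetical per-task scores for three candidate prompts (higher is better).
scores = {
    "prompt_a": [0.9, 0.2, 0.5],
    "prompt_b": [0.4, 0.8, 0.5],
    "prompt_c": [0.3, 0.1, 0.4],
}

front = pareto_front(scores)
print(front)
```

In this toy data `prompt_c` is worse than `prompt_a` on every task and drops out, while `prompt_a` and `prompt_b` each win somewhere and both survive as parents for the next round of reflection.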