Discussion about this post

The AI Architect

Great curation this week. The pieces on measuring thinking efficiency in reasoning models and the rise of subagents are particularly timely; it feels like we're seeing a shift from monolithic models to more modular, task-specific agents. The LLM-as-a-Judge validation work from CMU is important too, since lots of people are relying on that approach without understanding rating indeterminacy.
