Math Performance Tasks

AI models that simulate internal debate dramatically improve accuracy on complex tasks

A new study reveals that top models like DeepSeek-R1 succeed by simulating internal debates. Here is how enterprises can harness this "society of thought" to build more robust, self-correcting agents.

Digital Information World (Opinion)

AI is failing ‘Humanity’s Last Exam’. So what does that mean for machine intelligence?

Using human ability tests to benchmark AI is common practice, but it’s fundamentally misleading. Assuming a high test score means the machine has become more human-like is a category error, much like ...

Computer Weekly

South Korea debuts foundation model in sovereign AI push

A consortium led by SK Telecom has built a sovereign AI model designed to reduce reliance on foreign tech, lower costs for local industry, and propel South Korea into the top ranks of AI powers ...

Animals that understand numbers: The surprising math skills of bees, horses, dolphins, and more

Most people think that only people can understand numbers, but that's not true. Many animals can naturally figure out how ...

EurekAlert!

AI learns better when it talks to itself

Talking to oneself is a trait which feels inherently human - but it’s not just humans who can reap the benefits of such ...

The 74 on MSN

High-Poverty D.C. Charter School Students Outscore Wealthy Neighbors in Math

Charter school students in Washington, D.C.’s high-poverty Ward 8 far outshined their peers citywide in mathematics last year ...

TMCnet

MetaMetrics and North Dakota Department of Public Instruction Partner to Expand Lexile and Quantile Measures for Career Readiness

Key components now available through RUReady.ND.gov include MetaMetrics' career readiness bundle of tools that extend the use ...

Qwen3-Max Thinking beats Gemini 3 Pro and GPT-5.2 on Humanity's Last Exam (with search)

How the Cerebellum Helps Words Flow From Your Brain

I tested every iPad model that Apple currently sells - here's my advice for Pro users in 2026

‘Iron Lung' Review: Markiplier's Slow Burn Video Game Adaptation Offers a Fascinating, Flawed Experiment

CPython vs. PyPy: Which Python runtime has the better JIT?