This researcher has a new way to measure AI performance. It's BS, literally.

Peter Gostev, AI capability lead at Arena

  • Peter Gostev's BullshitBench poses nonsensical questions to AI models to test whether they can detect BS.
  • Google Gemini 3.0 struggles with BullshitBench, failing to reject nonsense over half the time.
  • One AI company did way better than everyone else.

A new AI benchmark asks a deceptively simple question: Can machines tell when something is, well, BS?

Peter Gostev, AI capability lead at model-evaluation firm Arena...
