ai benchmark - Search News

14h on MSN

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark to test AI 'reasoning' models.

Google Expands Gemini 2.0: Pro Experimental and Flash Lite Models Now Available

Google introduces Gemini 2.0 Flash, Pro Experimental, and Flash Lite models with improved speed, reasoning, and multimodal ...

18h on MSN

Google Just Launched Gemini 2.0 Flash and Pro for Users and Developers

Google has upgraded its Gemini offerings across the board with Gemini 2.0 Flash and Gemini 2.0 Pro. Here's what's new and ...

1h on MSN

TRAIT Explained – How AI chatbots are evolving with distinct personalities?

A study titled "Do LLMs Have Distinct and Consistent Personality?", detailed in a paper from Yonsei University and Seoul National University, introduces TRAIT.

21h

AI War Heats Up: Google's Gemini 2.0 Flash Goes Public After DeepSeek Disruption – Key Details Inside

Google has made Gemini 2.0 "generally available" through the Gemini API in Google AI Studio and Vertex AI, marking a ...

20h

Hundreds of rigged votes can skew AI model rankings on Chatbot Arena, study finds

The idea of ranking AI models has been thrown into dispute after new research shows it’s simple to fix the results—and boost ...

3h on MSN

Stock market today: Asian shares mixed as DeepSeek lifts Chinese tech stocks

Asian shares Friday were mixed, with Chinese technology stocks rising as most other Asian equities declined. Japan’s ...

Top Canadian Stocks to Buy With $5,000 in 2025

These top Canadian stocks are poised to deliver impressive gains led by significant demand and sector-specific tailwinds.

22h on MSN

ChatGPT's powerful 'Deep Research' upgrade got an open source replica — in just 24 hours

Deep Research is an AI agent which can conduct complex multi-step web research using reasoning and a base LLM, in this case ...

Livewire Markets 6h

The implications of extreme concentration

Concentration in equity markets has reached unprecedented levels, particularly in the United States.(1) A select few mega-cap ...

23h on MSN

I put OpenAI's new o3-mini model to the test — and the results are staggering

OpenAI has just released o3-mini, a new reasoning model which offers the same kind of performance as its earlier o1 model, ...

21h

Dimensity 9400 Makes BIG Splash In AnTuTu Benchmark Rankings…

With a record-breaking score of 3,449,366 points, the Dimensity 9400 leads in CPU, GPU, memory, and UX performance. MediaTek has officially taken the performance crown, with its Dimensity 9400 leading ...
