Scale Labs

Copyright 2026 Scale Inc. All rights reserved.

Showdown Leaderboard - LLMs

Real people. Real conversations. Real rankings.

Showdown ranks AI models based on how they perform in real-world use -- not synthetic tests or lab settings. Votes are blind, optional, and organic, so rankings reflect authentic preferences.

Methodology & Technical Report
Prompts compared

Real conversation prompts compared across models through pairwise votes.

Active users

From 80+ countries and 70+ languages, spanning all backgrounds and professions.

| Rank | Model | Votes | Score | CI |
|---|---|---|---|---|
| 1 | gemini-3-flash | 1,911 | 1050.59 | -12.68 +13.20 |
| 1 | gemini-3-pro-preview | 2,020 | 1044.49 | -15.11 +16.26 |
| 2 | gpt-4o-audio-preview-2025-06-03 | 2,310 | 1020.81 | -14.44 +10.54 |
| 3 | qwen3-omni | 558 | 999.01 | -18.24 +24.82 |
| 5 | gemma3n | 651 | 951.25 | -16.19 +14.63 |
| 5 | gpt-realtime | 2,438 | 933.83 | -11.50 +11.69 |
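
Each row above pairs a score with a lower and an upper offset, so the interval for a model runs from score minus the first offset to score plus the second. The sketch below turns those offsets into intervals and checks which models overlap. It also tries one ranking rule: a model's rank is 1 plus the number of models whose lower bound lies above its upper bound. That rule is an inference from the numbers on this page, not something the page states; the helper names (`Row`, `intervals_overlap`, `interval_rank`) are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    score: float
    minus: float  # distance from the score down to the lower bound
    plus: float   # distance from the score up to the upper bound

    @property
    def lower(self) -> float:
        return self.score - self.minus

    @property
    def upper(self) -> float:
        return self.score + self.plus


# Values copied from the six-row table above.
ROWS = [
    Row("gemini-3-flash", 1050.59, 12.68, 13.20),
    Row("gemini-3-pro-preview", 1044.49, 15.11, 16.26),
    Row("gpt-4o-audio-preview-2025-06-03", 1020.81, 14.44, 10.54),
    Row("qwen3-omni", 999.01, 18.24, 24.82),
    Row("gemma3n", 951.25, 16.19, 14.63),
    Row("gpt-realtime", 933.83, 11.50, 11.69),
]


def intervals_overlap(a: Row, b: Row) -> bool:
    """True when the two intervals share at least one point."""
    return a.lower <= b.upper and b.lower <= a.upper


def interval_rank(row: Row, rows: list[Row]) -> int:
    """Assumed rule: 1 + number of models whose lower bound beats this upper bound."""
    return 1 + sum(other.lower > row.upper for other in rows)


ranks = [interval_rank(r, ROWS) for r in ROWS]

for r, rank in zip(ROWS, ranks):
    print(f"{rank:>2}  {r.model:<34} [{r.lower:.2f}, {r.upper:.2f}]")
```

Under this assumed rule the computed ranks match the Rank column of the six rows above, which would explain why two models can share rank 1 and why rank 4 is skipped. Treat that as a consistency check on this table, not as confirmation of the published methodology.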

Performance Comparison Across Language Models

[Charts: Win Rate vs. Each Model; Battle Count vs. Each Model; Confidence Intervals; Average Win Rate; Prompt Distribution]

Voice Model Performance Comparison

[Charts: Win Rate vs. Each Model; Battle Count vs. Each Model; Confidence Intervals; Average Win Rate; Prompt Distribution]

Leaderboard - LLMs

Style Control

| Rank | Model | Votes | Score | CI |
|---|---|---|---|---|
| 1 | gpt-5.2-chat-latest | 8,608 | 1145.33 | -4.53 +5.40 |
| 1 | gemini-3-flash | 8,579 | 1138.66 | -4.20 +4.85 |
| 3 | claude-opus-4-5-20251101 (Thinking) | 7,837 | 1128.64 | -5.91 +4.88 |
| 3 | gemini-2.5-pro | 14,311 | 1128.46 | -4.36 +3.86 |
| 3 | claude-opus-4-5-20251101 | 9,710 | 1126.76 | -4.36 +3.73 |
| 3 | gemini-3-pro-preview | 10,799 | 1124.43 | -4.22 +3.84 |
| 6 | claude-sonnet-4-5-20250929 | 14,643 | 1118.01 | -3.39 +3.61 |
| 8 | claude-sonnet-4-5-20250929 (Thinking) | 14,688 | 1109.30 | -3.37 +4.00 |
| 8 | gpt-5-chat | 11,529 | 1107.36 | -4.73 +3.98 |
| 8 | qwen3-235b-a22b-2507-v1 | 12,602 | 1106.00 | -3.91 +4.71 |
| 9 | gpt-5.1-2025-11-13-medium | 10,059 | 1098.36 | -4.17 +4.85 |
| 11 | gpt-5.2-2025-12-11-medium | 8,526 | 1091.58 | -3.93 +5.04 |
| 12 | kimi-k2-thinking | 11,716 | 1088.30 | -4.38 +3.02 |
| 12 | claude-opus-4-1-20250805 | 15,656 | 1085.82 | -3.73 +3.96 |
| 13 | deepseek-v3p2 | 9,961 | 1081.07 | -5.35 +4.08 |
| 14 | claude-opus-4-1-20250805 (Thinking) | 14,001 | 1078.14 | -3.82 +4.78 |
| 16 | claude-haiku-4-5-20251001 | 7,871 | 1071.29 | -5.33 +3.96 |
| 17 | gemini-2.5-flash | 13,692 | 1070.72 | -3.58 +2.88 |
| 17 | claude-sonnet-4-20250514 | 23,664 | 1069.94 | -2.48 +2.43 |
| 17 | claude-haiku-4-5-20251001 (Thinking) | 7,704 | 1064.40 | -4.31 +4.74 |
| 19 | claude-sonnet-4-20250514 (Thinking) | 13,836 | 1062.49 | -3.61 +3.75 |
| 22 | deepseek-r1-0528 | 11,430 | 1048.33 | -3.27 +4.83 |
| 22 | o3-2025-04-16-medium* | 23,181 | 1045.01 | -2.64 +3.16 |
| 24 | gpt-5-2025-08-07-medium* | 16,606 | 1012.72 | -2.88 +4.15 |
| 25 | llama4-maverick-instruct-basic | 12,587 | 1000.00 | -3.20 +4.05 |
| 25 | o4-mini-2025-04-16-medium* | 23,980 | 999.57 | -3.60 +3.06 |

Style Control

| Rank | Model | Votes | Score | CI |
|---|---|---|---|---|
| 1 | gpt-5.2-chat-latest | 2,486 | 1131.90 | -8.96 +7.92 |
| 1 | claude-opus-4-5-20251101 (Thinking) | 2,284 | 1129.19 | -9.40 +9.29 |
| 1 | claude-opus-4-5-20251101 | 2,386 | 1119.60 | -10.98 +7.99 |
| 1 | gemini-3-flash | 2,001 | 1118.60 | -8.96 +10.82 |
| 3 | gemini-2.5-pro | 3,632 | 1109.01 | -7.12 +7.45 |
| 3 | gemini-3-pro-preview | 2,879 | 1107.58 | -5.63 +11.22 |
| 3 | claude-sonnet-4-5-20250929 (Thinking) | 4,333 | 1104.04 | -6.88 +6.55 |
| 3 | claude-sonnet-4-5-20250929 | 4,079 | 1102.72 | -6.36 +7.94 |
| 5 | gpt-5-chat | 4,562 | 1098.67 | -6.57 +5.16 |
| 7 | qwen3-235b-a22b-2507-v1 | 2,971 | 1090.39 | -7.63 +9.19 |
| 9 | gpt-5.1-2025-11-13-medium | 2,469 | 1084.55 | -9.45 +9.78 |
| 9 | gpt-5.2-2025-12-11-medium | 2,419 | 1083.33 | -8.37 +9.24 |
| 10 | kimi-k2-thinking | 2,561 | 1082.17 | -8.08 +9.26 |
| 10 | claude-opus-4-1-20250805 | 4,901 | 1075.52 | -5.68 +7.56 |
| 11 | claude-haiku-4-5-20251001 | 2,162 | 1072.61 | -9.73 +7.77 |
| 11 | deepseek-v3p2 | 2,236 | 1067.94 | -9.48 +9.33 |
| 14 | claude-opus-4-1-20250805 (Thinking) | 4,337 | 1063.46 | -7.54 +7.50 |
| 14 | gemini-2.5-flash | 3,309 | 1062.78 | -8.56 +7.52 |
| 15 | claude-sonnet-4-20250514 (Thinking) | 4,745 | 1062.85 | -6.93 +4.93 |
| 15 | claude-sonnet-4-20250514 | 6,917 | 1058.99 | -4.88 +5.75 |
| 15 | claude-haiku-4-5-20251001 (Thinking) | 2,058 | 1058.39 | -10.81 +9.31 |
| 22 | deepseek-r1-0528 | 2,740 | 1033.20 | -8.41 +8.77 |
| 22 | o3-2025-04-16-medium* | 7,144 | 1025.17 | -5.77 +6.06 |
| 24 | llama4-maverick-instruct-basic | 3,992 | 1000.00 | -6.44 +9.47 |
| 24 | gpt-5-2025-08-07-medium* | 4,992 | 994.01 | -6.22 +5.47 |
| 25 | o4-mini-2025-04-16-medium* | 7,151 | 986.45 | -5.37 +4.52 |

Style Control

| Rank | Model | Votes | Score | CI |
|---|---|---|---|---|
| 1 | gpt-5.2-chat-latest | 3,387 | 1147.28 | -7.14 +8.08 |
| 1 | gemini-3-flash | 3,562 | 1134.98 | -6.76 +6.77 |
| 2 | gemini-2.5-pro | 6,023 | 1127.63 | -6.24 +5.85 |
| 2 | claude-opus-4-5-20251101 (Thinking) | 3,115 | 1122.11 | -9.52 +7.00 |
| 3 | gemini-3-pro-preview | 4,061 | 1118.84 | -9.07 +5.70 |
| 3 | claude-opus-4-5-20251101 | 4,267 | 1115.89 | -6.46 +8.25 |
| 4 | claude-sonnet-4-5-20250929 | 5,602 | 1111.28 | -7.52 +5.49 |
| 4 | qwen3-235b-a22b-2507-v1 | 5,602 | 1108.17 | -5.02 +6.51 |
| 4 | gpt-5-chat | 3,333 | 1105.97 | -9.47 +7.35 |
| 7 | claude-sonnet-4-5-20250929 (Thinking) | 5,655 | 1099.71 | -4.99 +6.18 |
| 7 | gpt-5.1-2025-11-13-medium | 4,461 | 1099.59 | -6.90 +6.05 |
| 9 | kimi-k2-thinking | 5,626 | 1090.28 | -4.59 +7.78 |
| 10 | deepseek-v3p2 | 4,860 | 1089.12 | -6.01 +5.72 |
| 12 | claude-opus-4-1-20250805 | 5,279 | 1079.84 | -5.75 +7.09 |
| 12 | gpt-5.2-2025-12-11-medium | 3,290 | 1078.14 | -7.12 +8.67 |
| 14 | claude-opus-4-1-20250805 (Thinking) | 4,608 | 1075.32 | -5.63 +5.82 |
| 14 | claude-haiku-4-5-20251001 | 2,881 | 1068.97 | -9.87 +8.81 |
| 15 | claude-sonnet-4-20250514 | 8,716 | 1066.31 | -4.53 +5.85 |
| 16 | gemini-2.5-flash | 5,898 | 1065.23 | -5.91 +5.61 |
| 17 | claude-haiku-4-5-20251001 (Thinking) | 2,825 | 1060.02 | -8.25 +7.26 |
| 18 | deepseek-r1-0528 | 5,114 | 1052.69 | -5.92 +7.24 |
| 20 | claude-sonnet-4-20250514 (Thinking) | 4,165 | 1051.07 | -7.11 +6.17 |
| 22 | o3-2025-04-16-medium* | 8,641 | 1042.61 | -3.47 +3.53 |
| 24 | gpt-5-2025-08-07-medium* | 6,146 | 1011.40 | -5.18 +7.02 |
| 24 | llama4-maverick-instruct-basic | 3,871 | 1000.00 | -7.59 +8.30 |
| 25 | o4-mini-2025-04-16-medium* | 8,866 | 997.36 | -4.97 +3.63 |

Style Control

| Rank | Model | Votes | Score | CI |
|---|---|---|---|---|
| 1 | gpt-5.2-chat-latest | 1,394 | 1145.16 | -10.56 +11.95 |
| 1 | gemini-3-flash | 1,237 | 1140.10 | -11.92 +15.06 |
| 1 | gemini-3-pro-preview | 1,629 | 1138.11 | -11.60 +9.54 |
| 1 | claude-opus-4-5-20251101 (Thinking) | 1,442 | 1136.35 | -9.00 +11.12 |
| 1 | claude-opus-4-5-20251101 | 1,501 | 1130.62 | -9.29 +11.99 |
| 1 | gemini-2.5-pro | 2,147 | 1126.52 | -8.51 +13.01 |
| 1 | claude-sonnet-4-5-20250929 (Thinking) | 2,536 | 1126.43 | -13.67 +9.38 |
| 2 | claude-sonnet-4-5-20250929 | 2,476 | 1125.35 | -10.04 +7.49 |
| 5 | gpt-5-chat | 2,765 | 1114.01 | -9.40 +9.03 |
| 5 | gpt-5.2-2025-12-11-medium | 1,338 | 1112.90 | -16.95 +10.13 |
| 5 | qwen3-235b-a22b-2507-v1 | 1,988 | 1108.71 | -11.97 +12.96 |
| 9 | gpt-5.1-2025-11-13-medium | 1,568 | 1099.56 | -13.03 +8.96 |
| 9 | claude-opus-4-1-20250805 | 3,328 | 1098.05 | -8.25 +9.50 |
| 9 | kimi-k2-thinking | 1,649 | 1092.02 | -10.08 +15.08 |
| 10 | deepseek-v3p2 | 1,379 | 1090.85 | -12.30 +11.08 |
| 10 | claude-haiku-4-5-20251001 (Thinking) | 1,212 | 1084.86 | -13.38 +13.60 |
| 12 | claude-opus-4-1-20250805 (Thinking) | 2,918 | 1085.35 | -8.13 +8.06 |
| 12 | gemini-2.5-flash | 2,110 | 1085.14 | -9.31 +8.56 |
| 14 | claude-sonnet-4-20250514 (Thinking) | 3,023 | 1076.26 | -10.35 +7.14 |
| 14 | claude-haiku-4-5-20251001 | 1,208 | 1073.10 | -11.86 +12.56 |
| 15 | claude-sonnet-4-20250514 | 4,474 | 1074.23 | -6.55 +7.11 |
| 21 | deepseek-r1-0528 | 1,831 | 1051.91 | -12.30 +10.39 |
| 22 | o3-2025-04-16-medium* | 4,531 | 1041.76 | -6.58 +6.93 |
| 24 | gpt-5-2025-08-07-medium* | 3,143 | 1008.37 | -9.71 +7.89 |
| 24 | llama4-maverick-instruct-basic | 2,682 | 1000.00 | -7.59 +8.86 |
| 24 | o4-mini-2025-04-16-medium* | 4,585 | 997.19 | -6.67 +6.41 |

Style Control

| Rank | Model | Votes | Score | CI |
|---|---|---|---|---|
| 1 | gpt-5.2-chat-latest | 5,147 | 1150.82 | -6.05 +6.57 |
| 1 | gemini-3-flash | 5,008 | 1138.64 | -5.72 +6.90 |
| 2 | claude-opus-4-5-20251101 (Thinking) | 4,734 | 1135.71 | -8.22 +5.12 |
| 2 | gemini-2.5-pro | 9,328 | 1132.13 | -4.57 +3.68 |
| 3 | claude-opus-4-5-20251101 | 6,396 | 1126.71 | -7.11 +5.39 |
| 3 | claude-sonnet-4-5-20250929 | 8,743 | 1125.46 | -5.66 +4.01 |
| 5 | gemini-3-pro-preview | 6,067 | 1119.60 | -5.12 +5.44 |
| 7 | claude-sonnet-4-5-20250929 (Thinking) | 8,919 | 1111.53 | -4.43 +4.86 |
| 7 | gpt-5-chat | 5,859 | 1110.29 | -5.58 +6.51 |
| 8 | qwen3-235b-a22b-2507-v1 | 8,530 | 1109.18 | -4.43 +4.93 |
| 9 | gpt-5.1-2025-11-13-medium | 6,350 | 1101.17 | -5.49 +5.28 |
| 11 | kimi-k2-thinking | 7,858 | 1094.68 | -4.62 +4.79 |
| 11 | deepseek-v3p2 | 6,702 | 1092.83 | -4.41 +4.86 |
| 11 | gpt-5.2-2025-12-11-medium | 5,250 | 1090.58 | -5.66 +5.33 |
| 12 | claude-opus-4-1-20250805 | 8,415 | 1090.09 | -4.74 +4.90 |
| 14 | claude-opus-4-1-20250805 (Thinking) | 7,626 | 1081.60 | -4.67 +4.85 |
| 16 | claude-haiku-4-5-20251001 | 4,030 | 1078.28 | -5.58 +6.53 |
| 16 | claude-sonnet-4-20250514 | 13,934 | 1074.66 | -3.27 +3.30 |
| 16 | claude-haiku-4-5-20251001 (Thinking) | 4,058 | 1069.38 | -5.98 +9.94 |
| 17 | gemini-2.5-flash | 9,208 | 1071.80 | -4.38 +3.44 |
| 18 | claude-sonnet-4-20250514 (Thinking) | 7,445 | 1065.82 | -5.82 +5.96 |
| 20 | deepseek-r1-0528 | 7,527 | 1060.38 | -5.33 +4.59 |
| 23 | o3-2025-04-16-medium* | 13,678 | 1045.91 | -2.81 +3.91 |
| 24 | gpt-5-2025-08-07-medium* | 9,897 | 1014.89 | -5.40 +4.64 |
| 25 | o4-mini-2025-04-16-medium* | 13,886 | 1004.90 | -3.86 +3.61 |
| 25 | llama4-maverick-instruct-basic | 6,711 | 1000.00 | -6.08 +5.06 |

Style Control

| Rank | Model | Votes | Score | CI |
|---|---|---|---|---|
| 1 | gemini-3-flash | 1,907 | 1172.47 | -10.74 +10.32 |
| 1 | gpt-5.2-chat-latest | 1,462 | 1161.84 | -12.86 +13.65 |
| 2 | gemini-2.5-pro | 2,980 | 1151.79 | -7.37 +9.38 |
| 4 | gemini-3-pro-preview | 1,756 | 1126.22 | -11.27 +12.30 |
| 4 | qwen3-235b-a22b-2507-v1 | 2,686 | 1122.42 | -8.89 +7.70 |
| 4 | gpt-5.1-2025-11-13-medium | 2,105 | 1115.59 | -9.28 +9.40 |
| 4 | gpt-5-chat | 1,639 | 1106.21 | -10.70 +11.36 |
| 4 | claude-sonnet-4-5-20250929 | 2,422 | 1104.02 | -9.05 +11.27 |
| 5 | claude-opus-4-5-20251101 | 1,603 | 1104.84 | -14.59 +9.47 |
| 6 | kimi-k2-thinking | 2,740 | 1103.01 | -7.14 +8.27 |
| 6 | gpt-5.2-2025-12-11-medium | 1,339 | 1097.46 | -13.95 +12.99 |
| 7 | deepseek-v3p2 | 2,235 | 1094.97 | -7.92 +9.17 |
| 7 | claude-opus-4-5-20251101 (Thinking) | 1,226 | 1089.15 | -11.36 +14.23 |
| 10 | claude-opus-4-1-20250805 | 2,619 | 1084.06 | -7.63 +9.35 |
| 10 | claude-sonnet-4-5-20250929 (Thinking) | 2,236 | 1083.96 | -9.40 +9.73 |
| 10 | deepseek-r1-0528 | 2,538 | 1082.59 | -8.63 +8.43 |
| 12 | o3-2025-04-16-medium* | 4,172 | 1077.70 | -7.94 +6.20 |
| 12 | claude-opus-4-1-20250805 (Thinking) | 2,322 | 1075.93 | -10.15 +9.80 |
| 12 | gemini-2.5-flash | 2,931 | 1074.30 | -9.54 +9.36 |
| 18 | claude-sonnet-4-20250514 | 4,378 | 1059.10 | -5.51 +8.55 |
| 18 | claude-haiku-4-5-20251001 | 1,443 | 1054.41 | -11.27 +12.47 |
| 20 | claude-sonnet-4-20250514 (Thinking) | 2,078 | 1047.96 | -10.23 +10.28 |
| 21 | claude-haiku-4-5-20251001 (Thinking) | 1,377 | 1037.71 | -13.45 +10.30 |
| 24 | o4-mini-2025-04-16-medium* | 4,523 | 1017.72 | -7.12 +6.29 |
| 24 | gpt-5-2025-08-07-medium* | 2,870 | 1004.45 | -7.42 +7.96 |
| 24 | llama4-maverick-instruct-basic | 2,127 | 1000.00 | -9.97 +11.07 |

Style Control

| Rank | Model | Votes | Score | CI |
|---|---|---|---|---|
| 1 | claude-opus-4-5-20251101 (Thinking) | 374 | 1161.80 | -23.97 +22.46 |
| 1 | claude-sonnet-4-5-20250929 | 805 | 1149.50 | -19.60 +18.87 |
| 1 | gpt-5.2-chat-latest | 304 | 1147.57 | -25.80 +25.17 |
| 1 | claude-opus-4-5-20251101 | 420 | 1146.34 | -20.15 +26.11 |
| 1 | claude-sonnet-4-5-20250929 (Thinking) | 872 | 1145.89 | -17.30 +15.86 |
| 1 | gemini-2.5-pro | 558 | 1138.94 | -14.37 +16.40 |
| 1 | qwen3-235b-a22b-2507-v1 | 530 | 1137.90 | -20.83 +20.03 |
| 1 | gemini-3-pro-preview | 506 | 1137.06 | -17.98 +20.95 |
| 1 | gpt-5.2-2025-12-11-medium | 365 | 1132.96 | -20.38 +26.04 |
| 1 | kimi-k2-thinking | 379 | 1127.61 | -22.04 +21.07 |
| 1 | gpt-5.1-2025-11-13-medium | 316 | 1126.00 | -28.47 +21.98 |
| 1 | claude-haiku-4-5-20251001 (Thinking) | 377 | 1123.62 | -23.19 +28.26 |
| 1 | gemini-3-flash | 246 | 1120.02 | -27.17 +31.37 |
| 2 | gpt-5-chat | 720 | 1112.59 | -15.91 +19.56 |
| 2 | claude-haiku-4-5-20251001 | 333 | 1099.09 | -27.88 +32.28 |
| 4 | deepseek-v3p2 | 304 | 1098.69 | -21.26 +29.22 |
| 6 | gemini-2.5-flash | 553 | 1101.10 | -22.39 +22.44 |
| 8 | claude-sonnet-4-20250514 | 1,306 | 1104.65 | -15.30 +14.18 |
| 8 | claude-opus-4-1-20250805 | 807 | 1098.43 | -14.35 +20.27 |
| 10 | claude-sonnet-4-20250514 (Thinking) | 1,060 | 1096.03 | -13.78 +15.39 |
| 10 | claude-opus-4-1-20250805 (Thinking) | 773 | 1093.29 | -15.34 +16.24 |
| 21 | deepseek-r1-0528 | 417 | 1048.92 | -22.25 +23.75 |
| 22 | o3-2025-04-16-medium* | 1,082 | 1047.25 | -17.06 +17.55 |
| 24 | gpt-5-2025-08-07-medium* | 883 | 1004.52 | -17.39 +14.37 |
| 24 | llama4-maverick-instruct-basic | 720 | 1000.00 | -15.17 +13.58 |
| 24 | o4-mini-2025-04-16-medium* | 1,166 | 993.17 | -17.48 +10.28 |

* This model's API does not consistently return Markdown-formatted responses. Since raw outputs are used in head-to-head comparisons, this may affect its ranking.