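头部财经
50 minutes ago
Beihang University open-sources Code2Bench: dual-scaling dynamic evaluation ends effortless score-farming for code LLMs
To break this "high-score illusion", a research team from Beihang University has proposed a new benchmark-construction philosophy, Dual Scaling, and built an end-to-end automated framework, Code2Bench, on top of it. The work aims to establish a more dynamic, more rigorous, and more diagnostic paradigm for evaluating code LLMs.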