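Sina
February
Stable training, data-efficient: Tsinghua University proposes SAC Flow, a new reinforcement learning method for flow policies
This article introduces a new approach that trains flow policies with SAC, a highly data-efficient reinforcement learning algorithm. It optimizes the actual flow policy end to end, without resorting to surrogate objectives or policy distillation. The core idea of SAC Flow is to treat the flow policy as a residual RNN and then parameterize the velocity in two ways: with GRU gating and with a Transformer decoder. SAC Flow on MuJoCo, OGBench ...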
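
The "residual RNN" view in the snippet can be illustrated with a small sketch. It is not the authors' implementation: `toy_velocity`, `gated_velocity`, and `rollout` are hypothetical names, the velocity functions are hand-written stand-ins for learned networks, and the gated form is only one assumed reading of "GRU gating". The sketch shows only that sampling an action by Euler-integrating a velocity field is a recurrence in which each step adds a residual to the previous action.

```python
import math
import random

def toy_velocity(t, action, state):
    """Hypothetical stand-in for a learned velocity network v(t, a, s)."""
    # Pulls each action dimension toward tanh(state); not a trained model.
    return [math.tanh(s) - a for a, s in zip(action, state)]

def gated_velocity(t, action, state):
    """Assumed GRU-style variant: a sigmoid gate scales the move toward a candidate."""
    out = []
    for a, s in zip(action, state):
        gate = 1.0 / (1.0 + math.exp(-(s + t)))  # gate value in (0, 1)
        candidate = math.tanh(s)                 # candidate action
        out.append(gate * (candidate - a))
    return out

def rollout(velocity, state, num_steps=4, seed=0):
    """Euler-integrate the velocity field from Gaussian noise to an action."""
    rng = random.Random(seed)
    action = [rng.gauss(0.0, 1.0) for _ in state]  # a_0 drawn from noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity(k * dt, action, state)
        # Residual update: a_{k+1} = a_k + dt * v(t_k, a_k, s)
        action = [a + dt * vi for a, vi in zip(action, v)]
    return action

if __name__ == "__main__":
    state = [0.5, -1.0]
    print(rollout(toy_velocity, state))
    print(rollout(gated_velocity, state))
```

Because the rollout is a recurrence over `num_steps` steps, gradients flowing back through it pass through every step, which is the usual motivation for gated or attention-based parameterizations in recurrent models.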