wandb hyperparameter search hangs
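As the title says: when I run a sweep with wandb and set the search method to bayes, it hangs partway through and never moves on to the next run. This happens every time I retry. With the method set to grid, the sweep finishes without problems. Has anyone here run into the same issue?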
--【壹】--:
Check the .log files and the corresponding yaml config. From this description alone, the problem can't be diagnosed.
--【贰】--:
It just stays stuck at:
2026-04-27 11:03:03,395 - wandb.wandb_agent - INFO - Cleaning up finished run: no647jty
2026-04-27 11:03:09,245 - wandb.wandb_agent - INFO - Running runs: []
And wandb/debug.log:
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_setup.py:_flush():81] Current SDK version is 0.25.1
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_setup.py:_flush():81] Configure stats pid to 43106
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_setup.py:_flush():81] Loading settings from environment variables
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:setup_run_log_directory():717] Logging user logs to ~/projects/test/wandb/run-20260427_110254-no647jty/logs/debug.log
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to ~/projects/test/wandb/run-20260427_110254-no647jty/logs/debug-internal.log
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:init():844] calling init triggers
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:init():849] wandb.init called with sweep_config: {'batch_size': 64, 'hidden_dim': 16, 'learning_rate': 0.005}
config: {'epochs': 20, 'batch_size': 32, 'learning_rate': 0.001, 'hidden_dim': 32, '_wandb': {}}
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:init():892] starting backend
2026-04-27 11:02:55,172 INFO MainThread:43106 [wandb_init.py:init():895] sending inform_init request
2026-04-27 11:02:55,203 INFO MainThread:43106 [wandb_init.py:init():903] backend started and connected
2026-04-27 11:02:55,204 INFO MainThread:43106 [wandb_run.py:_config_callback():1403] config_cb None None {'batch_size': 64, 'hidden_dim': 16, 'learning_rate': 0.005}
2026-04-27 11:02:55,205 INFO MainThread:43106 [wandb_init.py:init():973] updated telemetry
2026-04-27 11:02:55,205 INFO MainThread:43106 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
2026-04-27 11:02:57,420 INFO MainThread:43106 [wandb_init.py:init():1042] starting run threads in backend
2026-04-27 11:02:57,479 INFO MainThread:43106 [wandb_run.py:_console_start():2524] atexit reg
2026-04-27 11:02:57,479 INFO MainThread:43106 [wandb_run.py:_redirect():2373] redirect: wrap_raw
2026-04-27 11:02:57,479 INFO MainThread:43106 [wandb_run.py:_redirect():2442] Wrapping output streams.
2026-04-27 11:02:57,479 INFO MainThread:43106 [wandb_run.py:_redirect():2465] Redirects installed.
2026-04-27 11:02:57,480 INFO MainThread:43106 [wandb_init.py:init():1082] run started, returning control to user process
2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_finish():2291] finishing run jiangtjerry/uncategorized/no647jty
2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_atexit_cleanup():2490] got exitcode: 0
2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_restore():2472] restore
2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_restore():2478] restore done
2026-04-27 11:03:00,998 INFO MainThread:43106 [wandb_run.py:_footer_sync_info():3868] logging synced files
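One detail in this log is worth quantifying: the time between "run started, returning control to user process" and "finishing run". The sketch below only parses the two timestamps copied from the log above and subtracts them; the helper names are made up for illustration. A very short gap could mean the training script exited before doing much work, but that is an assumption to verify against train.py, not something the log proves.

```python
from datetime import datetime

# Two lines copied verbatim from the debug.log above.
LOG_LINES = [
    "2026-04-27 11:02:57,480 INFO MainThread:43106 [wandb_init.py:init():1082] run started, returning control to user process",
    "2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_finish():2291] finishing run jiangtjerry/uncategorized/no647jty",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp (first 23 characters)."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def seconds_between(start_line: str, end_line: str) -> float:
    """Elapsed seconds between the timestamps of two log lines."""
    return (parse_timestamp(end_line) - parse_timestamp(start_line)).total_seconds()


elapsed = seconds_between(LOG_LINES[0], LOG_LINES[1])
print(f"user code ran for about {elapsed:.3f} s")
```

If the config default of 20 epochs really ran in that window, the training loop is either extremely cheap or not being reached, so it may be worth confirming that val_loss is actually logged before the run ends.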
In sweep.yaml:
program: train.py
method: bayes
metric:
  name: val_loss
  goal: minimize
parameters:
  learning_rate:
    values: [0.0005, 0.001, 0.005]
  hidden_dim:
    values: [16, 32, 64]
  batch_size:
    values: [16, 32, 64]
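Since all three parameters are discrete lists, the search space is finite. The sketch below enumerates it with the values from the sweep.yaml above (the function name is made up for illustration). A grid search has a natural end once every combination has been tried; as far as I know, a bayes sweep has no such built-in stopping point unless a run limit is set, for example with `wandb agent --count`, so the two methods can behave differently near the end of a small space.

```python
from itertools import product

# Values copied from the sweep.yaml above.
parameters = {
    "learning_rate": [0.0005, 0.001, 0.005],
    "hidden_dim": [16, 32, 64],
    "batch_size": [16, 32, 64],
}


def enumerate_grid(space):
    """Return every combination of the discrete values as a list of dicts."""
    names = list(space)
    return [
        dict(zip(names, combo))
        for combo in product(*(space[name] for name in names))
    ]


grid = enumerate_grid(parameters)
print(f"{len(grid)} combinations in the search space")
```

The combination that appears in the debug.log (batch_size 64, hidden_dim 16, learning_rate 0.005) is one of the entries in this list.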
--【叁】--:
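You could try SwanLab: run the search logic with Optuna or your own script, and let SwanLab record each set of hyperparameters, metrics, and logs, so you can compare and troubleshoot everything in one dashboard afterward.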
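A rough sketch of what "search logic in a script, tracking in a separate tool" could look like. Everything here is an assumption for illustration: plain random sampling stands in for Optuna, `dummy_evaluate` is a placeholder for the real training run, and `log` is whatever logging callable the tracking library provides (here it just appends to a list).

```python
import random

# Search space copied from the sweep.yaml in this thread.
space = {
    "learning_rate": [0.0005, 0.001, 0.005],
    "hidden_dim": [16, 32, 64],
    "batch_size": [16, 32, 64],
}


def run_search(space, evaluate, log, n_trials, seed=0):
    """Sample n_trials configs at random, log each result, return the best one."""
    rng = random.Random(seed)
    best = None
    for trial in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        val_loss = evaluate(params)
        log({"trial": trial, **params, "val_loss": val_loss})
        if best is None or val_loss < best[1]:
            best = (params, val_loss)
    return best


def dummy_evaluate(params):
    """Hypothetical stand-in for training; returns a made-up val_loss."""
    return abs(params["learning_rate"] - 0.001) + 1.0 / params["hidden_dim"]


records = []  # stand-in for the tracking backend
best_params, best_loss = run_search(space, dummy_evaluate, records.append, n_trials=10)
print(best_params, best_loss)
```

Because the loop has a fixed `n_trials`, it always terminates, which makes it easier to tell a slow trial apart from a scheduler that is waiting for its next suggestion.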

