wandb hyperparameter sweep hangs

2026-04-29 09:09
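Problem description:

As the title says, when I run a sweep with wandb and set the search method to bayes, the sweep hangs partway through and never moves on to the next run; this happens every time I retry. With the method set to grid, the sweep completes without problems. Has anyone run into the same issue?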

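Replies:
--[1]--:

Have you checked the .log files? Compare them against the corresponding YAML config too. The description alone isn't enough to pin down the problem.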


--[2]--:

It just stays stuck at

2026-04-27 11:03:03,395 - wandb.wandb_agent - INFO - Cleaning up finished run: no647jty
2026-04-27 11:03:09,245 - wandb.wandb_agent - INFO - Running runs: []

And then wandb/debug.log:

2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_setup.py:_flush():81] Current SDK version is 0.25.1
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_setup.py:_flush():81] Configure stats pid to 43106
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_setup.py:_flush():81] Loading settings from environment variables
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:setup_run_log_directory():717] Logging user logs to ~/projects/test/wandb/run-20260427_110254-no647jty/logs/debug.log
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to ~/projects/test/wandb/run-20260427_110254-no647jty/logs/debug-internal.log
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:init():844] calling init triggers
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:init():849] wandb.init called with sweep_config: {'batch_size': 64, 'hidden_dim': 16, 'learning_rate': 0.005} config: {'epochs': 20, 'batch_size': 32, 'learning_rate': 0.001, 'hidden_dim': 32, '_wandb': {}}
2026-04-27 11:02:54,937 INFO MainThread:43106 [wandb_init.py:init():892] starting backend
2026-04-27 11:02:55,172 INFO MainThread:43106 [wandb_init.py:init():895] sending inform_init request
2026-04-27 11:02:55,203 INFO MainThread:43106 [wandb_init.py:init():903] backend started and connected
2026-04-27 11:02:55,204 INFO MainThread:43106 [wandb_run.py:_config_callback():1403] config_cb None None {'batch_size': 64, 'hidden_dim': 16, 'learning_rate': 0.005}
2026-04-27 11:02:55,205 INFO MainThread:43106 [wandb_init.py:init():973] updated telemetry
2026-04-27 11:02:55,205 INFO MainThread:43106 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
2026-04-27 11:02:57,420 INFO MainThread:43106 [wandb_init.py:init():1042] starting run threads in backend
2026-04-27 11:02:57,479 INFO MainThread:43106 [wandb_run.py:_console_start():2524] atexit reg
2026-04-27 11:02:57,479 INFO MainThread:43106 [wandb_run.py:_redirect():2373] redirect: wrap_raw
2026-04-27 11:02:57,479 INFO MainThread:43106 [wandb_run.py:_redirect():2442] Wrapping output streams.
2026-04-27 11:02:57,479 INFO MainThread:43106 [wandb_run.py:_redirect():2465] Redirects installed.
2026-04-27 11:02:57,480 INFO MainThread:43106 [wandb_init.py:init():1082] run started, returning control to user process
2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_finish():2291] finishing run jiangtjerry/uncategorized/no647jty
2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_atexit_cleanup():2490] got exitcode: 0
2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_restore():2472] restore
2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_restore():2478] restore done
2026-04-27 11:03:00,998 INFO MainThread:43106 [wandb_run.py:_footer_sync_info():3868] logging synced files
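
The timestamps in this debug.log show how long the user script actually ran. Here is a minimal Python sketch that measures the gap between the "run started" and "finishing run" entries. It assumes the `YYYY-MM-DD HH:MM:SS,mmm` prefix seen in the log, and the helper names `parse_ts` and `run_duration` are made up for illustration:

```python
from datetime import datetime

# Two entries copied from the debug.log in this thread.
LOG_LINES = [
    "2026-04-27 11:02:57,480 INFO MainThread:43106 [wandb_init.py:init():1082] run started, returning control to user process",
    "2026-04-27 11:02:57,752 INFO MainThread:43106 [wandb_run.py:_finish():2291] finishing run jiangtjerry/uncategorized/no647jty",
]


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def run_duration(lines):
    """Seconds between the 'run started' and 'finishing run' entries."""
    start = next(parse_ts(l) for l in lines if "run started" in l)
    end = next(parse_ts(l) for l in lines if "finishing run" in l)
    return (end - start).total_seconds()


if __name__ == "__main__":
    print(f"user script ran for {run_duration(LOG_LINES):.3f} s")
```

For this run the gap is about 0.27 s, which is very short for a script whose config lists 20 epochs, so it may be worth checking what train.py did during that run. This is only a hint, not a confirmed cause of the hang.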

In sweep.yaml:

program: train.py
method: bayes
metric:
  name: val_loss
  goal: minimize
parameters:
  learning_rate:
    values: [0.0005, 0.001, 0.005]
  hidden_dim:
    values: [16, 32, 64]
  batch_size:
    values: [16, 32, 64]
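
Since the sweep metric is `val_loss`, one thing worth ruling out is whether train.py logs a value under exactly that name in every run. Whether this is related to the hang in this thread is not confirmed. The sketch below shows what such a script could look like. The defaults mirror the config printed in debug.log, `train_one_epoch` is a hypothetical placeholder, and the loss values are fake:

```python
import random

# Defaults mirroring the config seen in debug.log; the sweep overrides them.
DEFAULTS = {"epochs": 20, "batch_size": 32, "learning_rate": 0.001, "hidden_dim": 32}


def train_one_epoch(config, epoch, rng):
    """Hypothetical placeholder: returns a fake validation loss."""
    return 1.0 / (epoch + 1) + rng.random() * 0.01


def run_training(config, log):
    """Run all epochs, reporting val_loss through the `log` callable."""
    rng = random.Random(0)
    val_loss = None
    for epoch in range(config["epochs"]):
        val_loss = train_one_epoch(config, epoch, rng)
        # The key must match metric.name in sweep.yaml.
        log({"epoch": epoch, "val_loss": val_loss})
    return val_loss


def main():
    import wandb  # imported here so the rest of the file runs without wandb

    run = wandb.init(config=DEFAULTS)
    run_training(dict(run.config), run.log)
    run.finish()

# In train.py, call main() at the bottom of the file.
```

Passing the logging function in as an argument keeps the training loop testable without a wandb account; in the real script `run.log` is what sends `val_loss` to the sweep.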


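--[3]--:

You could try SwanLab: run the search logic with Optuna or your own script, and let SwanLab record each set of hyperparameters, metrics, and logs, so you can compare and troubleshoot everything in one dashboard afterwards.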
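
The "run the search with your own script" option can be sketched in plain Python. The space in sweep.yaml is fully discrete, so enumerating it gives 3 × 3 × 3 = 27 combinations. `evaluate` is a hypothetical stand-in for a real training run, and the SwanLab and Optuna calls are left out because the thread does not show them:

```python
from itertools import product

# Search space copied from sweep.yaml.
SPACE = {
    "learning_rate": [0.0005, 0.001, 0.005],
    "hidden_dim": [16, 32, 64],
    "batch_size": [16, 32, 64],
}


def iter_configs(space):
    """Yield every combination of the listed values as a config dict."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def evaluate(config):
    """Hypothetical stand-in for training; returns a fake val_loss."""
    return abs(config["learning_rate"] - 0.001) + 1.0 / config["hidden_dim"]


def search(space, objective):
    """Evaluate every config and return (best (loss, config), all results)."""
    results = [(objective(c), c) for c in iter_configs(space)]
    return min(results, key=lambda r: r[0]), results


if __name__ == "__main__":
    (best_loss, best_config), results = search(SPACE, evaluate)
    print(len(results), "configs tried; best:", best_config, best_loss)
```

In a real setup, each `(loss, config)` pair would also be sent to whatever experiment tracker is in use at the point where `objective` returns.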

Tags: Artificial Intelligence