CUDA error

terminate called after throwing an instance of 'std::runtime_error'
what(): [f 0412 01:27:41.788757 36 helper_cuda.h:128] CUDA error at /mnt/anaconda/envs/jdiffusion/lib/python3.9/site-packages/jittor/extern/cuda/cublas/src/cublas_wrapper.cc:21 code=1( CUBLAS_STATUS_NOT_INITIALIZED ) cublasCreate(&cublas_handle)
/mnt/lt/JDiffusion/examples/dreambooth/train_all.sh: line 44: 2913072 Aborted (core dumped) CUDA_VISIBLE_DEVICES=0 python train.py --pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1 --instancetyle/15/images --output_dir=style/style_15 --instance_prompt=style_15 --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=1e-4 --lr_scheduler=constant --lr_warmup_steps=0 --max_train_steps=500 --seed=0
How can I fix this?

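First, run python -m jittor.test.test_cuda and check that it passes. Then run the test scripts under the examples in the JDiffusion repository to confirm that the diffusion environment itself works. If both are fine, my guess is that the error comes from running out of GPU memory: the Competition 2 baseline needs roughly 20 GB of GPU memory, and I'm not sure your card has that much.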
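To check the memory side of the suggestion above, a minimal sketch like the following can report how much GPU memory is available before training starts. It assumes nvidia-smi is on the PATH and reports plain integer MiB values; the function names and the REQUIRED_MIB threshold (taken from the roughly 20 GB figure quoted above) are illustrative and not part of JDiffusion or Jittor.

```python
import shutil
import subprocess

# Assumption: ~20 GB, the approximate figure quoted in the reply above.
REQUIRED_MIB = 20 * 1024


def parse_memory_csv(text):
    """Parse 'total, used' lines (in MiB) as printed by nvidia-smi in CSV mode."""
    gpus = []
    for line in text.strip().splitlines():
        total, used = (int(field) for field in line.split(","))
        gpus.append({"total": total, "used": used, "free": total - used})
    return gpus


def query_gpus():
    """Return per-GPU memory info, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.total,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_memory_csv(result.stdout) if result.stdout.strip() else []


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpus()):
        verdict = "ok" if gpu["free"] >= REQUIRED_MIB else "probably too small"
        print(f"GPU {index}: {gpu['free']} MiB free of {gpu['total']} MiB ({verdict})")
```

This only shows memory at the moment of the query. It cannot predict the peak usage of a training run, so a card that looks sufficient here may still fail later.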

Thanks, but I don't quite follow. On a 4080 with 16 GB of GPU memory it blew up, yet when I ran it on a 4090 it only used about 13 GB.

The peak GPU memory usage went above 16 GB.

Could you also help me figure out what this huggingface error means? I can't solve it... :sob:

Traceback (most recent call last):
  File "/root/miniconda3/envs/jdiffusion/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/root/miniconda3/envs/jdiffusion/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/style/style_0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/JDiffusion/examples/dreambooth/run_all.py", line 11, in <module>
    pipe.load_lora_weights(f"style/style_{taskid}")
  File "/root/miniconda3/envs/jdiffusion/lib/python3.9/site-packages/diffusers/loaders/lora.py", line 110, in load_lora_weights
    state_dict, network_alphas = self.lora_state_dict(pretrained_model_name_or_path_or_dict, **kwargs)
  File "/root/miniconda3/envs/jdiffusion/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/jdiffusion/lib/python3.9/site-packages/diffusers/loaders/lora.py", line 264, in lora_state_dict
    weight_name = cls._best_guess_weight_name(
  File "/root/miniconda3/envs/jdiffusion/lib/python3.9/site-packages/diffusers/loaders/lora.py", line 319, in _best_guess_weight_name
    files_in_repo = model_info(pretrained_model_name_or_path_or_dict).siblings
  File "/root/miniconda3/envs/jdiffusion/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/jdiffusion/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 2228, in model_info
    hf_raise_for_status(r)
  File "/root/miniconda3/envs/jdiffusion/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 352, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-66198cb3-252585cf049b0a9623633cc4;09e233f3-8037-462d-9d4e-db59f0ac603b)

Repository Not Found for url: https://huggingface.co/api/models/style/style_0.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
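
One possible reading of the traceback above, offered as a guess rather than a confirmed diagnosis: load_lora_weights was given the relative path style/style_0, no such file or directory was found locally, and the string was then looked up as a Hub repository id, which produces the 401. That could happen if the training run never wrote its output (for example, because it crashed) or if run_all.py was started from a different working directory than the training script. The sketch below checks the directory before loading; resolve_lora_dir is a hypothetical helper, not part of diffusers or JDiffusion.

```python
import os


def resolve_lora_dir(path):
    """Return the absolute path of a local LoRA output directory.

    Raises FileNotFoundError when the directory does not exist, instead of
    letting the caller pass the string on to be looked up as a Hub repo id.
    """
    abs_path = os.path.abspath(path)
    if not os.path.isdir(abs_path):
        raise FileNotFoundError(
            f"LoRA directory not found: {abs_path} "
            f"(current working directory: {os.getcwd()})"
        )
    return abs_path


# Hypothetical usage in run_all.py, assuming `pipe` and `taskid` exist there:
# pipe.load_lora_weights(resolve_lora_dir(f"style/style_{taskid}"))
```

If the helper raises, the message shows which absolute path was tried, which makes it easier to see whether the weights are missing or just live somewhere else.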