If you run into installation problems, runtime errors, incorrect outputs, or other issues while using Jittor, you can post a question in the "问答" (Q&A) section.
To make it easier to locate your problem, it is recommended that your post include the following:
- The error message:
[i 0426 13:14:44.690761 96 cuda_device_allocator.cc:30]
=== display_memory_info ===
total_cpu_ram: 31.2GB total_device_ram: 11.71GB
hold_vars: 338 lived_vars: 3137 lived_ops: 3671
name: sfrl is_device: 1 used: 10.08GB(93.7%) unused: 699.3MB(6.34%) ULB: 46.08MB ULBO: 170MB total: 10.76GB
name: sfrl is_device: 1 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
name: sfrl is_device: 0 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
name: sfrl is_device: 0 used: 681KB(66.5%) unused: 343KB(33.5%) total: 1MB
name: sfrl is_device: 0 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
name: temp is_device: 0 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
name: temp is_device: 1 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
cpu&gpu: 10.77GB gpu: 10.76GB cpu: 1MB
free: cpu(2.286GB) gpu(190.4MB)
swap: total( 0 B) last( 0 B)
0%| | 0/3073 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/cheng/try_jt/JCLIP/baseline.py", line 68, in <module>
    ' '.join([str(p.item()) for p in top_labels]) + '\n')
  File "/home/cheng/try_jt/JCLIP/baseline.py", line 68, in <listcomp>
    ' '.join([str(p.item()) for p in top_labels]) + '\n')
RuntimeError: [f 0426 13:14:44.690848 96 executor.cc:686]
Execute fused operator(1290/2022) failed.
[JIT Source]: /home/cheng/.cache/jittor/jt1.3.9/g++11.4.0/py3.10.12/Linux-6.5.0-28x13/12thGenIntelRCxe1/ab52/master/cu12.2.140_sm_89/jit/__opkey0_broadcast_to__Tx_float32__DIM_3__BCAST_1__opkey1_broadcast_to__Tx_float32__DIM_3____hash_9d5a71d1062958a5_op.cc
[OP TYPE]: fused_op:( broadcast_to, broadcast_to, binary.multiply, reduce.add,)
[Input]: float32[2048,512,]transformer.resblocks.9.mlp.c_fc.weight, float32[28798,512,],
[Output]: float32[28798,2048,],
[Async Backtrace]: not found, please set env JT_SYNC=1, trace_py_var=3
[Reason]: [f 0426 13:14:44.690806 96 cuda_device_allocator.cc:31] Unable to alloc cuda device memory for size 236978176
Async error was detected. To locate the async backtrace and get better error report, please rerun your code with two enviroment variables set:export JT_SYNC=1
export trace_py_var=3
Process finished with exit code 1
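The error message above asks for the environment variables JT_SYNC=1 and trace_py_var=3 to be set before rerunning, so that the asynchronous backtrace can be recovered. Below is a minimal sketch of doing this from Python; it assumes both variables are picked up when jittor is imported, so they are set before the import (exporting them in the shell before launching the script, as the message suggests, works the same way). The "=== display_memory_info ===" report at the top of the log can also be requested manually via jt.display_memory_info(), if that helper is available in your Jittor version, to watch memory usage while the script runs.

import os

# Set the variables suggested by the error message before jittor is imported,
# so that ops run synchronously and the Python-level trace is recorded
# (assumption: both variables are read when jittor initialises at import time).
os.environ['JT_SYNC'] = '1'
os.environ['trace_py_var'] = '3'

import jittor as jt  # imported only after the environment is prepared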
- The full log of the Jittor run (please copy and paste the console output; avoid screenshots if possible):
- Code, or a description, that reproduces the problem:
import jittor as jt
from PIL import Image
import jclip as clip
import os
from tqdm import tqdm
import argparse
import jittor_utils.clean_cache as clc

jt.flags.use_cuda = 1

parser = argparse.ArgumentParser()
parser.add_argument('--split', type=str, default='A')
args = parser.parse_args()

model, preprocess = clip.load("ViT-B-32.pkl")
classes = open('Dataset/classes.txt').read().splitlines()

new_classes = []
for c in classes:
    c = c.split(' ')[0]
    if c.startswith('Animal'):
        c = c[7:]
    if c.startswith('Thu-dog'):
        c = c[8:]
    if c.startswith('Caltech-101'):
        c = c[12:]
    if c.startswith('Food-101'):
        c = c[9:]
    c = 'a photo of ' + c
    new_classes.append(c)

text = clip.tokenize(new_classes)
text_features = model.encode_text(text)
text_features /= text_features.norm(dim=-1, keepdim=True)

split = 'TestSet' + args.split
imgs_dir = 'Dataset/' + split
imgs = os.listdir(imgs_dir)
save_file = open('result.txt', 'w')
clc.clean_swap()

preds = []
with jt.no_grad():
    for img in tqdm(imgs):
        img_path = os.path.join(imgs_dir, img)
        image = Image.open(img_path)
        image = preprocess(image).unsqueeze(0)
        image_features = model.encode_image(image)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_probs = (100.0 *
                      image_features @ text_features.transpose(0, 1)).softmax(
                          dim=-1)
        # top5 predictions
        _, top_labels = text_probs[0].topk(5)
        preds.append(top_labels)
        # save top5 predictions to file
        save_file.write(img + ' ' +
                        ' '.join([str(p.item()) for p in top_labels]) + '\n')
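For context on the failure (this sketch is not part of the original script): the log shows the out-of-memory error in a fused operator whose inputs are transformer.resblocks.9.mlp.c_fc.weight (float32[2048,512]) and a float32[28798,512] tensor, i.e. inside the text encoder while all ~28,800 class prompts are encoded in a single pass, outside of jt.no_grad(). A minimal sketch of encoding the prompts in smaller chunks under jt.no_grad(), assuming model.encode_text accepts any batch size, is shown here; encode_text_in_batches is a hypothetical helper, not part of the original code.

import jittor as jt

def encode_text_in_batches(model, tokenized, batch_size=1024):
    # Encode the tokenized prompts chunk by chunk so the text encoder never
    # materialises an activation for all 28,798 prompts at once, and keep no
    # autograd graph since the features are only used for inference.
    feats = []
    with jt.no_grad():
        for i in range(0, tokenized.shape[0], batch_size):
            f = model.encode_text(tokenized[i:i + batch_size])
            f /= f.norm(dim=-1, keepdim=True)
            feats.append(f)
    return jt.concat(feats, dim=0)

Whether this stays within the ~12 GB of the reported card is exactly what the question is about; the sketch only illustrates where the large allocation in the log comes from.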
- What you believe the correct result should be:
Following the description in the automatic-swap section of the official documentation, I have already set the environment variables, but the code still allocates more GPU memory than the limit and fails with the error above.
- Other necessary information:
Ubuntu 22.04, GPU: GeForce RTX 4070 Super, driver: nvidia-driver-550.
The filled-in content above can be used as a reference template.