After passing Jittor's test_example, test_cuda fails with the following error

Jittor's test_example passes for me, but test_cuda then fails with the error below.
Here is the full log:
yhr@0415server:~/Downloads/root/gaugan$ python3.10 -m jittor.test.test_cuda
[i 0621 09:30:27.314555 56 compiler.py:955] Jittor(1.3.8.2) src: /home/yhr/.local/lib/python3.10/site-packages/jittor
[i 0621 09:30:27.316779 56 compiler.py:956] g++ at /usr/bin/g++(9.5.0)
[i 0621 09:30:27.316866 56 compiler.py:957] cache_path: /home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default
[i 0621 09:30:27.320335 56 __init__.py:411] Found nvcc(12.1.66) at /usr/local/cuda/bin/nvcc.
[i 0621 09:30:27.322762 56 __init__.py:411] Found addr2line(2.38) at /usr/bin/addr2line.
[i 0621 09:30:27.505912 56 compiler.py:1010] cuda key:cu12.1.66_sm_61
[i 0621 09:30:27.741637 56 __init__.py:227] Total mem: 125.61GB, using 16 procs for compiling.
[i 0621 09:30:27.884383 56 jit_compiler.cc:28] Load cc_path: /usr/bin/g++
[i 0621 09:30:28.147805 56 init.cc:62] Found cuda archs: [61,]
[i 0621 09:30:30.979187 56 cuda_flags.cc:39] CUDA enabled.

Compiling Operators(1/1) used: 2.67s eta: 0s
.[i 0621 09:30:35.060328 56 cuda_flags.cc:39] CUDA enabled.

Compiling Operators(1/1) used: 3.28s eta: 0s
.[i 0621 09:30:38.340891 56 cuda_flags.cc:39] CUDA enabled.
/home/yhr/.local/lib/python3.10/site-packages/jittor/src/misc/cuda_atomic.h(138): error: no instance of overloaded function "atomicCAS" matches the argument list
argument types are: (unsigned short *, unsigned short, unsigned short)
old = atomicCAS(a_i, assume, int_mapper<__half>::to_int(b));
^

/home/yhr/.local/lib/python3.10/site-packages/jittor/src/misc/cuda_atomic.h(153): error: no instance of overloaded function "atomicCAS" matches the argument list
argument types are: (unsigned short *, unsigned short, unsigned short)
old = atomicCAS(a_i, assume, int_mapper<__half>::to_int(b));
^

2 errors detected in the compilation of "/home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default/cu12.1.66_sm_61/jit/__opkey0_array__T_int32__o_2__opkey1_binary__Tx_int32__Ty_int32__Tz_int32__OP_add__opkey2____hash_861eebc2e1f0c5c2_op.cc".
E[i 0621 09:30:41.105088 56 cuda_flags.cc:39] CUDA enabled.
.s

ERROR: test_cuda_fused_op (__main__.TestCuda)

Traceback (most recent call last):
File "/home/yhr/.local/lib/python3.10/site-packages/jittor/test/test_cuda.py", line 108, in test_cuda_fused_op
((a+a)*2).data
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.data)).

Types of your inputs are:
self = Var,

The function declarations are:
inline DataView data()

Failed reason:[f 0621 09:30:41.104216 56 parallel_compiler.cc:330] Error happend during compilation:
[Error] source file location:/home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default/cu12.1.66_sm_61/jit/__opkey0_array__T_int32__o_2__opkey1_binary__Tx_int32__Ty_int32__Tz_int32__OP_add__opkey2____hash_861eebc2e1f0c5c2_op.cc
Compile fused operator(0/1)failed:[Op(18:0:1:1:i0:o1:s0,array->19),Op(16:0:1:1:i2:o1:s0,binary.add->17),Op(22:0:1:1:i1:o1:s0,broadcast_to->23),Op(24:0:1:1:i2:o1:s0,binary.multiply->25),]

Reason: [f 0621 09:30:41.103841 56 log.cc:608] Check failed ret(256) == 0(0) Run cmd failed: "/usr/local/cuda/bin/nvcc" "/home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default/cu12.1.66_sm_61/jit/__opkey0_array__T_int32__o_2__opkey1_binary__Tx_int32__Ty_int32__Tz_int32__OP_add__opkey2____hash_861eebc2e1f0c5c2_op.cc" -std=c++14 -Xcompiler -fPIC -Xcompiler -march=native -Xcompiler -fdiagnostics-color=always -lstdc++ -ldl -shared -I"/home/yhr/.local/lib/python3.10/site-packages/jittor/src" -I/usr/include/python3.10 -I/usr/include/python3.10 -DHAS_CUDA -DIS_CUDA -I"/usr/local/cuda/include" -I"/home/yhr/.local/lib/python3.10/site-packages/jittor/extern/cuda/inc" -lcudart -L"/usr/local/cuda/lib64" -Xlinker -rpath="/usr/local/cuda/lib64" -I"/home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default/cu12.1.66_sm_61" -L"/home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default/cu12.1.66_sm_61" -Xlinker -rpath="/home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default/cu12.1.66_sm_61" -L"/home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default" -Xlinker -rpath="/home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default" -l:"jit_utils_core.cpython-310-x86_64-linux-gnu".so -l:"jittor_core.cpython-310-x86_64-linux-gnu".so -x cu --cudart=shared -ccbin="/usr/bin/g++" --use_fast_math -w -I"/home/yhr/.local/lib/python3.10/site-packages/jittor/extern/cuda/inc" -arch=compute_61 -code=sm_61 -o "/home/yhr/.cache/jittor/jt1.3.8/g++9.5.0/py3.10.6/Linux-5.15.0-7x2f/IntelRXeonRCPUx6c/default/cu12.1.66_sm_61/jit/__opkey0_array__T_int32__o_2__opkey1_binary__Tx_int32__Ty_int32__Tz_int32__OP_add__opkey2____hash_861eebc2e1f0c5c2_op.so"


Ran 5 tests in 11.584s

FAILED (errors=1, skipped=1)

My environment:
Ubuntu 22.04
Python 3.10
g++ 9.5
CUDA 11.3 + cuDNN 8.8.0, or CUDA 12.1 + cuDNN 8.9.1 (latest)
I found that similar issues were solved by updating to the latest CUDA and cuDNN, but I still get the error above after doing so.
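One detail that may matter: both compile errors concern the `unsigned short` (16-bit) overload of `atomicCAS`. As far as I know, NVIDIA's CUDA C++ Programming Guide lists that overload only for devices of compute capability 7.x and higher. My log shows `Found cuda archs: [61,]` and `-arch=compute_61 -code=sm_61`, so upgrading the toolkit alone might not change anything. Below is a quick sketch of that check. `parse_cuda_key` and `supports_16bit_atomic_cas` are my own hypothetical helpers, not Jittor APIs, and the key format is inferred only from the `cuda key:` line in my log.

```python
import re

# Assumption (from NVIDIA's CUDA C++ Programming Guide): the 16-bit
# unsigned short overload of atomicCAS needs compute capability 7.0+.
MIN_ARCH_FOR_16BIT_ATOMIC_CAS = 70


def parse_cuda_key(key: str) -> tuple[str, int]:
    """Split a key like 'cu12.1.66_sm_61' into (nvcc version, sm arch).

    Hypothetical helper: the key format is inferred only from the
    "cuda key:" line in the log above, not from Jittor's documentation.
    """
    match = re.fullmatch(r"cu([\d.]+)_sm_(\d+)", key)
    if match is None:
        raise ValueError(f"unrecognized cuda key: {key!r}")
    return match.group(1), int(match.group(2))


def supports_16bit_atomic_cas(arch: int) -> bool:
    """True if sm_<arch> provides atomicCAS(unsigned short *, ...)."""
    return arch >= MIN_ARCH_FOR_16BIT_ATOMIC_CAS


nvcc_version, arch = parse_cuda_key("cu12.1.66_sm_61")
print(f"nvcc {nvcc_version}, sm_{arch}: "
      f"16-bit atomicCAS available = {supports_16bit_atomic_cas(arch)}")
```

With the key from my log this prints `nvcc 12.1.66, sm_61: 16-bit atomicCAS available = False`. The nvcc version never enters the check. If the compute-capability assumption is right, that would explain why updating CUDA did not help.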
I would be very grateful if anyone could help me find what might be wrong and suggest how I could fix it.