Why doesn't ChatGLM deployed with JittorLLMs respond?

root@ser112992618210:/home/glm# python3 cli_demo.py chatglm
[i 0605 11:04:37.956964 68 compiler.py:955] Jittor(1.3.7.16) src: /usr/local/lib/python3.10/dist-packages/jittor
[i 0605 11:04:37.960263 68 compiler.py:956] g++ at /usr/bin/g++(11.3.0)
[i 0605 11:04:37.960328 68 compiler.py:957] cache_path: /root/.cache/jittor/jt1.3.7/g++11.3.0/py3.10.6/Linux-5.15.0-7xbd/IntelRXeonRCPUxe4/default
[i 0605 11:04:37.968498 68 __init__.py:411] Found addr2line(2.38) at /usr/bin/addr2line.
[i 0605 11:04:38.250442 68 __init__.py:227] Total mem: 11.68GB, using 3 procs for compiling.
Compiling jittor_core(150/150) used: 6.585s eta: 0.000s
[i 0605 11:04:45.264072 68 jit_compiler.cc:28] Load cc_path: /usr/bin/g++
[i 0605 11:04:45.264524 68 swap.cc:29] Load cpu_mem_limit: 8000000000
[i 0605 11:04:45.264550 68 swap.cc:30] Load device_mem_limit: 8000000000
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|██████████████████████████████████████████| 8/8 [00:44<00:00, 5.62s/it]
User input: Hello~
And then nothing happened. It couldn't even manage "my mom gave birth to me" awa

Maybe it's still processing. I typed "你是谁" ("who are you"), and since then its memory usage has stayed at 93% (of 16GB).

Has this been resolved? I tried cli_demo.py, web_demo.py, and api.py, and none of them respond. What's going on? After sending a request, CPU utilization also stays low.