While investigating whether my AMD APU is somewhat usable for running LLMs, this is what I found.
Preparations
Installation of the amdgpu driver and ROCm is explained here.
Hardware/device information
During boot the amdgpu driver logs some information about the available graphics memory: the reserved VRAM and the GTT (graphics translation table) memory, which can be used either as graphics memory (if needed/allocated) or as system memory.
Machine #1 with a 680M APU reports:
linux # dmesg | grep amdgpu
<...>
[ 3.845996] [drm] amdgpu: 512M of VRAM memory ready
[ 3.846004] [drm] amdgpu: 64021M of GTT memory ready.
<...>
Machine #2 with a 780M APU:
linux # dmesg | grep amdgpu
<...>
[ 2.458868] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.458871] [drm] amdgpu: 13945M of GTT memory ready.
<...>
Increase GTT memory
According to some postings, the GTT size can be modified via the amdgpu module’s parameter gttsize:
linux # modinfo amdgpu | grep gttsize
parm: gttsize:Size of the GTT userspace domain in megabytes (-1 = auto) (int)
So I booted the system with the extra kernel option "amdgpu.gttsize=16384" (size in MB) to allocate 16GB of GTT memory. That did the trick, well – at least kind of:
linux # dmesg
<...>
[ 2.507767] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.507772] amdgpu 0000:c6:00.0: amdgpu: [drm] Configuring gttsize via module parameter is deprecated, please use ttm.pages_limit
[ 2.507777] amdgpu 0000:c6:00.0: amdgpu: [drm] GTT size has been set as 17179869184 but TTM size has been set as 14622654464, this is unusual
[ 2.507781] [drm] amdgpu: 16384M of GTT memory ready.
<...>
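By the way, whether a boot option actually reached the kernel can be verified via /proc/cmdline (a quick sanity check of my own; output abbreviated and illustrative):
linux # cat /proc/cmdline
<...> amdgpu.gttsize=16384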
Ok, so this value should now be set differently (btw: the kernel version is 6.8.0). Let’s check out how by getting information about the parameters of the ttm kernel module ("TTM" stands for "translation table maps"):
linux # modinfo ttm
<...>
description: TTM memory manager subsystem (for DRM device)
<...>
parm: page_pool_size:Number of pages in the WC/UC/DMA pool (ulong)
parm: pages_limit:Limit for the allocated pages (ulong)
parm: dma32_pages_limit:Limit for the allocated DMA32 pages (ulong)
While trying to get the current values used by that module I found that the ttm module isn’t even loaded; instead there is an active module amdttm, which seems to ship with the amdgpu drivers and to replace the standard ttm module.
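This can be confirmed with a quick lsmod check (my own addition; output abbreviated, exact sizes and usage counts will differ):
linux # lsmod | grep ttm
amdttm                <...>
<...>
Its parameters look identical to those of the standard ttm module: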
linux # modinfo amdttm
<...>
description: TTM memory manager subsystem (for DRM device)
<...>
parm: page_pool_size:Number of pages in the WC/UC/DMA pool (ulong)
parm: pages_limit:Limit for the allocated pages (ulong)
parm: dma32_pages_limit:Limit for the allocated DMA32 pages (ulong)
Let’s see what values were used by the module on the 780M machine that complained about the memory mismatch:
linux # cat /sys/module/amdttm/parameters/page_pool_size
3569984
linux # cat /sys/module/amdttm/parameters/pages_limit
3569984
linux # cat /sys/module/amdttm/parameters/dma32_pages_limit
524288
With a default page size of 4k (4096 bytes) this is:
3569984 * 4096 = 14622654464
which is exactly the amount of memory mentioned during boot.
So the next try is to set amdttm.pages_limit and amdttm.page_pool_size to the desired value (which is specified in 4k pages), so for 16GB that means:
16 * 1024 * 1024 * 1024 / 4096 = 4194304
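For reference, the page count can be computed in the shell, and the options can be made persistent via the bootloader. A minimal sketch, assuming a GRUB-based system with update-grub (file contents abbreviated; the exact mechanism differs per distribution):
linux # echo $((16 * 1024 * 1024 * 1024 / 4096))
4194304
linux # grep GRUB_CMDLINE_LINUX_DEFAULT /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="<...> amdttm.pages_limit=4194304 amdttm.page_pool_size=4194304"
linux # update-grub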
Booting up machine #2 (780M) with these options results in 16GB of GTT memory like before, but this time without complaints from the kernel:
linux # dmesg
<...>
[ 2.431176] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.431180] [drm] amdgpu: 16384M of GTT memory ready.
<...>
System tools
Let’s first have a look at our hardware/software stack:
linux # rocm-smi
==== ROCm System Management Interface ====
==== Concise Info ====
Device  Node  IDs              Temp    Power     Partitions          SCLK  MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Socket)  (Mem, Compute, ID)
=================================================
0       1     0x1681,   50563  43.0°C  8.168W    N/A, N/A, 0         N/A   2400Mhz  0%   auto  N/A     4%     0%
=================================================
==== End of ROCm SMI Log ====
linux # rocm-smi --showproductname
==== ROCm System Management Interface ====
==== Product Info ====
GPU[0] : Card Series: AMD Radeon Graphics
GPU[0] : Card Model: 0x1681
GPU[0] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0] : Card SKU: REMBRANDT
GPU[0] : Subsystem ID: -0x776b
GPU[0] : Device Rev: 0x0a
GPU[0] : Node ID: 1
GPU[0] : GUID: 50563
GPU[0] : GFX Version: gfx1035
==== End of ROCm SMI Log ====
linux # rocm-smi --showhw
==== ROCm System Management Interface ====
==== Concise Hardware Info ====
GPU  NODE  DID     GUID   GFX VER  GFX RAS  SDMA RAS  UMC RAS  VBIOS              BUS           PARTITION ID
0    1     0x1681  50563  gfx1035  N/A      N/A       N/A      113-REMBRANDT-X37  0000:E7:00.0  0
=================================================
==== End of ROCm SMI Log ====
linux # rocm-smi --showdriverversion
==== ROCm System Management Interface ====
==== Version of System Component ====
Driver version: 6.12.12
=================================================
==== End of ROCm SMI Log ====
linux # rocm-smi --showfwinfo
==== ROCm System Management Interface ====
==== Firmware Information ====
GPU[0] : ASD firmware version: 0x210000eb
GPU[0] : CE firmware version: 37
GPU[0] : ME firmware version: 64
GPU[0] : MEC firmware version: 122
GPU[0] : MEC2 firmware version: 122
GPU[0] : PFP firmware version: 104
GPU[0] : RLC firmware version: 83
GPU[0] : RLC SRLC firmware version: 1
GPU[0] : RLC SRLG firmware version: 1
GPU[0] : RLC SRLS firmware version: 1
GPU[0] : SDMA firmware version: 47
GPU[0] : SMC firmware version: 04.69.63.105
GPU[0] : VCN firmware version: 0x04121003
==== End of ROCm SMI Log ====
Or just use "rocm-smi -a" to show all available information.
Running vllm with ROCm support on 680M/780M
While official ROCm releases do not support these cards, there’s an unofficial project adding support for them! And even better: Docker images containing a pre-built vllm are also available!
So for now all we have to do is choose the right Docker image according to the graphics architecture generation:
RDNA1/2 image for 680M:
linux # docker pull lamikr/rocm_sdk_builder:612_01_rdna1_rdna2
linux # docker run -it --device=/dev/kfd --device=/dev/dri -p 8000:8000 docker.io/lamikr/rocm_sdk_builder:612_01_rdna1_rdna2 bash
RDNA3 image for 780M:
linux # docker pull lamikr/rocm_sdk_builder:612_01_rdna3
linux # docker run -it --device=/dev/kfd --device=/dev/dri -p 8000:8000 docker.io/lamikr/rocm_sdk_builder:612_01_rdna3 bash
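Once inside the container, it may be worth checking that the GPU is actually visible to the ROCm stack before starting vllm; a sanity check of my own using rocminfo (output abbreviated, the gfx version depends on the APU, e.g. gfx1035 for the 680M):
linux # rocminfo | grep -i gfx
<...>
  Name:                    gfx1035
<...>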
Now let’s start vllm with a standard model:
linux # vllm serve <your model here> --api-key token-abc123
Wait for the model preparation … and hope for the best.
If startup is successful you can now access the model using the OpenAI-compatible API URL: http://<your-ip>:8000/v1.
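A quick way to test the endpoint is the OpenAI-compatible REST API itself, e.g. with curl (model name and API key as used above; host and port depend on your setup):
linux # curl http://localhost:8000/v1/models -H "Authorization: Bearer token-abc123"
linux # curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer token-abc123" \
    -d '{"model": "<your model here>", "messages": [{"role": "user", "content": "Hello!"}]}'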
Conclusion
While some simple LLMs worked as described above, the majority either took too long to even start up, didn’t start up at all (producing a random bunch of error messages), or even crashed the system (ok, only the Wayland server, but that’s bad enough).
So in the end it still looks like AMD GPUs (especially the cheap APU ones) are not supported by the official ROCm releases for a reason (but maybe the same applies to the bigger GPUs … I don’t know …).