Contents
- Active State Power Management (ASPM)
- Let’s turn it on
- The Result
- Bonus Round: ASPM-Tool

Remember when I got a new home server and found all of the temperatures to monitor? I was pretty uncomfortable with the SSD controller sitting at 60°C all the time. I put a heatsink on it, and it fell to 48°C, but I still felt like it was always high.

I assumed it was some kind of PCIe / NVMe power saving feature not working. I’d prefer if it chilled out when idle, which is most of its life. Let’s see what we can do about it…

Active State Power Management (ASPM)

When it comes to power management on PCIe devices, the big one is ASPM. It lets the PCIe link itself enter an idle state, which in turn nudges the device into a lower power state too.

Check it with

prtconf -d -v | less
...
pci144d,a801 (pciex144d,a808) [Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983], instance #0
...
name='pcie-aspm-state' type=string items=1
		value='disabled'
name='pcie-aspm-support' type=string items=1
		value='l1'
...

In the output, we can barely read through the 12 tab indents that the SSD, whilst supporting L1 ASPM mode, has it disabled. (You’re gonna want to view the output with less, it’s gigantic.)

Let’s turn it on

(Check the Bonus Round if you’re trying to turn on ASPM yourself, read on to follow the journey.)

ASPM can be set by the BIOS, or the BIOS can hand control to the OS, which is usually the default. So how do we enable ASPM on OmniOS? Unfortunately I couldn’t find any ‘proper’ method. It would be cool if there were options for such things in /etc/power.conf like there are for the CPU and HDD power saving features, but no. Illumos has been mostly focused on server situations, where there is a culture of ‘the Max Power way’.

Lucky for us we have at least got pciutils available to install. setpci lets you read and write PCI device registers. The values for different ASPM settings are well known:

Hex  Binary  Setting      Meaning
---------------------------------------------------------------
0    0b00    L0 only      ASPM disabled, L0 is normal link mode
1    0b01    L0s only     Stay in L0 mode but turn off the downlink to let the device sleep
2    0b10    L1 only      Turn off the PCIe link completely for even more power saving
3    0b11    L1 and L0s   Can enter both lower power modes
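As a quick illustration of the table, here is a minimal Python sketch that reads the setting out of a register byte. It assumes the ASPM setting lives in the lowest two bits of that byte; the enum and function names are made up for this example and are not taken from any existing tool.

```python
from enum import IntEnum


class ASPM(IntEnum):
    # Hypothetical names; the values follow the table above.
    DISABLED = 0b00
    L0S_ONLY = 0b01
    L1_ONLY = 0b10
    L1_AND_L0S = 0b11


def decode_aspm(register_byte: int) -> ASPM:
    """Return the ASPM setting held in the low two bits of the byte."""
    return ASPM(register_byte & 0b11)


print(decode_aspm(0x40).name)  # DISABLED
print(decode_aspm(0x42).name)  # L1_ONLY
```

Everything above the low two bits is ignored here, since those bits belong to other link settings.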

But what’s the register to write these two bits to? Uhhh it’s complicated: you gotta follow a pointer, scan around for the right value, then add 0x10. The best info about this is from the brave souls trying to get WiFi working well on Linux. Since it’s slightly complicated, shoutout to z8 who not only wrote an excellent blog on this, but also wrote some tools to turn on the feature. aspm.py is a rewrite of Eivs’ enable-aspm.sh. I went for z8’s Python version since I can read Python better, and it’s full of very reassuring print statements that tell you what it’s up to at every step. Both make use of setpci so they’ll run fine on Illumos, for once.
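To make the "follow a pointer, scan, add 0x10" dance a bit more concrete, here is a rough Python sketch of the lookup. It is not the code from aspm.py or enable-aspm.sh. It works on a plain bytes copy of a device's config space instead of calling setpci, and the config space in the example is hand-built for illustration, not dumped from a real device. The offsets used (capabilities pointer at 0x34, capability ID 0x10 for PCI Express) are the standard PCI ones.

```python
CAP_POINTER = 0x34          # where the pointer to the first capability lives
PCIE_CAP_ID = 0x10          # capability ID that marks the PCI Express block
LINK_CONTROL_OFFSET = 0x10  # Link Control register, relative to that block


def find_link_control(config: bytes) -> int:
    """Walk the capability list and return the Link Control offset."""
    pos = config[CAP_POINTER] & 0xFC  # low two pointer bits are reserved
    seen = set()                      # guard against a looping list
    while pos and pos not in seen:
        seen.add(pos)
        cap_id, next_ptr = config[pos], config[pos + 1]
        if cap_id == PCIE_CAP_ID:
            return pos + LINK_CONTROL_OFFSET
        pos = next_ptr & 0xFC
    raise LookupError("no PCI Express capability found")


# Synthetic config space: a power management capability at 0x50
# that points on to a PCI Express capability at 0x70.
config = bytearray(256)
config[0x34] = 0x50
config[0x50], config[0x51] = 0x01, 0x70
config[0x70], config[0x71] = 0x10, 0x00

print(hex(find_link_control(bytes(config))))  # 0x80
```

The ASPM bits are the lowest two bits of the byte at the returned offset.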

The script needs you to specify the device with {bus}:{slot}.{func}, along with the root_complex, which is the other end of the PCIe link. Since this is a link control feature, both ends need to have the same setting applied for it to actually happen. The guides I’ve linked to above just say to ‘get it from lspci -t’. Maybe I’m an idiot, but that output was very unclear to me. Instead we can use lspci -PP 🤣:

lspci -PP
...
00:01.1/01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
...
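Each line of that output starts with a slash-separated path of addresses, ending at the device itself. If you want to pull the two ends of the link out in a script, something like the sketch below should do. The function name is hypothetical, and it assumes the hop just before the endpoint is the one you want, which holds for the simple two-hop path on my machine.

```python
def split_link(lspci_pp_line: str) -> tuple[str, str]:
    """Return (upstream port, endpoint) from one line of `lspci -PP`."""
    path = lspci_pp_line.split(maxsplit=1)[0]  # e.g. "00:01.1/01:00.0"
    hops = path.split("/")
    if len(hops) < 2:
        raise ValueError("device has no upstream port in its path")
    return hops[-2], hops[-1]


line = ("00:01.1/01:00.0 Non-Volatile memory controller: "
        "Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983")
print(split_link(line))  # ('00:01.1', '01:00.0')
```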

Now we can easily see the root_complex 00:01.1, followed by a slash and the endpoint 01:00.0. Punch these values into aspm.py and give it a run:

root_complex = "00:01.1"
endpoint = "01:00.0"
value_to_set = ASPM.ASPM_L1_ONLY
python3 aspm.py
...
Position of byte to patch: 0x80
Byte is set to 0x40
-> ASPM_DISABLED
Value doesn't match the one we want, setting it!
Byte is now set to 0x42
-> ASPM_L1_ONLY
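The before and after bytes in that output only differ in their lowest two bits. Here is a minimal sketch of that arithmetic, as an illustration and not the actual code from aspm.py; the function name is made up.

```python
ASPM_MASK = 0b11  # the two ASPM bits at the bottom of the byte


def patch_aspm(current: int, setting: int) -> int:
    """Replace the low two bits of the byte, leaving the rest untouched."""
    if not 0 <= setting <= ASPM_MASK:
        raise ValueError("ASPM setting must be between 0 and 3")
    return (current & ~ASPM_MASK & 0xFF) | setting


print(hex(patch_aspm(0x40, 0b10)))  # 0x42
```

Masking first matters: the other bits in that byte control unrelated link settings, so they should be written back unchanged.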

The Result

Now the SSD is sitting at 38°C, a 10° drop! I’ll take that as evidence that it’s actually worked. Idle power draw was sitting at 32W previously, though it seems to have only dropped by a single Watt. If we go ahead and enable all of the ASPM settings for every device that supports it, we can get the system down to 28W, a 4W drop! Pretty sweet, considering the four hard drives are using at least 20W of that constantly spinning.

Of course, some devices that technically support ASPM just don’t work well with it enabled. Even just the extra latency from waiting for the device to wake up can cause issues. My SSD claims it can wake up in less than 64μs, which I think I can live with. The benefit of running a consumer mobile chipset as a server is that this kind of stuff actually works pretty well. If it causes issues, it’s pretty easy to switch back off. So far on my setup, everything has been working great.

Bonus Round: ASPM-Tool

Whilst aspm.py worked wonderfully, I quickly got sick of running multiple commands and copy/pasting details to configure all of my devices. So I created a CLI wrapper around the functions, aspm-tool.py. It will list the name, address, ASPM capability and current setting:

./aspm-tool.py -d 01:00.0
Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
        Address: 00:01.1/01:00.0
        Capable: L1
        Current: L1

It can also apply the correct settings to all devices with a single command:

./aspm-tool.py -s auto

Find it on GitHub today.

A million thanks to z8 for mapping out so much of this rabbit hole. I can’t stress enough how much their pain and suffering working all of this out made it so easy and enjoyable for me to make my server work better.