• dosse91
    4 months ago

    I may be partially responsible for this lazy-ass implementation.

    Three months ago I was playing around with Stable Diffusion a lot, and because I sleep in the same room as my PC, I used to lower the GPU’s TDP to 150 W at night to keep it quiet. One day, while SD was running, I lowered the TDP in LACT and pressed Apply, but instead of getting quieter, the fans ramped up. I was shocked to see that the card (a 6900 XT) was in fact pulling 420 W instead of its rated 293 W.

    I tracked the issue down to the driver applying the power limit incorrectly. Basically, if you set a TDP that’s too low for the current power state, the driver would disable the power limit entirely until the card entered a lower power state, after which your new TDP would be applied correctly.

    Running a modern GPU without power limits is bad and potentially dangerous for everything involved: the GPU, the VRMs, and even the power supply cables, which may melt as we’ve seen with NVIDIA cards. So I immediately reported the issue to the AMDGPU developers (my issue is linked in the article).

    They quickly came up with a fix, which I tested: it doesn’t allow you to set a TDP lower than the lowest valid TDP for the highest power state. This gets the job done, but it’s more of a kludge than a fix. Ideally the driver should realize that the new TDP is too low for the current power state and switch to a lower power state. I don’t know why AMD implemented such a shitty solution in their official kernel driver.
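
    For anyone who wants to see what that lower bound means in practice, here is a minimal sketch of the same kind of range check done from userspace. It assumes the card exposes the usual hwmon attributes `power1_cap_min` and `power1_cap_max` (values in microwatts); the helper names `read_uw` and `check_power_cap` are made up for illustration, and this is not the actual driver code.

```python
from pathlib import Path


def read_uw(hwmon_dir: Path, name: str) -> int:
    """Read one hwmon power attribute and return it as integer microwatts."""
    return int((hwmon_dir / name).read_text().strip())


def check_power_cap(requested_w: float, min_uw: int, max_uw: int) -> int:
    """Convert a requested cap in watts to microwatts and range-check it.

    Raises ValueError when the request falls outside [min_uw, max_uw],
    which mirrors the "refuse a TDP that is too low" behaviour described
    above. Hypothetical helper, not part of any real tool.
    """
    requested_uw = round(requested_w * 1_000_000)
    if not min_uw <= requested_uw <= max_uw:
        raise ValueError(
            f"{requested_w} W is outside the allowed range "
            f"{min_uw / 1e6:.0f}-{max_uw / 1e6:.0f} W"
        )
    return requested_uw


def sysfs_limits(hwmon_dir: Path) -> tuple[int, int]:
    """Return (min, max) cap in microwatts as reported by the driver."""
    return (
        read_uw(hwmon_dir, "power1_cap_min"),
        read_uw(hwmon_dir, "power1_cap_max"),
    )
```

    You would point `sysfs_limits` at the card’s hwmon directory (somewhere under `/sys/class/drm/card*/device/hwmon/`, the exact index varies per system) and pass the result to `check_power_cap` before writing anything to `power1_cap`. The limits in any example call are placeholders; the real minimum is whatever your kernel reports.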

    • swab148@startrek.website
      4 months ago

      Alright everyone, time to break out the pitchforks and storm this guy’s house lol

      But seriously, this is 100% on AMD, don’t beat yourself up for their laziness.