What’s the best way to monitor and log which processes are responsible for high system load throughout the day? Tools like top and htop only provide immediate values, but I’m looking for a solution that offers historical data to identify the main culprits over time.
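For context, the kind of historical per-process accounting being asked about can also be hand-rolled before reaching for a full monitoring stack. A minimal sketch (assuming a Linux `/proc` filesystem; the sampling interval and the top-N cutoff are illustrative choices, not from any particular tool):

```python
import os
import time

def sample_cpu_jiffies():
    """Return {pid: (comm, utime + stime)} for all running processes."""
    usage = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                # comm may contain spaces or parentheses, so split from the right
                head, tail = f.read().rsplit(")", 1)
                comm = head.split("(", 1)[1]
                rest = tail.split()
                # fields 14 (utime) and 15 (stime) of /proc/[pid]/stat;
                # rest[0] is field 3 (state), so they land at indices 11 and 12
                utime, stime = int(rest[11]), int(rest[12])
                usage[int(pid)] = (comm, utime + stime)
        except (FileNotFoundError, ProcessLookupError, IndexError):
            continue  # process exited between listing and reading
    return usage

def top_consumers(interval=5.0, n=5):
    """CPU-time deltas over `interval` seconds, largest first."""
    before = sample_cpu_jiffies()
    time.sleep(interval)
    after = sample_cpu_jiffies()
    deltas = [
        (jiffies - before[pid][1], pid, comm)
        for pid, (comm, jiffies) in after.items()
        if pid in before
    ]
    return sorted(deltas, reverse=True)[:n]

if __name__ == "__main__":
    # Run this from cron or a systemd timer and append to a log file
    # to build up the historical record the question asks about.
    for delta, pid, comm in top_consumers(interval=2.0):
        print(f"{delta:6d} jiffies  pid={pid:<7} {comm}")
```

The tools mentioned in the replies below (Netdata, sar/sysstat, SNMP-based collectors) do essentially this sampling, plus storage and dashboards, for you.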

@sysadmin

#sysadmin #linux #server

  • mosiacmango@lemm.ee · 3 months ago

    Netdata is excellent, simple and I believe FOSS. Just install locally and it should start logging pretty much everything.

    • Onno (VK6FLAB) · 3 months ago

      Clicked the link, started reading … closed the window when I read “Netdata also incorporates A.I. insights for all monitored data”.

      • gravitas_deficiency@sh.itjust.works · 3 months ago

        Eesh. Yeah, that’s a nope from me, dawg.

        Actually, it’s all self-hosted. Granted, I haven’t looked at the code in detail, but building NNs to help efficiently detect and capture stuff is actually a very appropriate use of ML. This project looks kinda cool.

        • Onno (VK6FLAB) · 3 months ago

          Machine Learning might be marketed as “all fine and dandy”, but I’m not planning on turning a self-training monitoring system loose on my production server under any circumstances.

          Not to mention that for it to be useful I’d have to give it at least a year of logs, which is both impossible and pointless: the system running a year ago is not remotely the same as the one running today. Even if not a single piece of our own code had changed (which of course it did), the OS, applications and processes have been continually updated by system updates and security patches.

          So, no.

            • gravitas_deficiency@sh.itjust.works · 3 months ago

            That’s why I put in the caveat about looking at the code. If you can’t grok what’s going on, that’s fine, but someone who does get it and can comfortably assert that no sketchy “phone home” shit is going on can and should use stuff like this, if they’re so inclined.

      • jimmy90@lemmy.world · 3 months ago

        this limited scope ML trained analysis is actually where “AI” excels, e.g. “computer vision” in specific medical scenarios

        • Onno (VK6FLAB) · 3 months ago

          If the training data is available, yes. In this case, no chance.

            • jimmy90@lemmy.world · 3 months ago

            you don’t think they could get training data from friendly customers using their service?

    • Brewchin@lemmy.world · 3 months ago

      I run this in a Docker container on my home network without connecting it to their cloud platform (despite their, increasingly strident, “encouragements” to do so). It’s very powerful, and the majority of low-level configuration is done via text files. But 99% of it is automatic.

      The UI is unique. It’s a single, long and scrollable page, which may be an issue for some.

      There are other tools out there, too. I previously used one that integrates Grafana, Prometheus and Node Exporter, which is more complex to set up and configure.

  • rowinxavier@lemmy.world · 3 months ago

    I did a whole stack of servers using SNMP-based monitoring years ago and it was amazing. I could see loads, memory stats, NIC utilisation, disk space, and all sorts of other things. I tried Cacti and Icinga and settled on the latter, but they are all fairly similar. Once you are generating the data you can do whatever you like with it, so monitoring which executable is responsible for load is definitely manageable. It is also handy for getting notifications when something is down, losing stability, or just out of whack.

  • daqu@feddit.org · 3 months ago

    In my time we used sar. I feel old reading about all these new tools I’ve never heard of.

  • j4k3@lemmy.world · 3 months ago

    Look through RHEL stuff. I’m not sure if Tuna has exactly what you’re looking for, but it is the tool for detailed analysis of processes on logical cores, CPU set isolation and monitoring. RHEL has tools for everything in this area, and most are available in any other distro.

  • Oisteink@lemmy.world · 2 months ago

    I like Zabbix. It can monitor whatever I like, using SNMP, IPMI, REST APIs or its own agent.

    I have a team member insisting on using Netdata, but outside of the nice dashboard it doesn’t provide much. It’s local only, and setting up alarms is a pain. And tbh it nags more than Canonical’s stuff.