G’day,

I’m on a 100/40Mbps HFC plan, and have an ongoing issue where the internet in general (browsing, file access, email, cloud-hosted products, etc.) will just grind to a halt.

For example, I can be accessing our CMMS and suddenly a page load will take 30+ seconds to complete. It never times out, it just takes forever. Or I’m using our accounting software, which syncs remotely, and saving an invoice or opening a purchase order will stall for a minute.

This behaviour goes on for maybe 5 minutes or so and then goes away again. It can occur once or twice in a 10-hour day at the office, or not at all, or sometimes half a dozen times in a one-hour period.

  • Local network use is unaffected (e.g. accessing SMB shares on a local server)
  • All PCs and laptops connected to the LAN are affected, so it’s not PC-specific.
  • Ping is unaffected and hovers around 12ms to geographically close remote servers, with no packet loss or jitter.
  • Speedtests of any kind always return around 95/35Mbps at any time, be it peak, off-peak, while the problem is occurring, or while it is not
  • VoIP does not seem to be affected despite being on the same network, and I can talk on the phone while the internet is otherwise wading its way through treacle.
  • Happens with my current ISP (Leaptel), but also happened with the previous ISP (Aussie Broadband). They are completely different companies and I believe use completely different peering/routing/backhaul/etc.
  • DNS seems irrelevant: the problem occurs using any of the ISP DNS, Cloudflare, Google, or Quad9
  • Some websites like Facebook and Google work, but other websites like Lemmy (any instance), Reddit, my CMMS, various wholesaler sites hosted both in AU and worldwide, are affected.

Are there any steps I can take to try and identify what causes this random delay? It’s just enough to be really frustrating, especially when you’re trying to look up something while on the phone and have to be like “so yeah how’s the wife? how’s the kids? how’s the…dog? … pet bird doing anything interesting?” as you wait for a damn page to load. I need fast internet so I don’t need to make small talk dammit.

PCs are all on cat5e or cat6 (depending on when the cabling was run), to a Ubiquiti Dream Machine SE which is connected via cat6 to the NBN HFC modem.

  • rudyharrelson
    7 days ago

    I’ll share my input, although it’s primarily speculation and a smidge of deductive reasoning.

    Given these three particular pieces of information:

    • Local network use is unaffected (e.g. accessing SMB shares on a local server)
    • Happens with my current ISP (Leaptel), but also happened with the previous ISP (Aussie Broadband)
    • Ping is unaffected and hovers around 12ms to geographically close remote servers, with no packet loss or jitter.

    My first instinct is that the issue may be upstream (non-local) network congestion, since connections appear to be slowing to a crawl rather than dropping packets. Ping requests don’t seem to suffer, but they’re a lot smaller than the content loaded from the CMMS, Reddit, etc. You mentioned it could happen twice or more in a 10-hour shift, or sometimes not at all; network congestion being highly variable could explain this.
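    Since ping only exercises tiny packets, it may help to time a real page load phase by phase during a slowdown. Below is a minimal Python sketch using only the standard library. The function name `time_phases` and its parameters are made up for illustration, it only tries IPv4, and it sends a bare-bones HTTP/1.1 request, so treat it as a rough probe rather than a proper diagnostic tool.

```python
import socket
import ssl
import time


def time_phases(host, port=443, path="/", use_tls=True, timeout=90.0):
    """Time each phase of one HTTP GET and return the durations in seconds.

    Hypothetical helper: IPv4 only, one request, no redirects followed.
    """
    timings = {}

    t0 = time.perf_counter()
    # Phase 1: DNS lookup (returns immediately if host is already an IP).
    addr = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)[0][4]
    t1 = time.perf_counter()
    timings["dns"] = t1 - t0

    # Phase 2: TCP three-way handshake.
    sock = socket.create_connection(addr, timeout=timeout)
    t2 = time.perf_counter()
    timings["tcp_connect"] = t2 - t1

    try:
        # Phase 3: TLS handshake (close to zero when use_tls is False).
        if use_tls:
            ctx = ssl.create_default_context()
            sock = ctx.wrap_socket(sock, server_hostname=host)
        t3 = time.perf_counter()
        timings["tls_handshake"] = t3 - t2

        # Phase 4: send the request and wait for the first response byte.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "User-Agent: phase-timer\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        first = sock.recv(1)
        t4 = time.perf_counter()
        timings["first_byte"] = t4 - t3

        # Phase 5: read until the server closes the connection.
        received = len(first)
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            received += len(chunk)
        timings["download"] = time.perf_counter() - t4
        timings["bytes"] = received
    finally:
        sock.close()

    return timings
```

    Running something like `print(time_phases("www.reddit.com"))` once while things are normal and once mid-slowdown should show which phase balloons. If `tcp_connect` stays near ping time but `first_byte` or `download` blows out, that would at least be consistent with the small-packets-are-fine pattern described above.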

    Are you in a remote area? If so, there may not be much nearby infrastructure (routers) to handle the big spikes in traffic when everyone in the immediate area clocks in to work at 9am, or gets back from lunch around 1pm, etc. If that’s the case, the local routers would get overwhelmed regularly by congestion and packet delivery times would suffer. This could also happen in more densely populated areas, depending on what the local infrastructure looks like.

    That said, I’m not entirely sure how to explain the speed tests not suffering if congestion is the issue, unless the particular routes to the geographically close test servers aren’t congested (because during these periods large numbers of people are connecting to real services, not to speed test servers).

    The fact that some live services like Google & Facebook load while others like Reddit and Lemmy do not could be explained by the difference in those services’ respective high-availability (HA) solutions. Facebook and Google don’t typically drop below 99.95%-ish uptime because they scale their server infrastructure very aggressively to meet demand. But even huge services like Reddit have considerably more downtime than Facebook or Google (Reddit seems to have major outages several times a year, while Google and Facebook do not). Some upstream services having more servers to handle more requests more quickly could account for the inconsistent ability to load websites during this congestion.

    I’m not sure of the best way to test this hypothesis, though. Given how much troubleshooting and information gathering you’ve already done, this is a tricky one.
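    One rough way to gather evidence either way would be to leave a small logger running on an office PC and compare load times for sites that usually stay fine against sites that usually stall, then line the slow samples up against time of day. The Python sketch below is only an illustration: the `TARGETS` URLs are placeholders based on the sites mentioned in the post (swap in your own CMMS and wholesaler URLs), and the function names, file name, and 60-second interval are all assumptions.

```python
import csv
import time
import urllib.request
from datetime import datetime

# Placeholder targets; replace with sites you actually see stalling.
TARGETS = {
    "usually_fine": ["https://www.google.com/"],
    "usually_slow": ["https://www.reddit.com/"],
}


def fetch_seconds(url, timeout=90.0):
    """Fetch url once; return (elapsed seconds, "ok" or the exception name)."""
    start = time.perf_counter()
    try:
        req = urllib.request.Request(url, headers={"User-Agent": "slowdown-logger"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
        status = "ok"
    except Exception as exc:  # record the failure type rather than crash
        status = type(exc).__name__
    return time.perf_counter() - start, status


def log_round(writer, targets, fetch=fetch_seconds):
    """Probe every URL once and write one CSV row per probe."""
    rows = []
    for group, urls in targets.items():
        for url in urls:
            seconds, status = fetch(url)
            stamp = datetime.now().isoformat(timespec="seconds")
            row = [stamp, group, url, f"{seconds:.2f}", status]
            writer.writerow(row)
            rows.append(row)
    return rows


def run_forever(path="slowdown_log.csv", interval=60.0):
    """Append one round of probes to path every interval seconds."""
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        while True:
            log_round(writer, TARGETS)
            fh.flush()
            time.sleep(interval)
```

    Calling `run_forever()` and leaving it for a few days should give a CSV you can sort by time. If the slow rows cluster around the same hours each day and only hit the "usually_slow" group, that would lean towards congestion on particular routes; if they are scattered at random, it would point elsewhere.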