00:29 | sunweaver has left IRC (sunweaver!~sunweaver@fylgja.das-netzwerkteam.de, *.net *.split) | |
00:29 | wyre has left IRC (wyre!~wyre@user/wyre, *.net *.split) | |
00:29 | sunweaver has joined IRC (sunweaver!~sunweaver@fylgja.das-netzwerkteam.de) | |
00:29 | sunweaver is now away: not here ... | |
00:30 | wyre has joined IRC (wyre!~wyre@user/wyre) | |
00:30 | wyre is now away: Auto away at Wed Sep 6 00:25:20 2023 UTC | |
05:28 | woernie has joined IRC (woernie!~werner@p5ddedeef.dip0.t-ipconnect.de) | |
05:34 | wyre is back | |
06:33 | ricotz has joined IRC (ricotz!~ricotz@ubuntu/member/ricotz) | |
06:40 | wyre is now away: Auto away at Wed Sep 6 06:35:20 2023 UTC | |
08:41 | danboid has joined IRC (danboid!~danboid@remote.salford.ac.uk) | |
08:43 | <danboid> A few months ago, LTSP slowed to a crawl but I don't know why. It takes about 3 minutes to load Chrome under LTSP on an i7 with 32 GB RAM. Has anyone got any suggestions for troubleshooting this?
| |
08:46 | Our students have been off since June so it hasn't been a big issue, but we're going to want to use it again shortly and it's not usable. Hopefully I won't have to reinstall it, but I at least want to know why this has happened before I do reinstall, if that's what's required.
| |
08:46 | I thought it could be ZFS snapshots but I destroyed them all and it's still slug slow
| |
08:47 | There were too many tho. I've lost trust in sanoid for snapshotting after that. I'll still use syncoid tho.
| |
08:49 | It seems sanoid can't handle purging snapshots across hundreds of datasets. zfs-auto-snapshot can handle that fine
| |
08:50 | alkisg: Any tips to help troubleshoot very slow LTSP clients?
| |
08:50 | <alkisg> danboid: try to log in on the server and see if it's slow, it might be unrelated to the network
| |
08:50 | E.g. worn out disks can do that
| |
08:51 | <danboid> alkisg: The server is running fine as far as I can tell. It's got 128 GB RAM etc
| |
08:51 | <alkisg> If the disk is slow, RAM won't help
| |
08:51 | While the clients are running and slow, log in on the server and see if it's slow or not
| |
08:52 | <danboid> The clients are always slow now, even if only one client is logged in
| |
08:53 | <alkisg> OK, log in on the server and see if it's slow or not
| |
08:53 | If the server appears to be fast after login, then VNC to me so that I can take a quick look
| |
08:53 | !vnc-edide
| |
08:53 | <ltspbot> vnc-edide: To share your screen with me, open Epoptes → Help menu → Remote support → Host: srv1-dide.ioa.sch.gr, and click the Connect button
| |
08:54 | <alkisg> On another note, to check networking, there's the epoptes lan benchmark function
| |
08:56 | <danboid> Ah, I didn't know about the lan benchmark. That's worth giving a go
| |
09:12 | alkisg: Hmm, that's odd. When I open epoptes on my LTSP server, it doesn't list any clients as being connected but I know at least one is booted up.
| |
09:18 | <alkisg> danboid: you probably haven't configured it properly
| |
09:19 | Also, the VM takes a whole lot of time to boot, so something is definitely wrong with the server itself
| |
09:19 | Log in there, open a terminal, and sudo -i
| |
09:21 | <danboid> OK
| |
09:23 | alkisg: I've logged in as root to the VM terminal
| |
09:23 | <alkisg> danboid: ok wait, I'm currently multitasking with a lot of tasks...
| |
09:24 | <danboid> alkisg: NP, thanks for looking at this. I've got to go to a meeting now but I'll leave VNC open. Back in 30m
| |
09:27 | <alkisg> @danboid eh what's that, the VM has the same IP as the server?! How are they going to communicate then...
| |
09:32 | buringman42 has joined IRC (buringman42!~ident@20.sub-174-215-145.myvzw.com) | |
09:34 | <alkisg> danboid: additionally, you have snap installed; it might be triggering mass-updates while the clients are booted, making all of them slow
| |
10:02 | buringman42 has left IRC (buringman42!~ident@20.sub-174-215-145.myvzw.com, Quit: Quit) | |
10:09 | <danboid> alkisg: Back. I didn't think the network settings of the VM image mattered? It was running fine for a year with that VM config. I need to give the VM a static IP?
| |
10:10 | alkisg: OK, I'll remove snap then, but I don't think that's the main cause of my problems here, is it?
| |
10:12 | alkisg: I thought LTSP assigned each client an IP on boot
| |
10:29 | <alkisg> If a few months pass (not sure how many, 90 days?) then snap may force itself to refresh
| |
10:30 | danboid: so if you run ltsp image now, since snap is now fresh, it won't update itself so it might solve the client issue
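[Editor's note] The snap theory above can be checked before rebuilding the image by looking at when snapd last refreshed. A minimal sketch, assuming `snap refresh --time` prints a `last: <when>` line (the usual format, but treat it as an assumption); the helper name `snap_last_refresh` is hypothetical and not part of snapd or LTSP:

```shell
#!/usr/bin/env bash
# Print the "last:" field from `snap refresh --time` output read on stdin.
# Assumption: the output contains a line of the form "last: <when>".
snap_last_refresh() {
    awk '/^last:/ { sub(/^last:[ \t]*/, ""); print; exit }'
}

# Usage, run inside the system that the image is built from:
#   snap refresh --time | snap_last_refresh
#   snap changes        # lists recent refresh activity, if any
```

If the last refresh is months old in the image source, every client booted from that image would start from the same stale state, which is consistent with the mass-update explanation given in the chat.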
| |
10:31 | <danboid> alkisg: I'm asking Jim about the snapshots https://github.com/jimsalterjrs/sanoid/discussions/848
| |
10:32 | I can't help but think that's why it's running so slowly
| |
10:32 | <alkisg> Then it would be slow if you logged in on the server too
| |
10:33 | <danboid> Thanks for updating the image. Hopefully this will fix epoptes at least
| |
10:33 | <alkisg> And snap refresh
| |
10:36 | <danboid> alkisg: What were you saying about the IP addresses? I've always presumed ltsp image would scrub/ignore any network settings in the source image, no?
| |
10:36 | <alkisg> danboid: see how slow your disk is. With 48 processors, it should have finished in less than a minute
| |
10:36 | I was saying that you had the same IP in the server as in the VM
| |
10:36 | You shouldn't have the same IP in two different PCs
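[Editor's note] One way to catch the duplicate address described above is to keep a small inventory of hosts and addresses and flag any address that appears twice. This is a sketch only: the helper name `find_duplicate_ips` and the sample hosts and addresses are hypothetical, not taken from danboid's setup.

```shell
#!/usr/bin/env bash
# Read "name address" pairs on stdin and print every address that is
# assigned to more than one name, followed by the names that share it.
find_duplicate_ips() {
    awk '
        NF >= 2 {
            count[$2]++
            if (names[$2] == "")
                names[$2] = $1
            else
                names[$2] = names[$2] "," $1
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip, names[ip]
        }
    ' | sort
}

# Usage with a hypothetical inventory:
#   printf 'server 192.168.67.1\nvm 192.168.67.1\n' | find_duplicate_ips
```

For a live check on the wire, `arping -D -I <interface> <address>` from iputils performs duplicate address detection, assuming that variant of arping is installed.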
| |
10:38 | <danboid> alkisg: but the VM isn't running most of the time. That would only be an issue if the VM was running on a bridge right?
| |
10:38 | <alkisg> The VM has a bridge, yes
| |
10:38 | I'm not saying it's what causes the issue. The issue is caused by a worn-out disk that is currently dog-slow
| |
10:38 | The VM IP issue is another matter, which you should solve separately
| |
10:42 | <danboid> alkisg: You say the disks are worn out? You spotted something there? mdstat seems happy enough
| |
10:43 | <alkisg> I spotted ltsp image taking a LOT of time, with a monster CPU
| |
10:43 | Which either means bad/worn out disk, or sure, a zfs issue
| |
10:43 | I didn't have time to look into it more
| |
10:44 | <danboid> Thanks for your help! I'll wait to see what Jim has to say about recovering from too many snapshots.
| |
10:46 | I run another server with a similar number of datasets / users and it's not had any such issue with too many snapshots, but that uses zfs-auto-snapshot
| |
10:46 | <alkisg> I've seen many disks start with write speed = 500 mbps and end up with 10 mbps after a couple of years
| |
10:47 | After ltsp image finishes, try an hdparm -t /dev/sda to see the speed, if it's under 100 mb/sec that's the issue
| |
10:47 | *mbyte, not bit
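[Editor's note] The check alkisg suggests can be scripted roughly as follows. A minimal sketch, assuming `hdparm -t` prints its usual summary line ending in `= <number> MB/sec`; the helper names are hypothetical, and the 100 MB/sec threshold is the one quoted in the chat:

```shell
#!/usr/bin/env bash
# Extract the MB/sec figure from `hdparm -t` output read on stdin.
# Assumption: the summary line ends with "= <number> MB/sec".
parse_hdparm_mb_per_sec() {
    awk '/MB\/sec/ { print $(NF-1); exit }'
}

# Print "slow" if the speed is below the threshold (default 100 MB/sec,
# the figure mentioned in the chat), otherwise "ok".
classify_disk_speed() {
    local speed="$1" threshold="${2:-100}"
    awk -v s="$speed" -v t="$threshold" \
        'BEGIN { if (s + 0 < t + 0) print "slow"; else print "ok" }'
}

# Usage (needs root and a real device; /dev/sda is the one named above):
#   speed=$(sudo hdparm -t /dev/sda | parse_hdparm_mb_per_sec)
#   classify_disk_speed "$speed"
```

On a server with md RAID or ZFS on top, it is probably worth running this against each member disk rather than only `/dev/sda`, since a single slow member can drag down the whole array.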
| |
10:49 | <danboid> I heard that ZFS performance is supposed to be OK up to about 10K snapshots per dataset. We ended up going well over that.
| |
10:53 | <alkisg> I'm talking about a hardware issue, unrelated to the file system
| |
11:32 | sunweaver is back | |
11:41 | wyre is back | |
12:10 | wyre is now away: Auto away at Wed Sep 6 12:05:17 2023 UTC | |
14:12 | ogra is now away: currently disconnected | |
14:12 | ogra is back | |
15:07 | sunweaver is now away: not here ... | |
15:40 | danboid has left IRC (danboid!~danboid@remote.salford.ac.uk, Quit: Client closed) | |
16:25 | eu^43611162 has joined IRC (eu^43611162!~eu^436111@165.225.63.57) | |
16:35 | vagrantc has joined IRC (vagrantc!~vagrant@2600:3c01:e000:21:7:77:0:50) | |
17:23 | eu^43611162 has left IRC (eu^43611162!~eu^436111@165.225.63.57, Quit: Client closed) | |
18:46 | wyre is back | |
19:14 | woernie has left IRC (woernie!~werner@p5ddedeef.dip0.t-ipconnect.de, Remote host closed the connection) | |
20:47 | ricotz has left IRC (ricotz!~ricotz@ubuntu/member/ricotz, Quit: Leaving) | |
22:17 | wyre is now away: Auto away at Wed Sep 6 22:13:02 2023 UTC | |