IRC chat logs for #ltsp on irc.libera.chat (webchat)

Channel log from 6 September 2023 (all times are UTC)

00:29	sunweaver has left IRC (sunweaver!~sunweaver@fylgja.das-netzwerkteam.de, .net .split)
00:29	wyre has left IRC (wyre!~wyre@user/wyre, .net .split)
00:29	sunweaver has joined IRC (sunweaver!~sunweaver@fylgja.das-netzwerkteam.de)
00:29	sunweaver is now away: not here ...
00:30	wyre has joined IRC (wyre!~wyre@user/wyre)
00:30	wyre is now away: Auto away at Wed Sep 6 00:25:20 2023 UTC
05:28	woernie has joined IRC (woernie!~werner@p5ddedeef.dip0.t-ipconnect.de)
05:34	wyre is back
06:33	ricotz has joined IRC (ricotz!~ricotz@ubuntu/member/ricotz)
06:40	wyre is now away: Auto away at Wed Sep 6 06:35:20 2023 UTC
08:41	danboid has joined IRC (danboid!~danboid@remote.salford.ac.uk)
08:43	<danboid> A few months ago, LTSP slowed to a crawl but I don't know why. It takes about 3 minutes to load Chrome under LTSP on an i7 with 32 GB RAM. Has anyone got any suggestions for troubleshooting this?
08:46	Our students have been off since June so it hasn't been a big issue, but we're going to want to use it again shortly and it's not usable. Hopefully I won't have to reinstall it, but I at least want to know why this has happened before I do reinstall, if that's what's required.
08:46	I thought it could be ZFS snapshots but I destroyed them all and it's still slug slow
08:47	There were too many tho. I've lost trust in sanoid for snapshotting after that. I'll still use syncoid tho.
08:49	It seems sanoid can't handle purging snapshots across hundreds of datasets. zfs-auto-snapshot can handle that fine
08:50	alkisg: Any tips to help troubleshoot very slow LTSP clients?
08:50	<alkisg> danboid: try to login on the server and see if it's slow, it might be unrelated to the network
08:50	E.g. worn out disks can do that
08:51	<danboid> alkisg: The server is running fine as far as I can tell. It's got 128 GB RAM etc
08:51	<alkisg> If the disk is slow, RAM won't help
08:51	When the clients work and are slow, at that time login on the server and see if it's slow or not
08:52	<danboid> The clients are always slow now, even if only one client is logged in
08:53	<alkisg> OK, login on the server and see if it's slow or not
08:53	If the server appears to be fast after login, then VNC to me so that I take a quick look
08:53	!vnc-edide
08:53	<ltspbot> vnc-edide: To share your screen with me, open Epoptes → Help menu → Remote support → Host: srv1-dide.ioa.sch.gr, and click the Connect button
08:54	<alkisg> On another note, to check networking, there's the epoptes lan benchmark function
08:56	<danboid> Ah, I didn't know about the lan benchmark. That's worth giving a go
09:12	alkisg: Hmm, that's odd. When I open epoptes on my LTSP server, it doesn't list any clients as being connected but I know at least one is booted up.
09:18	<alkisg> danboid: you probably haven't configured it properly
09:19	Also, the VM takes a whole lot of time to boot, so something is definitely wrong with the server itself
09:19	Login there, open a terminal, and sudo -i
09:21	<danboid> OK
09:23	alkisg: I've logged in as root to the VM terminal
09:23	<alkisg> danboid: ok wait, I'm currently multitasking with a lot of tasks...
09:24	<danboid> alkisg: NP, thanks for looking at this. I've got to go to a meeting now but I'll leave VNC open. Back in 30m
09:27	<alkisg> @danboid eh what's that, the VM has the same IP as the server?! How are they going to communicate then...
09:32	buringman42 has joined IRC (buringman42!~ident@20.sub-174-215-145.myvzw.com)
09:34	<alkisg> danboid: additionally, you have snap installed; it might be triggering mass-updates while the clients are booted, making all of them slow
10:02	buringman42 has left IRC (buringman42!~ident@20.sub-174-215-145.myvzw.com, Quit: Quit)
10:09	<danboid> alkisg: Back. I didn't think the network settings of the VM image mattered? It was running fine for a year with that VM config. I need to give the VM a static IP?
10:10	alkisg: OK I'll remove snap then, but I don't think that's the main cause of my problems here, is it?
10:12	alkisg: I thought LTSP assigned each client an IP on boot
10:29	<alkisg> If a few months pass (not sure how many, 90 days?) then snap may force itself to refresh
10:30	danboid: so if you run ltsp image now, since snap is now fresh, it won't update itself so it might solve the client issue
10:31	<danboid> alkisg: I'm asking Jim about the snapshots https://github.com/jimsalterjrs/sanoid/discussions/848
10:32	I can't help but think that's why it's running so slowly
10:32	<alkisg> Then it would be slow if you logged in on the server too
10:33	<danboid> Thanks for updating the image. Hopefully this will fix epoptes at least
10:33	<alkisg> And snap refresh
10:36	<danboid> alkisg: What were you saying about the IP addresses? I've always presumed ltsp image would scrub/ignore any network settings in the source image, no?
10:36	<alkisg> danboid: see how slow your disk is. With 48 processors, it should have finished in less than a minute
10:36	I was saying that you had the same IP in the server as in the VM
10:36	You shouldn't have the same IP in two different PCs
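The duplicate-address point above can be checked mechanically. The following is a minimal sketch, assuming you have already collected a host-to-IP mapping by hand; the function name `find_duplicate_ips`, the host names, and the addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical and not from this server.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_ips(assignments):
    """Return {ip: [hosts]} for every address used by more than one host.

    `assignments` maps a host label to its IP address string.
    """
    hosts_by_ip = defaultdict(list)
    for host, addr in assignments.items():
        # ip_address() validates the string and normalises its form.
        hosts_by_ip[ipaddress.ip_address(addr)].append(host)
    return {
        str(ip): sorted(hosts)
        for ip, hosts in hosts_by_ip.items()
        if len(hosts) > 1
    }


# Hypothetical inventory: the server and the template VM share an address.
EXAMPLE = {
    "server": "192.0.2.10",
    "template-vm": "192.0.2.10",
    "client1": "192.0.2.21",
}
```

Calling `find_duplicate_ips(EXAMPLE)` reports the shared address together with the hosts that claim it; an empty result means no conflict in the inventory you supplied.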
10:38	<danboid> alkisg: but the VM isn't running most of the time. That would only be an issue if the VM was running on a bridge right?
10:38	<alkisg> The VM has a bridge, yes
10:38	I'm not saying it's what causes the issue. The issue is caused by a worn-out disk that is currently dog-slow
10:38	The VM IP issue is another matter, which you should solve separately
10:42	<danboid> alkisg: You say the disks are worn out? You spotted something there? mdstat seems happy enough
10:43	<alkisg> I spotted ltsp image taking a LOT of time, with a monster CPU
10:43	Which either means bad/worn out disk, or sure, a zfs issue
10:43	I didn't have time to look into it more
10:44	<danboid> Thanks for your help! I'll wait to see what Jim has to say about recovering from too many snapshots.
10:46	I run another server with a similar number of datasets / users and it's not had any such issue with too many snapshots, but that uses zfs-auto-snapshot
10:46	<alkisg> I've seen many disks start with write speed = 500 mbps and end up with 10 mbps after a couple of years
10:47	After ltsp image finishes, try an hdparm -t /dev/sda to see the speed, if it's under 100 mb/sec that's the issue
10:47	*mbyte, not bit
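The hdparm check suggested above can be turned into a small pass/fail test. This is a sketch only, assuming hdparm prints its usual "Timing buffered disk reads: ... = N MB/sec" line (or kB/sec on very slow devices); the function names and the sample text are illustrative and were not captured from this server. The 100 MB/sec threshold is the one mentioned in the chat.

```python
import re

# Threshold from the chat: below 100 MB/sec the disk is the likely culprit.
SLOW_THRESHOLD_MB_PER_SEC = 100.0


def parse_hdparm_speed(output):
    """Return the read speed in MB/sec found in `hdparm -t` output."""
    match = re.search(r"=\s*([\d.]+)\s*(kB|MB)/sec", output)
    if match is None:
        raise ValueError("no throughput figure found in hdparm output")
    value = float(match.group(1))
    # Assumption: kB figures are converted with a factor of 1024.
    return value / 1024 if match.group(2) == "kB" else value


def disk_looks_slow(output, threshold=SLOW_THRESHOLD_MB_PER_SEC):
    """True when the measured speed is under the threshold."""
    return parse_hdparm_speed(output) < threshold


# Made-up sample in the shape hdparm normally prints; not real data.
SAMPLE = """
/dev/sda:
 Timing buffered disk reads:  30 MB in  3.05 seconds =   9.84 MB/sec
"""
```

To use it on a real machine, paste the output of `hdparm -t /dev/sda` into a string and pass it to `disk_looks_slow`. Keep in mind that a single run is noisy, so repeating the measurement a few times gives a fairer picture.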
10:49	<danboid> I heard that ZFS performance is supposed to be OK up to about 10K snapshots per dataset. We ended up going well over that.
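To see whether any dataset is actually near the roughly 10K figure mentioned above, the snapshot names can be tallied per dataset. This is a rough sketch, assuming the input is the text produced by `zfs list -H -t snapshot -o name` (one `dataset@snapshot` name per line); the function names and the dataset names in the sample are hypothetical, and the 10K limit is hearsay from the chat rather than a documented ZFS bound.

```python
from collections import Counter

# Rule-of-thumb limit quoted in the chat; treat it as an assumption.
SNAPSHOT_LIMIT = 10_000


def count_snapshots(zfs_list_output):
    """Count snapshots per dataset from one-name-per-line zfs output."""
    counts = Counter()
    for line in zfs_list_output.splitlines():
        name = line.strip()
        if "@" not in name:
            continue  # skip blank lines and anything that is not a snapshot
        dataset, _, _snapshot = name.partition("@")
        counts[dataset] += 1
    return counts


def datasets_over_limit(counts, limit=SNAPSHOT_LIMIT):
    """Return the datasets whose snapshot count exceeds `limit`, sorted."""
    return sorted(ds for ds, n in counts.items() if n > limit)


# Hypothetical sample input; real output would have many more lines.
SAMPLE = """\
tank/home/alice@autosnap_2023-09-01_hourly
tank/home/alice@autosnap_2023-09-02_hourly
tank/home/bob@autosnap_2023-09-01_hourly
"""
```

`count_snapshots(SAMPLE).most_common()` lists the datasets with the most snapshots first, which is usually enough to spot where pruning has fallen behind.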
10:53	<alkisg> I'm talking about a hardware issue, unrelated to the file system
11:32	sunweaver is back
11:41	wyre is back
12:10	wyre is now away: Auto away at Wed Sep 6 12:05:17 2023 UTC
14:12	ogra is now away: currently disconnected
14:12	ogra is back
15:07	sunweaver is now away: not here ...
15:40	danboid has left IRC (danboid!~danboid@remote.salford.ac.uk, Quit: Client closed)
16:25	eu^43611162 has joined IRC (eu^43611162!~eu^436111@165.225.63.57)
16:35	vagrantc has joined IRC (vagrantc!~vagrant@2600:3c01:e000:21:7:77:0:50)
17:23	eu^43611162 has left IRC (eu^43611162!~eu^436111@165.225.63.57, Quit: Client closed)
18:46	wyre is back
19:14	woernie has left IRC (woernie!~werner@p5ddedeef.dip0.t-ipconnect.de, Remote host closed the connection)
20:47	ricotz has left IRC (ricotz!~ricotz@ubuntu/member/ricotz, Quit: Leaving)
22:17	wyre is now away: Auto away at Wed Sep 6 22:13:02 2023 UTC