Post ANrlj6RxQQ2XKJaOJs by codeberg@mastodon.technology
(DIR) Post #ANrlj5LXWnJPu7Zkg4 by codeberg@mastodon.technology
2022-09-21T15:02:12Z
0 likes, 0 repeats
Short downtime - again due to a frozen Ceph mount (umount -f doesn't work), after several waves of many requests today. Looking for expertise 😅
(DIR) Post #ANrlj5ytAUmts9zB0i by inference@plr.inferencium.net
2022-09-23T19:55:29.547971Z
0 likes, 0 repeats
@codeberg `umount -l`?
(DIR) Post #ANrlj6RxQQ2XKJaOJs by codeberg@mastodon.technology
2022-09-21T15:07:03Z
0 likes, 1 repeats
To elaborate: In order to run services via multiple instances, we're slowly migrating data to the Ceph filesystem. However, although we tested this for months, in practice it has sometimes hit us hard. We apologize for the inconvenience and we'll do our best to restore the stability of the past years. If you want to get involved with the infrastructure behind Codeberg and / or want to share your experiences with Ceph, please reach out 😉
(DIR) Post #ANroxUBCIMkyq7IR4y by marian@gruene.social
2022-09-22T06:21:07Z
0 likes, 1 repeats
@codeberg Sorry, can't help. Looks like this is related. Happens when adding an attachment to an issue comment.
(DIR) Post #ANuERtzNmrWikBI3yC by codeberg@mastodon.technology
2022-09-25T00:21:09Z
0 likes, 4 repeats
Our journey continues with this issue: https://codeberg.org/Codeberg-Infrastructure/configuration-as-code/issues/35 Feel free to read and discuss, either here, on Matrix, or in the issue itself. Thanks to the team, it's been a long meeting tonight :)
(DIR) Post #ANueOTG1JeFZmD64XI by adam@hax0rbana.social
2022-09-25T05:17:23Z
0 likes, 0 repeats
@codeberg When I was using CephFS, I always mounted it locally on one of the ceph nodes and then used sshfs to mount it remotely. That was very stable. The kernel stacktrace does have ceph in it, so switching to SSH for the remote mount might allow you to confirm the theory that it's the ceph mount that is the root cause of the kernel panic.
(DIR) Post #ANvRFG4tqIuq378FnM by codeberg@mastodon.technology
2022-09-25T14:24:51Z
0 likes, 0 repeats
@adam From our observation, the performance of sshfs wasn't fast enough for many small read accesses 😞
(DIR) Post #ANvlcUGbFmeFnYxUZc by adam@hax0rbana.social
2022-09-25T18:13:06Z
0 likes, 0 repeats
@codeberg Sadness. Since you already have a VPN, NFS might be an option, especially if this is just a temporary solution. I'm just trying to figure out some way to confirm that ceph is causing the kernel panic. I suppose the other route would be to swap out the kernel on the box that is crashing. Though swap out for what is the real question... newer? different config? the same kernel as the ceph node is running?
(DIR) Post #ANw2slJpJwOiXRoXlQ by codeberg@mastodon.technology
2022-09-25T21:26:20Z
0 likes, 0 repeats
@adam We could first try to upgrade the Ceph tools to a newer version, as @dachary suggested. The infra team will decide. And yes, using something different than Ceph might also be worth another look. I tested NFS locally with Git garbage collection, it was a nightmare actually. But maybe the performance on the Codeberg servers would have been much better.
(DIR) Post #ANw3IO8Q8mQiADhN5s by adam@hax0rbana.social
2022-09-25T21:31:14Z
0 likes, 0 repeats
@codeberg @dachary I have limited experience with NFS, but in that experience ASYNC_WRITE is key for performance (like an order of magnitude faster). Of course, that comes at the price of being theoretically less robust during a service interruption. Best of luck. 🙂