GitLab: gitlab-shell timeouts, and /metrics Connection refused

Once our self-hosted GitLab was running in production, we ran into a bug: git clone/pull/push operations sometimes hung for 1-2 minutes.
It looked like a “floating” bug: an operation could work normally 5 times in a row and then hang once.
The issues
gitlab-shell timeouts
For example, one time git clone works well:
time git clone git@gitlab-shell.internal.example.com:example/platform/tables-api.git
Cloning into 'tables-api'...
...
real 0m1.380s
And then a clone of the same repository takes 2 minutes:
time git clone git@gitlab-shell.internal.example.com:example/platform/tables-api.git
Cloning into 'tables-api'...
...
real 2m10.497s
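Because the hang is intermittent, a single timed run proves little. Below is a minimal Python sketch (not part of the original setup) that repeats a command several times and reports the slow runs. It assumes `git ls-remote` as the probe, since it opens the same kind of SSH session as a clone without writing a working copy; the script name, `runs`, `threshold`, and `timeout` values are arbitrary assumptions.

```python
import subprocess
import sys
import time


def time_runs(cmd, runs=5, timeout=300.0):
    """Run `cmd` `runs` times and return each run's wall-clock duration in seconds.

    A run that exceeds `timeout` is killed and recorded as exactly `timeout`.
    """
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout)
            durations.append(time.monotonic() - start)
        except subprocess.TimeoutExpired:
            durations.append(timeout)
    return durations


def slow_runs(durations, threshold=10.0):
    """Return the indexes of runs that took longer than `threshold` seconds."""
    return [i for i, d in enumerate(durations) if d > threshold]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   python3 time_runs.py git@gitlab-shell.internal.example.com:example/platform/tables-api.git
    results = time_runs(["git", "ls-remote", sys.argv[1]], runs=10)
    for i, d in enumerate(results):
        print(f"run {i}: {d:.3f}s")
    print("slow runs:", slow_runs(results))
```

With a “floating” bug like this one, most runs should finish in about a second and only a few indexes show up in the slow list.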
It didn’t look like a network issue, but rather like something in SSH at the session establishment and key exchange stage.
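One way to check whether the delay happens before SSH itself is to time the TCP connect and the server’s identification line separately: per RFC 4253, an SSH server sends a line starting with `SSH-` as soon as the connection is open. The Python sketch below does only that; it does not cover key exchange or authentication, for which `ssh -v` shows at which step the session stalls. The host name in the usage note is an assumption taken from the clone URL above.

```python
import socket
import time


def ssh_banner_time(host, port=22, timeout=10.0):
    """Return (connect_seconds, banner_seconds, banner) for an SSH server.

    connect_seconds: time to complete the TCP handshake.
    banner_seconds:  time from connect until the "SSH-..." identification line arrives.
    Raises socket.timeout if nothing arrives within `timeout` seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        buf = b""
        while True:
            chunk = sock.recv(256)
            if not chunk:
                raise ConnectionError("connection closed before an SSH banner was received")
            buf += chunk
            # RFC 4253 allows other lines before the version line, so scan line by line.
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                if line.startswith(b"SSH-"):
                    banner = line.rstrip(b"\r").decode("ascii", errors="replace")
                    return connected - start, time.monotonic() - connected, banner
```

For example, `ssh_banner_time("gitlab-shell.internal.example.com")`: a slow connect points at the network, while a slow banner points at the SSH server side.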
Fortunately, I didn’t have to dig deep: I decided to fix the metrics problem first, so that I could see what was happening with GitLab Shell in monitoring.
gitlab-shell /metrics endpoint: Connection refused
I already mentioned this metrics issue in the GitLab: monitoring – Prometheus, metrics, and Grafana dashboard post, where I described the monitoring setup: there was a problem collecting Git/SSH metrics from the gitlab-shell Pod.
It looked like this: forward port 9122 (see values) from the Pod:
kubectl -n gitlab-cluster-prod port-forward gitlab-cluster-prod-gitlab-shell-744675c985-5t8wn 9122
Try it with curl:
curl localhost:9122/metrics
curl: (52) Empty reply from server
And port-forward reports “Connection refused” from the Pod:
...
Handling connection for 9122
E0315 12:40:43.712508 826225 portforward.go:407] an error occurred forwarding 9122 -> 9122: error forwarding port 9122 to pod 51856f9224907d4c1380783e46b13069ef5322ae1f286d4301f90a2ed60483c0, uid : exit status 1: 2023/03/15 10:40:43 socat[28712] E connect(5, AF=2 127.0.0.1:9122, 16): Connection refused
E0315 12:40:43.713039 826225 portforward.go:233] lost connection to pod
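These are two different failures: curl got an empty reply because kubectl port-forward accepted the local connection, while the connection it then opened to 127.0.0.1:9122 inside the Pod was refused. A minimal Python sketch using only the standard library’s `http.client` can tell these cases apart; `check_metrics` is a hypothetical helper, and its host, port, and path defaults simply mirror the commands above.

```python
import http.client


def check_metrics(host="127.0.0.1", port=9122, path="/metrics", timeout=5.0):
    """Fetch a metrics endpoint and classify the result.

    Returns a (status, body) tuple where status is one of:
      "ok"          - HTTP 200, body holds the response text
      "http_<code>" - any other HTTP status, body holds the response text
      "refused"     - nothing accepted the TCP connection on host:port
      "empty_reply" - the connection was accepted but closed without an HTTP
                      response (curl's "(52) Empty reply from server")
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return ("ok" if resp.status == 200 else f"http_{resp.status}"), body
    except ConnectionRefusedError:
        return "refused", None
    except (http.client.RemoteDisconnected, ConnectionResetError):
        return "empty_reply", None
    finally:
        conn.close()
```

Through the port-forward above, this would be expected to return `("empty_reply", None)`; run directly against a port where nothing listens, it returns `("refused", None)`.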
The solution
As it turned out, GitLab Shell supports two SSH daemons, openssh and gitlab-sshd, and the metrics endpoint is served only by gitlab-sshd, so with openssh nothing listens on port 9122. And openssh is the default value, see values:
...
## Allow to select ssh daemon that would be executed inside container
## Possible values: openssh, gitlab-sshd
sshDaemon: openssh
...
So, update our values:
...
gitlab-shell:
  enabled: true
  metrics:
    enabled: true
  sshDaemon: gitlab-sshd
...
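The fix boils down to one rule: metrics from gitlab-shell need `sshDaemon: gitlab-sshd`. Below is a hypothetical Python helper, `lint_gitlab_shell`, that encodes this rule so it can run in CI. It is a sketch that assumes the `gitlab-shell` section of the values has already been parsed into a dict (for example with PyYAML’s `yaml.safe_load`) and that an unset `sshDaemon` means the chart default, openssh.

```python
def lint_gitlab_shell(shell_values):
    """Check the `gitlab-shell` section of GitLab chart values (as a parsed dict).

    Returns a list of warning strings; an empty list means no problems were found.
    """
    warnings = []
    metrics_enabled = shell_values.get("metrics", {}).get("enabled", False)
    # Assumption: when sshDaemon is not set, the chart default (openssh) applies,
    # and openssh does not serve the /metrics endpoint.
    daemon = shell_values.get("sshDaemon", "openssh")
    if metrics_enabled and daemon != "gitlab-sshd":
        warnings.append(
            f"metrics.enabled is true but sshDaemon is {daemon!r}; "
            "the metrics endpoint is only served by gitlab-sshd"
        )
    return warnings
```

With the values shown above, the helper returns an empty list; dropping the `sshDaemon` line would produce one warning.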
Deploy and check the metrics:
curl localhost:9122/metrics
# HELP gitlab_build_info Current build info for this GitLab Service
# TYPE gitlab_build_info gauge
gitlab_build_info{built="20230309.174623",version="v14.17.0"} 1
# HELP gitlab_shell_gitaly_connections_total Number of Gitaly connections that have been established
# TYPE gitlab_shell_gitaly_connections_total counter
gitlab_shell_gitaly_connections_total{status="ok"} 2
...
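To check specific values from this output in a script rather than by eye, a small parser for the Prometheus text exposition format is enough for simple lines like these. This minimal Python sketch assumes label values contain no commas, equals signs, or escaped quotes; the full format allows them, so for anything beyond quick checks the official `prometheus_client` library’s `text_string_to_metric_families` parser is the safer choice.

```python
def parse_prometheus_text(text):
    """Parse simple Prometheus text-format samples into {(name, labels): value}.

    `labels` is a sorted tuple of (key, value) pairs; comment lines
    (# HELP / # TYPE) and blank lines are skipped, and an optional trailing
    timestamp is ignored.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            label_str, rest = rest.rsplit("}", 1)
            pairs = (p.split("=", 1) for p in label_str.split(",") if p.strip())
            labels = tuple(sorted((k.strip(), v.strip().strip('"')) for k, v in pairs))
        else:
            name, rest = line.split(None, 1)
            labels = ()
        # The first token after the name/labels is the value; a timestamp may follow.
        samples[(name.strip(), labels)] = float(rest.split()[0])
    return samples
```

For example, the output above yields `2.0` for the key `("gitlab_shell_gitaly_connections_total", (("status", "ok"),))`.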
The timeout issue has been solved as well: now a clone takes no longer than 1 second (real 0m0.846s).
The post GitLab: gitlab-shell timeouts, and /metrics Connection refused first appeared on RTFM: Linux, DevOps, and system administration.