nnf-ec database not consistent with namespaces on drives #215

Open
matthew-richerson opened this issue Oct 2, 2024 · 3 comments
Comments

@matthew-richerson
Contributor

2024-10-02T07:37:00.666-0700    DEBUG   controllers.NnfNodeStorage      Command Run     {"NnfNodeStorage": {"name":"default-fluxjob-69363556943922176-0-xfs-0","namespace":"tuolumne265"}, "command": "pvs --reportformat json"}
2024-10-02T07:37:00.693-0700    DEBUG   ec.nvme.0.5     Deleted namespace       {"storageId": "0", "slot": 8, "serialNumber": "4D20A0D30U61", "namespaceId": 5}
2024-10-02T07:37:00.785-0700    DEBUG   ec.nvme.3.5     Deleted namespace       {"storageId": "3", "slot": 16, "serialNumber": "4D20A0CV0U61", "namespaceId": 5}
2024-10-02T07:37:00.877-0700    DEBUG   ec.nvme.15.5    Deleted namespace       {"storageId": "15", "slot": 10, "serialNumber": "4D20A0CY0U61", "namespaceId": 5}
2024-10-02T07:37:00.969-0700    DEBUG   ec.nvme.17.5    Deleted namespace       {"storageId": "17", "slot": 3, "serialNumber": "4D30A0QM0U61", "namespaceId": 5}
2024-10-02T07:37:01.060-0700    DEBUG   ec.nvme.4.5     Deleted namespace       {"storageId": "4", "slot": 17, "serialNumber": "4D20A0CT0U61", "namespaceId": 5}
2024-10-02T07:37:01.152-0700    DEBUG   ec.nvme.8.5     Deleted namespace       {"storageId": "8", "slot": 12, "serialNumber": "4D10A00C0U61", "namespaceId": 5}
2024-10-02T07:37:01.245-0700    DEBUG   ec.nvme.2.5     Deleted namespace       {"storageId": "2", "slot": 15, "serialNumber": "4D20A0DZ0U61", "namespaceId": 5}
2024-10-02T07:37:01.336-0700    DEBUG   ec.nvme.1.5     Deleted namespace       {"storageId": "1", "slot": 7, "serialNumber": "4D20A1210U61", "namespaceId": 5}
2024-10-02T07:37:01.374-0700    DEBUG   controllers.NnfNodeStorage      Command Run     {"NnfNodeStorage": {"name":"default-fluxjob-69363221114389504-0-xfs-0","namespace":"tuolumne265"}, "command": "pvs --reportformat json"}
2024-10-02T07:37:01.375-0700    DEBUG   controllers.NnfNodeStorage      Command Run     {"NnfNodeStorage": {"name":"default-fluxjob-69363685977490432-0-xfs-0","namespace":"tuolumne265"}, "command": "pvcreate /dev/nvme9n14"}
2024-10-02T07:37:01.381-0700    DEBUG   controllers.NnfNodeStorage      Command Run     {"NnfNodeStorage": {"name":"default-fluxjob-69363522332525568-0-xfs-0","namespace":"tuolumne265"}, "command": "pvcreate /dev/nvme6n12"}
2024-10-02T07:37:01.427-0700    DEBUG   ec.nvme.14.5    Deleted namespace       {"storageId": "14", "slot": 9, "serialNumber": "4D20A0D20U61", "namespaceId": 5}
2024-10-02T07:37:01.519-0700    DEBUG   ec.nvme.16.5    Deleted namespace       {"storageId": "16", "slot": 11, "serialNumber": "4D20A0DY0U61", "namespaceId": 5}
2024-10-02T07:37:01.612-0700    DEBUG   ec.nvme.6.5     Deleted namespace       {"storageId": "6", "slot": 14, "serialNumber": "4D20A0D10U61", "namespaceId": 5}
2024-10-02T07:37:01.703-0700    DEBUG   ec.nvme.5.5     Deleted namespace       {"storageId": "5", "slot": 18, "serialNumber": "4D10A03B0U61", "namespaceId": 5}
2024-10-02T07:37:01.726-0700    DEBUG   controllers.NnfNodeStorage      Command Run     {"NnfNodeStorage": {"name":"default-fluxjob-69363384558027776-0-xfs-0","namespace":"tuolumne265"}, "command": "wipefs --noheadings --output type /dev/mapper/ace09451--18fa--4555--9e7e--f05a0fd0a3ab_0-lv--0"}
2024-10-02T07:37:01.757-0700    DEBUG   controllers.NnfNodeStorage      Command Run     {"NnfNodeStorage": {"name":"default-fluxjob-69362213978113024-0-xfs-0","namespace":"tuolumne265"}, "index": 0, "command": "pvs --reportformat json"}
2024-10-02T07:37:01.795-0700    DEBUG   ec.nvme.9.5     Deleted namespace       {"storageId": "9", "slot": 4, "serialNumber": "4D20A0D00U61", "namespaceId": 5}
2024-10-02T07:37:01.797-0700    DEBUG   controllers.NnfNodeStorage      Command Run     {"NnfNodeStorage": {"name":"default-fluxjob-69363384558027776-0-xfs-0","namespace":"tuolumne265"}, "command": "mkfs -t xfs /dev/mapper/ace09451--18fa--4555--9e7e--f05a0fd0a3ab_0-lv--0"}
2024-10-02T07:37:01.873-0700    INFO    ec.nnf  Deleted storage pool    {"storagePoolId": "default-fluxjob-69361627010434048-0-xfs-0-0"}
2024-10-02T07:37:01.880-0700    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "nnfnodeblockstorage", "controllerGroup": "nnf.cray.hpe.com", "controllerKind": "NnfNodeBlockStorage", "NnfNodeBlockStorage": {"name":"default-fluxjob-69362750882579456-0-xfs-0","namespace":"tuolumne265"}, "namespace": "tuolumne265", "name": "default-fluxjob-69362750882579456-0-xfs-0", "reconcileID": "93a8718a-7902-4518-a139-120fc4d60eb6"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1748ae4]

goroutine 945 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1a0d700?, 0x2e7ac20?})
        /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/NearNodeFlash/nnf-ec/pkg/manager-nvme.(*Volume).GetOdataId(...)
        /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nvme/manager.go:637
github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*StorageService).StorageServiceIdStoragePoolIdCapacitySourceIdProvidingVolumesGet(0x7ffff7fb9108, {0xc0016822b8?, 0xc001e00400?}, {0xc000aa76b0, 0x2b}, {0x1f5adf0, 0x1}, 0xc0004b6aa0)
        /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/manager.go:904 +0x544
github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*AerService).StorageServiceIdStoragePoolIdCapacitySourceIdProvidingVolumesGet(0xc00005ff30, {0xc0016822b8?, 0xc00005ff30?}, {0xc000aa76b0?, 0x2b?}, {0x1f5adf0?, 0xc001026b90?}, 0xc001026bf8?)
        /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/aer.go:110 +0x38
github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).allocateStorage(0xc0005da600, 0xc000598780, 0x0)
        /workspace/internal/controller/nnf_node_block_storage_controller.go:305 +0x410
github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).Reconcile(0xc0005da600, {0x1f7f8b8?, 0xc001b05b00}, {{{0xc002314830, 0xb}, {0xc000b18f60, 0x29}}})
        /workspace/internal/controller/nnf_node_block_storage_controller.go:240 +0xd7e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1f82cd0?, {0x1f7f8b8?, 0xc001b05b00?}, {{{0xc002314830?, 0xb?}, {0xc000b18f60?, 0x0?}}})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0005d1400, {0x1f7f8f0, 0xc0005dccd0}, {0x1aa95a0?, 0xc000c16c80?})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0005d1400, {0x1f7f8f0, 0xc0005dccd0})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 174
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x565
@matthew-richerson
Contributor Author

2024-10-02T07:36:57.662-0700	DEBUG	controllers.NnfNodeBlockStorage	Command Run	{"NnfNodeBlockStorage": {"name":"default-fluxjob-69363685977490432-0-xfs-0","namespace":"tuolumne265"}, "command": "nvme list -v --output-format=json"}
2024-10-02T07:36:58.009-0700	INFO	controllers.NnfNodeBlockStorage	Deleting storage pool	{"NnfNodeBlockStorage": {"name":"default-fluxjob-69361627010434048-0-xfs-0","namespace":"tuolumne265"}, "Id": "default-fluxjob-69361627010434048-0-xfs-0-0"}
2024-10-02T07:36:58.044-0700	DEBUG	ec.nvme.12.5	Attached namespace	{"storageId": "12", "slot": 2, "serialNumber": "4D20A0CW0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.154-0700	DEBUG	ec.nvme.12.5	Formatted namespace	{"storageId": "12", "slot": 2, "serialNumber": "4D20A0CW0U61", "namespaceId": 5}
2024-10-02T07:36:58.173-0700	DEBUG	ec.nvme.12.5	Detached namespace	{"storageId": "12", "slot": 2, "serialNumber": "4D20A0CW0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.177-0700	DEBUG	ec.nvme.11.5	Attached namespace	{"storageId": "11", "slot": 6, "serialNumber": "4D20A0E30U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.287-0700	DEBUG	ec.nvme.11.5	Formatted namespace	{"storageId": "11", "slot": 6, "serialNumber": "4D20A0E30U61", "namespaceId": 5}
2024-10-02T07:36:58.307-0700	DEBUG	ec.nvme.11.5	Detached namespace	{"storageId": "11", "slot": 6, "serialNumber": "4D20A0E30U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.311-0700	DEBUG	ec.nvme.10.5	Attached namespace	{"storageId": "10", "slot": 5, "serialNumber": "4D20A0CS0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.420-0700	DEBUG	ec.nvme.10.5	Formatted namespace	{"storageId": "10", "slot": 5, "serialNumber": "4D20A0CS0U61", "namespaceId": 5}
2024-10-02T07:36:58.438-0700	DEBUG	ec.nvme.10.5	Detached namespace	{"storageId": "10", "slot": 5, "serialNumber": "4D20A0CS0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.441-0700	DEBUG	ec.nvme.0.5	Attached namespace	{"storageId": "0", "slot": 8, "serialNumber": "4D20A0D30U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.551-0700	DEBUG	ec.nvme.0.5	Formatted namespace	{"storageId": "0", "slot": 8, "serialNumber": "4D20A0D30U61", "namespaceId": 5}
2024-10-02T07:36:58.570-0700	DEBUG	ec.nvme.0.5	Detached namespace	{"storageId": "0", "slot": 8, "serialNumber": "4D20A0D30U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.574-0700	DEBUG	ec.nvme.3.5	Attached namespace	{"storageId": "3", "slot": 16, "serialNumber": "4D20A0CV0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.686-0700	DEBUG	ec.nvme.3.5	Formatted namespace	{"storageId": "3", "slot": 16, "serialNumber": "4D20A0CV0U61", "namespaceId": 5}
2024-10-02T07:36:58.703-0700	DEBUG	ec.nvme.3.5	Detached namespace	{"storageId": "3", "slot": 16, "serialNumber": "4D20A0CV0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.707-0700	DEBUG	ec.nvme.15.5	Attached namespace	{"storageId": "15", "slot": 10, "serialNumber": "4D20A0CY0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.828-0700	DEBUG	ec.nvme.15.5	Formatted namespace	{"storageId": "15", "slot": 10, "serialNumber": "4D20A0CY0U61", "namespaceId": 5}
2024-10-02T07:36:58.846-0700	DEBUG	ec.nvme.15.5	Detached namespace	{"storageId": "15", "slot": 10, "serialNumber": "4D20A0CY0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.850-0700	DEBUG	ec.nvme.17.5	Attached namespace	{"storageId": "17", "slot": 3, "serialNumber": "4D30A0QM0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.961-0700	DEBUG	ec.nvme.17.5	Formatted namespace	{"storageId": "17", "slot": 3, "serialNumber": "4D30A0QM0U61", "namespaceId": 5}
2024-10-02T07:36:58.979-0700	DEBUG	ec.nvme.17.5	Detached namespace	{"storageId": "17", "slot": 3, "serialNumber": "4D30A0QM0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:58.983-0700	DEBUG	ec.nvme.4.5	Attached namespace	{"storageId": "4", "slot": 17, "serialNumber": "4D20A0CT0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.092-0700	DEBUG	ec.nvme.4.5	Formatted namespace	{"storageId": "4", "slot": 17, "serialNumber": "4D20A0CT0U61", "namespaceId": 5}
2024-10-02T07:36:59.111-0700	DEBUG	ec.nvme.4.5	Detached namespace	{"storageId": "4", "slot": 17, "serialNumber": "4D20A0CT0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.115-0700	DEBUG	ec.nvme.8.5	Attached namespace	{"storageId": "8", "slot": 12, "serialNumber": "4D10A00C0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.226-0700	DEBUG	ec.nvme.8.5	Formatted namespace	{"storageId": "8", "slot": 12, "serialNumber": "4D10A00C0U61", "namespaceId": 5}
2024-10-02T07:36:59.247-0700	DEBUG	ec.nvme.8.5	Detached namespace	{"storageId": "8", "slot": 12, "serialNumber": "4D10A00C0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.251-0700	DEBUG	ec.nvme.2.5	Attached namespace	{"storageId": "2", "slot": 15, "serialNumber": "4D20A0DZ0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.364-0700	DEBUG	ec.nvme.2.5	Formatted namespace	{"storageId": "2", "slot": 15, "serialNumber": "4D20A0DZ0U61", "namespaceId": 5}
2024-10-02T07:36:59.389-0700	DEBUG	ec.nvme.2.5	Detached namespace	{"storageId": "2", "slot": 15, "serialNumber": "4D20A0DZ0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.393-0700	DEBUG	ec.nvme.1.5	Attached namespace	{"storageId": "1", "slot": 7, "serialNumber": "4D20A1210U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.503-0700	DEBUG	ec.nvme.1.5	Formatted namespace	{"storageId": "1", "slot": 7, "serialNumber": "4D20A1210U61", "namespaceId": 5}
2024-10-02T07:36:59.525-0700	DEBUG	ec.nvme.1.5	Detached namespace	{"storageId": "1", "slot": 7, "serialNumber": "4D20A1210U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.529-0700	DEBUG	ec.nvme.14.5	Attached namespace	{"storageId": "14", "slot": 9, "serialNumber": "4D20A0D20U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.637-0700	DEBUG	ec.nvme.14.5	Formatted namespace	{"storageId": "14", "slot": 9, "serialNumber": "4D20A0D20U61", "namespaceId": 5}
2024-10-02T07:36:59.657-0700	DEBUG	ec.nvme.14.5	Detached namespace	{"storageId": "14", "slot": 9, "serialNumber": "4D20A0D20U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.661-0700	DEBUG	ec.nvme.16.5	Attached namespace	{"storageId": "16", "slot": 11, "serialNumber": "4D20A0DY0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.774-0700	DEBUG	ec.nvme.16.5	Formatted namespace	{"storageId": "16", "slot": 11, "serialNumber": "4D20A0DY0U61", "namespaceId": 5}
2024-10-02T07:36:59.794-0700	DEBUG	ec.nvme.16.5	Detached namespace	{"storageId": "16", "slot": 11, "serialNumber": "4D20A0DY0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.799-0700	DEBUG	ec.nvme.6.5	Attached namespace	{"storageId": "6", "slot": 14, "serialNumber": "4D20A0D10U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.910-0700	DEBUG	ec.nvme.6.5	Formatted namespace	{"storageId": "6", "slot": 14, "serialNumber": "4D20A0D10U61", "namespaceId": 5}
2024-10-02T07:36:59.929-0700	DEBUG	ec.nvme.6.5	Detached namespace	{"storageId": "6", "slot": 14, "serialNumber": "4D20A0D10U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:36:59.933-0700	DEBUG	ec.nvme.5.5	Attached namespace	{"storageId": "5", "slot": 18, "serialNumber": "4D10A03B0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:37:00.043-0700	DEBUG	ec.nvme.5.5	Formatted namespace	{"storageId": "5", "slot": 18, "serialNumber": "4D10A03B0U61", "namespaceId": 5}
2024-10-02T07:37:00.063-0700	DEBUG	ec.nvme.5.5	Detached namespace	{"storageId": "5", "slot": 18, "serialNumber": "4D10A03B0U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:37:00.067-0700	DEBUG	ec.nvme.9.5	Attached namespace	{"storageId": "9", "slot": 4, "serialNumber": "4D20A0D00U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:37:00.177-0700	DEBUG	ec.nvme.9.5	Formatted namespace	{"storageId": "9", "slot": 4, "serialNumber": "4D20A0D00U61", "namespaceId": 5}
2024-10-02T07:37:00.197-0700	DEBUG	ec.nvme.9.5	Detached namespace	{"storageId": "9", "slot": 4, "serialNumber": "4D20A0D00U61", "namespaceId": 5, "controllerId": 1}
2024-10-02T07:37:00.201-0700	DEBUG	ec.nvme.12.5	Format completed	{"storageId": "12", "slot": 2, "serialNumber": "4D20A0CW0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.205-0700	DEBUG	ec.nvme.11.5	Format completed	{"storageId": "11", "slot": 6, "serialNumber": "4D20A0E30U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.209-0700	DEBUG	ec.nvme.10.5	Format completed	{"storageId": "10", "slot": 5, "serialNumber": "4D20A0CS0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.213-0700	DEBUG	ec.nvme.0.5	Format completed	{"storageId": "0", "slot": 8, "serialNumber": "4D20A0D30U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.218-0700	DEBUG	ec.nvme.3.5	Format completed	{"storageId": "3", "slot": 16, "serialNumber": "4D20A0CV0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.222-0700	DEBUG	ec.nvme.15.5	Format completed	{"storageId": "15", "slot": 10, "serialNumber": "4D20A0CY0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.227-0700	DEBUG	ec.nvme.17.5	Format completed	{"storageId": "17", "slot": 3, "serialNumber": "4D30A0QM0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.231-0700	DEBUG	ec.nvme.4.5	Format completed	{"storageId": "4", "slot": 17, "serialNumber": "4D20A0CT0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.235-0700	DEBUG	ec.nvme.8.5	Format completed	{"storageId": "8", "slot": 12, "serialNumber": "4D10A00C0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.240-0700	DEBUG	ec.nvme.2.5	Format completed	{"storageId": "2", "slot": 15, "serialNumber": "4D20A0DZ0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.244-0700	DEBUG	ec.nvme.1.5	Format completed	{"storageId": "1", "slot": 7, "serialNumber": "4D20A1210U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.248-0700	DEBUG	ec.nvme.14.5	Format completed	{"storageId": "14", "slot": 9, "serialNumber": "4D20A0D20U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.253-0700	DEBUG	ec.nvme.16.5	Format completed	{"storageId": "16", "slot": 11, "serialNumber": "4D20A0DY0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.257-0700	DEBUG	ec.nvme.6.5	Format completed	{"storageId": "6", "slot": 14, "serialNumber": "4D20A0D10U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.261-0700	DEBUG	ec.nvme.5.5	Format completed	{"storageId": "5", "slot": 18, "serialNumber": "4D10A03B0U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.266-0700	DEBUG	ec.nvme.9.5	Format completed	{"storageId": "9", "slot": 4, "serialNumber": "4D20A0D00U61", "namespaceId": 5, "utilization": 0}
2024-10-02T07:37:00.358-0700	DEBUG	ec.nvme.12.5	Deleted namespace	{"storageId": "12", "slot": 2, "serialNumber": "4D20A0CW0U61", "namespaceId": 5}
2024-10-02T07:37:00.449-0700	DEBUG	ec.nvme.11.5	Deleted namespace	{"storageId": "11", "slot": 6, "serialNumber": "4D20A0E30U61", "namespaceId": 5}
2024-10-02T07:37:00.602-0700	DEBUG	ec.nvme.10.5	Deleted namespace	{"storageId": "10", "slot": 5, "serialNumber": "4D20A0CS0U61", "namespaceId": 5}
2024-10-02T07:37:00.693-0700	DEBUG	ec.nvme.0.5	Deleted namespace	{"storageId": "0", "slot": 8, "serialNumber": "4D20A0D30U61", "namespaceId": 5}
2024-10-02T07:37:00.785-0700	DEBUG	ec.nvme.3.5	Deleted namespace	{"storageId": "3", "slot": 16, "serialNumber": "4D20A0CV0U61", "namespaceId": 5}
2024-10-02T07:37:00.877-0700	DEBUG	ec.nvme.15.5	Deleted namespace	{"storageId": "15", "slot": 10, "serialNumber": "4D20A0CY0U61", "namespaceId": 5}
2024-10-02T07:37:00.969-0700	DEBUG	ec.nvme.17.5	Deleted namespace	{"storageId": "17", "slot": 3, "serialNumber": "4D30A0QM0U61", "namespaceId": 5}
2024-10-02T07:37:01.060-0700	DEBUG	ec.nvme.4.5	Deleted namespace	{"storageId": "4", "slot": 17, "serialNumber": "4D20A0CT0U61", "namespaceId": 5}
2024-10-02T07:37:01.152-0700	DEBUG	ec.nvme.8.5	Deleted namespace	{"storageId": "8", "slot": 12, "serialNumber": "4D10A00C0U61", "namespaceId": 5}
2024-10-02T07:37:01.245-0700	DEBUG	ec.nvme.2.5	Deleted namespace	{"storageId": "2", "slot": 15, "serialNumber": "4D20A0DZ0U61", "namespaceId": 5}
2024-10-02T07:37:01.336-0700	DEBUG	ec.nvme.1.5	Deleted namespace	{"storageId": "1", "slot": 7, "serialNumber": "4D20A1210U61", "namespaceId": 5}
2024-10-02T07:37:01.427-0700	DEBUG	ec.nvme.14.5	Deleted namespace	{"storageId": "14", "slot": 9, "serialNumber": "4D20A0D20U61", "namespaceId": 5}
2024-10-02T07:37:01.519-0700	DEBUG	ec.nvme.16.5	Deleted namespace	{"storageId": "16", "slot": 11, "serialNumber": "4D20A0DY0U61", "namespaceId": 5}
2024-10-02T07:37:01.612-0700	DEBUG	ec.nvme.6.5	Deleted namespace	{"storageId": "6", "slot": 14, "serialNumber": "4D20A0D10U61", "namespaceId": 5}
2024-10-02T07:37:01.703-0700	DEBUG	ec.nvme.5.5	Deleted namespace	{"storageId": "5", "slot": 18, "serialNumber": "4D10A03B0U61", "namespaceId": 5}
2024-10-02T07:37:01.795-0700	DEBUG	ec.nvme.9.5	Deleted namespace	{"storageId": "9", "slot": 4, "serialNumber": "4D20A0D00U61", "namespaceId": 5}
2024-10-02T07:37:01.873-0700	INFO	ec.nnf	Deleted storage pool	{"storagePoolId": "default-fluxjob-69361627010434048-0-xfs-0-0"}
2024-10-02T07:37:01.880-0700	INFO	Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference	{"controller": "nnfnodeblockstorage", "controllerGroup": "nnf.cray.hpe.com", "controllerKind": "NnfNodeBlockStorage", "NnfNodeBlockStorage": {"name":"default-fluxjob-69362750882579456-0-xfs-0","namespace":"tuolumne265"}, "namespace": "tuolumne265", "name": "default-fluxjob-69362750882579456-0-xfs-0", "reconcileID": "93a8718a-7902-4518-a139-120fc4d60eb6"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]

@matthew-richerson
Contributor Author

On restart, nnf-ec fails to start up:

2024-10-02T07:37:04.913-0700    DEBUG   ec.nnf  Fabric ready    {"eventId": "54", "eventMessage": "The fabric '%1' is ready", "eventArgs": ["Rabbit"]}
2024-10-02T07:37:04.913-0700    INFO    ec.nnf  recover volumes
2024-10-02T07:37:04.913-0700    INFO    ec.nnf  recover volumes
2024-10-02T07:37:04.913-0700    INFO    ec.nnf  recover volumes
2024-10-02T07:37:04.913-0700    ERROR   ec.nnf  namespace not found     {"serialNumber": "4D20A0CW0U61", "namespaceId": 5, "error": "Error 404: Not Found, Retry-Delay: 0s"}
2024-10-02T07:37:04.913-0700    ERROR   ec.nnf  Failed to replay storage database       {"eventId": "54", "eventMessage": "The fabric '%1' is ready", "eventArgs": ["Rabbit"], "error": "Error 404: Not Found, Retry-Delay: 0s"}
2024-10-02T07:57:04.947-0700    INFO    ec.nnf  Link dropped    {"eventId": "57", "eventMessage": "Switch '%1' upstream link has gone down on port '%2'.", "eventArgs": ["0", "9"]}
2024-10-02T07:57:04.947-0700    INFO    ec.nnf  Link dropped    {"eventId": "60", "eventMessage": "Switch '%1' upstream link has gone down on port '%2'.", "eventArgs": ["1", "2"]}
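
The startup failure above comes from the database replay hitting a recorded namespace (serial `4D20A0CW0U61`, namespace 5) that is no longer on the drive. A minimal sketch of a consistency check that lists such entries up front, assuming a hypothetical `NamespaceKey` type and `findMissing` helper, with sample data that is illustrative only (this is not how nnf-ec stores or replays its database):

```go
package main

import (
	"fmt"
	"sort"
)

// NamespaceKey identifies a namespace by drive serial number and namespace
// id, the two fields reported by the "namespace not found" error.
type NamespaceKey struct {
	SerialNumber string
	NamespaceId  int
}

// findMissing returns the recorded namespaces that are absent from the
// drives, sorted by serial number and then namespace id.
func findMissing(recorded []NamespaceKey, onDrives map[NamespaceKey]bool) []NamespaceKey {
	var missing []NamespaceKey
	for _, k := range recorded {
		if !onDrives[k] {
			missing = append(missing, k)
		}
	}
	sort.Slice(missing, func(i, j int) bool {
		if missing[i].SerialNumber != missing[j].SerialNumber {
			return missing[i].SerialNumber < missing[j].SerialNumber
		}
		return missing[i].NamespaceId < missing[j].NamespaceId
	})
	return missing
}

func main() {
	// Sample data: the database still records namespace 5, but only
	// namespace 4 (a made-up entry) is present on the drive.
	recorded := []NamespaceKey{{"4D20A0CW0U61", 5}, {"4D20A0CW0U61", 4}}
	onDrives := map[NamespaceKey]bool{{"4D20A0CW0U61", 4}: true}

	for _, k := range findMissing(recorded, onDrives) {
		fmt.Printf("recorded but not on drive: serial=%s nsid=%d\n", k.SerialNumber, k.NamespaceId)
	}
}
```

Whether a replay should drop such entries, rebuild them, or still refuse to start is a design decision this sketch does not settle; it only makes the mismatch visible.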

@bdevcich bdevcich changed the title nnf-ec segfaul nnf-ec segfault Oct 2, 2024
@matthew-richerson matthew-richerson changed the title nnf-ec segfault nnf-ec database not consistent with namespaces on drives Oct 14, 2024
@matthew-richerson
Contributor Author

We saw another nnf-ec segfault today on 3 nodes that were trying to remove storage pools:

2024-11-08T13:46:28.950-0800    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "nnfnodeblockstorage", "controllerGroup": "nnf.cray.hpe.com", "controllerKind": "NnfNodeBlockStorage", "NnfNodeBlockStorage": {"name":"systemstorage-nolvmlockd-odd-system-storage-0","namespace":"elcap706"}, "namespace": "elcap706", "name": "systemstorage-nolvmlockd-odd-system-storage-0", "reconcileID": "d1ba9c54-fce3-4dda-bc4e-c3a142249b14"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x176070e]

goroutine 729 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1a5b560?, 0x2f1ecd0?})
        /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/NearNodeFlash/nnf-ec/pkg/manager-server.(*Storage).GetStatus(...)
        /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-server/storage.go:68
github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*StorageGroup).status(0xc0002233b0?)
        /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/storage_group.go:68 +0x2e
github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*StorageService).StorageServiceIdStorageGroupIdGet(0xc001091800, {0xc0026f6368?, 0x0?}, {0xc0070a4400, 0x32}, 0xc0071d9200)
        /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/manager.go:1103 +0x585
github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*AerService).StorageServiceIdStorageGroupIdGet(0xc007023090, {0xc0026f6368?, 0x3?}, {0xc0070a4400?, 0x32?}, 0xc0070a4400?)
        /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/aer.go:129 +0x30
github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).getStorageGroup(0x1cec296?, {0x1ffdf08, 0xc007023090}, {0xc0070a4400, 0x32})
        /workspace/internal/controller/nnf_node_block_storage_controller.go:623 +0x79
github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).createBlockDevice(0xc000693400, {0x1fe03b8, 0xc00653a1e0}, 0xc006a74600, 0x0)
        /workspace/internal/controller/nnf_node_block_storage_controller.go:408 +0x94d
github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).Reconcile(0xc000693400, {0x1fe03b8?, 0xc00653a1e0}, {{{0xc000cba1a8, 0x8}, {0xc00067ac30, 0x2d}}})
        /workspace/internal/controller/nnf_node_block_storage_controller.go:249 +0xdbc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1fe36f0?, {0x1fe03b8?, 0xc00653a1e0?}, {{{0xc000cba1a8?, 0xb?}, {0xc00067ac30?, 0x0?}}})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00069eb40, {0x1fe03f0, 0xc0006a13b0}, {0x1afa020?, 0xc0001befe0?})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00069eb40, {0x1fe03f0, 0xc0006a13b0})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 228
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x565
