QEMU unable to handle NVME with 4k block size #1375

Open
mgherghi opened this issue Nov 14, 2024 · 4 comments · Fixed by #1444

Issue description

I can't seem to boot any VM when the NVMe drives use an LBA format (lbaf) with 4096-byte blocks and I'm using the lvmcluster driver.
After I changed the lbaf of my drives to 512 bytes, in order to test a similar issue that another user had on the forum (link), VMs suddenly work.
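
One way to confirm which block size the kernel reports for a namespace before and after changing the LBA format is to read it from sysfs. The sketch below assumes the standard Linux attribute `/sys/block/<device>/queue/logical_block_size`; the function names, the `sysfs_root` parameter and the device name `nvme0n1` are illustrative and do not come from the report.

```python
from pathlib import Path


def logical_block_size(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the logical block size, in bytes, that the kernel reports for `device`."""
    attr = Path(sysfs_root) / device / "queue" / "logical_block_size"
    return int(attr.read_text().strip())


def is_4k_native(device: str, sysfs_root: str = "/sys/block") -> bool:
    """True when the device exposes 4096-byte logical blocks."""
    return logical_block_size(device, sysfs_root) == 4096


# Example call (device name is hypothetical):
#   logical_block_size("nvme0n1")
```

Running the check on each cluster member would show whether all of them see the same block size for the shared namespace.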

The issue seems to be that QEMU may not know how to handle NVMe drives with a 4096-byte block size: when booting the VM, it is unable to properly handle the image and can't find the QEMU hard drive it needs in order to boot. I have also tried with secure boot turned off; still no go.
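
One possible explanation, which this report does not confirm, is a sector-size mismatch in the partition table: a GPT header is stored at LBA 1, so it sits at byte offset 512 on a disk laid out for 512-byte sectors and at byte offset 4096 on one laid out for 4096-byte sectors. The sketch below is only a diagnostic aid under that assumption; the function name and the candidate sizes are illustrative.

```python
GPT_SIGNATURE = b"EFI PART"  # first 8 bytes of a GPT header, which lives at LBA 1


def gpt_sector_size(path: str, candidates=(512, 4096)):
    """Return the sector size whose LBA 1 holds a GPT header, or None if none match."""
    with open(path, "rb") as disk:
        for size in candidates:
            disk.seek(size)  # LBA 1 starts `size` bytes into the disk
            if disk.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE:
                return size
    return None
```

If this returns 512 for a volume whose underlying device reports 4096-byte logical blocks, the image layout and the device disagree, which could be worth mentioning in the report. Reading a block device this way normally requires root.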

The problem seems to affect only VMs, not containers. Containers have no issue with either block size.

Steps to reproduce

  1. Create NVMe over RDMA storage using nvmetcli
  2. Install qemu-system
  3. Set up the sanlock, lvm2 and lvm2-lockd configurations (lvm.conf and lvmlocal.conf in /etc/lvm/)
  4. Create a shared volume group using vgcreate vg_name /dev/nvmeXnX --shared
  5. Set up Incus clustering with the lvmcluster driver as remote storage
  6. Create a VM
  7. Try to use the console to debug why the VM doesn't finish booting

I have tried looking at qemu.log from the Incus UI, but it is empty, as are the other logs under /var.
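
To double-check from the host which log files are actually empty, a small sketch like the following could be used. It assumes the instance logs live in a directory such as `/var/log/incus` (an assumption, not something the report states); the function names are illustrative.

```python
from pathlib import Path


def log_sizes(log_dir: str) -> dict[str, int]:
    """Map each regular file under `log_dir` (searched recursively) to its size in bytes."""
    root = Path(log_dir)
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def empty_logs(log_dir: str) -> list[str]:
    """Return the relative paths of files that are zero bytes long."""
    return [name for name, size in log_sizes(log_dir).items() if size == 0]


# Example call (the path is an assumption):
#   empty_logs("/var/log/incus")
```

Listing the sizes on every cluster member helps rule out that the logs were written on a different node than the one being inspected.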

System OS : Debian 12.8
Kernel : 6.1.0-27-amd64
Incus version: 6.0.2 LTS

Information to attach

[Screenshots: 2024-11-11 at 10:36:49 PM and 2024-11-12 at 5:39:08 PM]

(The output of the launch with --debug and the incus info output are below.)

DEBUG


DEBUG  [2024-11-12T17:27:11-08:00] Connecting to a local Incus over a Unix socket 
DEBUG  [2024-11-12T17:27:11-08:00] Sending request to Incus                      etag= method=GET url="http://unix.socket/1.0"
DEBUG  [2024-11-12T17:27:11-08:00] Got response struct from Incus               
DEBUG  [2024-11-12T17:27:11-08:00] 
	{
		"config": {
			"cluster.https_address": "10.40.10.2:8443",
			"core.https_address": ":8443",
			"network.ovn.northbound_connection": "tcp:10.30.10.2:6641,tcp:10.30.10.3:6641,tcp:10.30.10.4:6641"
		},
		"api_extensions": [
			"storage_zfs_remove_snapshots",
			"container_host_shutdown_timeout",
			"container_stop_priority",
			"container_syscall_filtering",
			"auth_pki",
			"container_last_used_at",
			"etag",
			"patch",
			"usb_devices",
			"https_allowed_credentials",
			"image_compression_algorithm",
			"directory_manipulation",
			"container_cpu_time",
			"storage_zfs_use_refquota",
			"storage_lvm_mount_options",
			"network",
			"profile_usedby",
			"container_push",
			"container_exec_recording",
			"certificate_update",
			"container_exec_signal_handling",
			"gpu_devices",
			"container_image_properties",
			"migration_progress",
			"id_map",
			"network_firewall_filtering",
			"network_routes",
			"storage",
			"file_delete",
			"file_append",
			"network_dhcp_expiry",
			"storage_lvm_vg_rename",
			"storage_lvm_thinpool_rename",
			"network_vlan",
			"image_create_aliases",
			"container_stateless_copy",
			"container_only_migration",
			"storage_zfs_clone_copy",
			"unix_device_rename",
			"storage_lvm_use_thinpool",
			"storage_rsync_bwlimit",
			"network_vxlan_interface",
			"storage_btrfs_mount_options",
			"entity_description",
			"image_force_refresh",
			"storage_lvm_lv_resizing",
			"id_map_base",
			"file_symlinks",
			"container_push_target",
			"network_vlan_physical",
			"storage_images_delete",
			"container_edit_metadata",
			"container_snapshot_stateful_migration",
			"storage_driver_ceph",
			"storage_ceph_user_name",
			"resource_limits",
			"storage_volatile_initial_source",
			"storage_ceph_force_osd_reuse",
			"storage_block_filesystem_btrfs",
			"resources",
			"kernel_limits",
			"storage_api_volume_rename",
			"network_sriov",
			"console",
			"restrict_dev_incus",
			"migration_pre_copy",
			"infiniband",
			"dev_incus_events",
			"proxy",
			"network_dhcp_gateway",
			"file_get_symlink",
			"network_leases",
			"unix_device_hotplug",
			"storage_api_local_volume_handling",
			"operation_description",
			"clustering",
			"event_lifecycle",
			"storage_api_remote_volume_handling",
			"nvidia_runtime",
			"container_mount_propagation",
			"container_backup",
			"dev_incus_images",
			"container_local_cross_pool_handling",
			"proxy_unix",
			"proxy_udp",
			"clustering_join",
			"proxy_tcp_udp_multi_port_handling",
			"network_state",
			"proxy_unix_dac_properties",
			"container_protection_delete",
			"unix_priv_drop",
			"pprof_http",
			"proxy_haproxy_protocol",
			"network_hwaddr",
			"proxy_nat",
			"network_nat_order",
			"container_full",
			"backup_compression",
			"nvidia_runtime_config",
			"storage_api_volume_snapshots",
			"storage_unmapped",
			"projects",
			"network_vxlan_ttl",
			"container_incremental_copy",
			"usb_optional_vendorid",
			"snapshot_scheduling",
			"snapshot_schedule_aliases",
			"container_copy_project",
			"clustering_server_address",
			"clustering_image_replication",
			"container_protection_shift",
			"snapshot_expiry",
			"container_backup_override_pool",
			"snapshot_expiry_creation",
			"network_leases_location",
			"resources_cpu_socket",
			"resources_gpu",
			"resources_numa",
			"kernel_features",
			"id_map_current",
			"event_location",
			"storage_api_remote_volume_snapshots",
			"network_nat_address",
			"container_nic_routes",
			"cluster_internal_copy",
			"seccomp_notify",
			"lxc_features",
			"container_nic_ipvlan",
			"network_vlan_sriov",
			"storage_cephfs",
			"container_nic_ipfilter",
			"resources_v2",
			"container_exec_user_group_cwd",
			"container_syscall_intercept",
			"container_disk_shift",
			"storage_shifted",
			"resources_infiniband",
			"daemon_storage",
			"instances",
			"image_types",
			"resources_disk_sata",
			"clustering_roles",
			"images_expiry",
			"resources_network_firmware",
			"backup_compression_algorithm",
			"ceph_data_pool_name",
			"container_syscall_intercept_mount",
			"compression_squashfs",
			"container_raw_mount",
			"container_nic_routed",
			"container_syscall_intercept_mount_fuse",
			"container_disk_ceph",
			"virtual-machines",
			"image_profiles",
			"clustering_architecture",
			"resources_disk_id",
			"storage_lvm_stripes",
			"vm_boot_priority",
			"unix_hotplug_devices",
			"api_filtering",
			"instance_nic_network",
			"clustering_sizing",
			"firewall_driver",
			"projects_limits",
			"container_syscall_intercept_hugetlbfs",
			"limits_hugepages",
			"container_nic_routed_gateway",
			"projects_restrictions",
			"custom_volume_snapshot_expiry",
			"volume_snapshot_scheduling",
			"trust_ca_certificates",
			"snapshot_disk_usage",
			"clustering_edit_roles",
			"container_nic_routed_host_address",
			"container_nic_ipvlan_gateway",
			"resources_usb_pci",
			"resources_cpu_threads_numa",
			"resources_cpu_core_die",
			"api_os",
			"container_nic_routed_host_table",
			"container_nic_ipvlan_host_table",
			"container_nic_ipvlan_mode",
			"resources_system",
			"images_push_relay",
			"network_dns_search",
			"container_nic_routed_limits",
			"instance_nic_bridged_vlan",
			"network_state_bond_bridge",
			"usedby_consistency",
			"custom_block_volumes",
			"clustering_failure_domains",
			"resources_gpu_mdev",
			"console_vga_type",
			"projects_limits_disk",
			"network_type_macvlan",
			"network_type_sriov",
			"container_syscall_intercept_bpf_devices",
			"network_type_ovn",
			"projects_networks",
			"projects_networks_restricted_uplinks",
			"custom_volume_backup",
			"backup_override_name",
			"storage_rsync_compression",
			"network_type_physical",
			"network_ovn_external_subnets",
			"network_ovn_nat",
			"network_ovn_external_routes_remove",
			"tpm_device_type",
			"storage_zfs_clone_copy_rebase",
			"gpu_mdev",
			"resources_pci_iommu",
			"resources_network_usb",
			"resources_disk_address",
			"network_physical_ovn_ingress_mode",
			"network_ovn_dhcp",
			"network_physical_routes_anycast",
			"projects_limits_instances",
			"network_state_vlan",
			"instance_nic_bridged_port_isolation",
			"instance_bulk_state_change",
			"network_gvrp",
			"instance_pool_move",
			"gpu_sriov",
			"pci_device_type",
			"storage_volume_state",
			"network_acl",
			"migration_stateful",
			"disk_state_quota",
			"storage_ceph_features",
			"projects_compression",
			"projects_images_remote_cache_expiry",
			"certificate_project",
			"network_ovn_acl",
			"projects_images_auto_update",
			"projects_restricted_cluster_target",
			"images_default_architecture",
			"network_ovn_acl_defaults",
			"gpu_mig",
			"project_usage",
			"network_bridge_acl",
			"warnings",
			"projects_restricted_backups_and_snapshots",
			"clustering_join_token",
			"clustering_description",
			"server_trusted_proxy",
			"clustering_update_cert",
			"storage_api_project",
			"server_instance_driver_operational",
			"server_supported_storage_drivers",
			"event_lifecycle_requestor_address",
			"resources_gpu_usb",
			"clustering_evacuation",
			"network_ovn_nat_address",
			"network_bgp",
			"network_forward",
			"custom_volume_refresh",
			"network_counters_errors_dropped",
			"metrics",
			"image_source_project",
			"clustering_config",
			"network_peer",
			"linux_sysctl",
			"network_dns",
			"ovn_nic_acceleration",
			"certificate_self_renewal",
			"instance_project_move",
			"storage_volume_project_move",
			"cloud_init",
			"network_dns_nat",
			"database_leader",
			"instance_all_projects",
			"clustering_groups",
			"ceph_rbd_du",
			"instance_get_full",
			"qemu_metrics",
			"gpu_mig_uuid",
			"event_project",
			"clustering_evacuation_live",
			"instance_allow_inconsistent_copy",
			"network_state_ovn",
			"storage_volume_api_filtering",
			"image_restrictions",
			"storage_zfs_export",
			"network_dns_records",
			"storage_zfs_reserve_space",
			"network_acl_log",
			"storage_zfs_blocksize",
			"metrics_cpu_seconds",
			"instance_snapshot_never",
			"certificate_token",
			"instance_nic_routed_neighbor_probe",
			"event_hub",
			"agent_nic_config",
			"projects_restricted_intercept",
			"metrics_authentication",
			"images_target_project",
			"images_all_projects",
			"cluster_migration_inconsistent_copy",
			"cluster_ovn_chassis",
			"container_syscall_intercept_sched_setscheduler",
			"storage_lvm_thinpool_metadata_size",
			"storage_volume_state_total",
			"instance_file_head",
			"instances_nic_host_name",
			"image_copy_profile",
			"container_syscall_intercept_sysinfo",
			"clustering_evacuation_mode",
			"resources_pci_vpd",
			"qemu_raw_conf",
			"storage_cephfs_fscache",
			"network_load_balancer",
			"vsock_api",
			"instance_ready_state",
			"network_bgp_holdtime",
			"storage_volumes_all_projects",
			"metrics_memory_oom_total",
			"storage_buckets",
			"storage_buckets_create_credentials",
			"metrics_cpu_effective_total",
			"projects_networks_restricted_access",
			"storage_buckets_local",
			"loki",
			"acme",
			"internal_metrics",
			"cluster_join_token_expiry",
			"remote_token_expiry",
			"init_preseed",
			"storage_volumes_created_at",
			"cpu_hotplug",
			"projects_networks_zones",
			"network_txqueuelen",
			"cluster_member_state",
			"instances_placement_scriptlet",
			"storage_pool_source_wipe",
			"zfs_block_mode",
			"instance_generation_id",
			"disk_io_cache",
			"amd_sev",
			"storage_pool_loop_resize",
			"migration_vm_live",
			"ovn_nic_nesting",
			"oidc",
			"network_ovn_l3only",
			"ovn_nic_acceleration_vdpa",
			"cluster_healing",
			"instances_state_total",
			"auth_user",
			"security_csm",
			"instances_rebuild",
			"numa_cpu_placement",
			"custom_volume_iso",
			"network_allocations",
			"zfs_delegate",
			"storage_api_remote_volume_snapshot_copy",
			"operations_get_query_all_projects",
			"metadata_configuration",
			"syslog_socket",
			"event_lifecycle_name_and_project",
			"instances_nic_limits_priority",
			"disk_initial_volume_configuration",
			"operation_wait",
			"image_restriction_privileged",
			"cluster_internal_custom_volume_copy",
			"disk_io_bus",
			"storage_cephfs_create_missing",
			"instance_move_config",
			"ovn_ssl_config",
			"certificate_description",
			"disk_io_bus_virtio_blk",
			"loki_config_instance",
			"instance_create_start",
			"clustering_evacuation_stop_options",
			"boot_host_shutdown_action",
			"agent_config_drive",
			"network_state_ovn_lr",
			"image_template_permissions",
			"storage_bucket_backup",
			"storage_lvm_cluster",
			"shared_custom_block_volumes",
			"auth_tls_jwt",
			"oidc_claim",
			"device_usb_serial",
			"numa_cpu_balanced",
			"image_restriction_nesting",
			"network_integrations",
			"instance_memory_swap_bytes",
			"network_bridge_external_create",
			"network_zones_all_projects",
			"storage_zfs_vdev",
			"container_migration_stateful",
			"profiles_all_projects",
			"instances_scriptlet_get_instances",
			"instances_scriptlet_get_cluster_members",
			"instances_scriptlet_get_project",
			"network_acl_stateless",
			"instance_state_started_at",
			"networks_all_projects",
			"network_acls_all_projects",
			"storage_buckets_all_projects",
			"resources_load",
			"instance_access",
			"project_access",
			"projects_force_delete",
			"resources_cpu_flags",
			"disk_io_bus_cache_filesystem",
			"instances_lxcfs_per_instance",
			"disk_volume_subpath",
			"projects_limits_disk_pool",
			"network_ovn_isolated",
			"qemu_raw_qmp",
			"network_load_balancer_health_check",
			"oidc_scopes",
			"network_integrations_peer_name",
			"qemu_scriptlet",
			"instance_auto_restart",
			"storage_lvm_metadatasize",
			"ovn_nic_promiscuous",
			"ovn_nic_ip_address_none"
		],
		"api_status": "stable",
		"api_version": "1.0",
		"auth": "trusted",
		"public": false,
		"auth_methods": [
			"tls"
		],
		"auth_user_name": "root",
		"auth_user_method": "unix",
		"environment": {
			"addresses": [
				"10.10.20.2:8443",
				"10.10.20.153:8443",
				"10.20.10.2:8443",
				"10.30.10.2:8443",
				"10.40.10.2:8443"
			],
			"architectures": [
				"x86_64",
				"i686"
			],
			"certificate": "-----BEGIN CERTIFICATE-----\<SECRET CERT>\n-----END CERTIFICATE-----\n",
			"certificate_fingerprint": "c35463830c3080be47a2508160d71917519dc82d8ebd39dc88f62a9b8a7b37cb",
			"driver": "lxc | qemu",
			"driver_version": "6.0.2 | 9.0.2",
			"firewall": "nftables",
			"kernel": "Linux",
			"kernel_architecture": "x86_64",
			"kernel_features": {
				"idmapped_mounts": "true",
				"netnsid_getifaddrs": "true",
				"seccomp_listener": "true",
				"seccomp_listener_continue": "true",
				"uevent_injection": "true",
				"unpriv_binfmt": "false",
				"unpriv_fscaps": "true"
			},
			"kernel_version": "6.1.0-27-amd64",
			"lxc_features": {
				"cgroup2": "true",
				"core_scheduling": "true",
				"devpts_fd": "true",
				"idmapped_mounts_v2": "true",
				"mount_injection_file": "true",
				"network_gateway_device_route": "true",
				"network_ipvlan": "true",
				"network_l2proxy": "true",
				"network_phys_macvlan_mtu": "true",
				"network_veth_router": "true",
				"pidfd": "true",
				"seccomp_allow_deny_syntax": "true",
				"seccomp_notify": "true",
				"seccomp_proxy_send_notify_fd": "true"
			},
			"os_name": "Debian GNU/Linux",
			"os_version": "12",
			"project": "default",
			"server": "incus",
			"server_clustered": true,
			"server_event_mode": "full-mesh",
			"server_name": "gigabyte",
			"server_pid": 5432,
			"server_version": "6.0.2",
			"storage": "lvmcluster",
			"storage_version": "2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.47.0",
			"storage_supported_drivers": [
				{
					"Name": "dir",
					"Version": "1",
					"Remote": false
				},
				{
					"Name": "lvm",
					"Version": "2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.47.0",
					"Remote": false
				},
				{
					"Name": "lvmcluster",
					"Version": "2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.47.0",
					"Remote": true
				}
			]
		}
	} 
Launching test-vm
DEBUG  [2024-11-12T17:27:11-08:00] Sending request to Incus                      etag= method=GET url="http://unix.socket/1.0/networks/my-ovn"
DEBUG  [2024-11-12T17:27:11-08:00] Got response struct from Incus               
DEBUG  [2024-11-12T17:27:11-08:00] 
	{
		"config": {
			"bridge.mtu": "1500",
			"ipv4.address": "10.184.221.1/24",
			"ipv4.nat": "true",
			"ipv6.address": "fd42:f747:9125:2059::1/64",
			"ipv6.nat": "true",
			"network": "UPLINK-bdebruhl",
			"volatile.network.ipv4.address": "10.50.10.2"
		},
		"description": "",
		"name": "my-ovn",
		"type": "ovn",
		"used_by": [],
		"managed": true,
		"status": "Created",
		"locations": [
			"r620",
			"gigabyte",
			"supermicro"
		],
		"project": "default"
	} 
DEBUG  [2024-11-12T17:27:11-08:00] Connecting to a remote simplestreams server   URL="https://images.linuxcontainers.org"
DEBUG  [2024-11-12T17:27:11-08:00] Connected to the websocket: ws://unix.socket/1.0/events 
DEBUG  [2024-11-12T17:27:11-08:00] Sending request to Incus                      etag= method=POST url="http://unix.socket/1.0/instances"
DEBUG  [2024-11-12T17:27:11-08:00] 
	{
		"architecture": "",
		"config": {},
		"devices": {
			"eth0": {
				"name": "eth0",
				"network": "my-ovn",
				"type": "nic"
			}
		},
		"ephemeral": false,
		"profiles": [
			"default"
		],
		"stateful": false,
		"description": "",
		"name": "test-vm",
		"source": {
			"type": "image",
			"certificate": "",
			"alias": "1d36502dd7f8",
			"server": "https://images.linuxcontainers.org",
			"protocol": "simplestreams",
			"mode": "pull",
			"allow_inconsistent": false
		},
		"instance_type": "",
		"type": "virtual-machine",
		"start": true
	} 
DEBUG  [2024-11-12T17:27:11-08:00] Got operation from Incus                     
DEBUG  [2024-11-12T17:27:11-08:00] 
	{
		"id": "20296f8e-8e97-4bf2-bdcf-c45c48ebcb67",
		"class": "task",
		"description": "Creating instance",
		"created_at": "2024-11-12T17:27:11.644498574-08:00",
		"updated_at": "2024-11-12T17:27:11.644498574-08:00",
		"status": "Running",
		"status_code": 103,
		"resources": {
			"instances": [
				"/1.0/instances/test-vm"
			]
		},
		"metadata": null,
		"may_cancel": false,
		"err": "",
		"location": "gigabyte"
	} 
DEBUG  [2024-11-12T17:27:11-08:00] Sending request to Incus                      etag= method=GET url="http://unix.socket/1.0/operations/20296f8e-8e97-4bf2-bdcf-c45c48ebcb67"
DEBUG  [2024-11-12T17:27:11-08:00] Got response struct from Incus               
DEBUG  [2024-11-12T17:27:11-08:00] 
	{
		"id": "20296f8e-8e97-4bf2-bdcf-c45c48ebcb67",
		"class": "task",
		"description": "Creating instance",
		"created_at": "2024-11-12T17:27:11.644498574-08:00",
		"updated_at": "2024-11-12T17:27:11.644498574-08:00",
		"status": "Running",
		"status_code": 103,
		"resources": {
			"instances": [
				"/1.0/instances/test-vm"
			]
		},
		"metadata": null,
		"may_cancel": false,
		"err": "",
		"location": "gigabyte"
	} 
DEBUG  [2024-11-12T17:27:23-08:00] Sending request to Incus                      etag= method=GET url="http://unix.socket/1.0/instances/test-vm"
DEBUG  [2024-11-12T17:27:23-08:00] Got response struct from Incus               
DEBUG  [2024-11-12T17:27:23-08:00] 
	{
		"architecture": "x86_64",
		"config": {
			"image.architecture": "amd64",
			"image.description": "Ubuntu noble amd64 (20241112_07:42)",
			"image.os": "Ubuntu",
			"image.release": "noble",
			"image.requirements.cgroup": "v2",
			"image.serial": "20241112_07:42",
			"image.type": "disk-kvm.img",
			"image.variant": "desktop",
			"volatile.base_image": "1d36502dd7f849e2f44b1ff65a6bb63ddf435a40c044899d349384c739908ce4",
			"volatile.cloud-init.instance-id": "085aa2bd-710a-45bf-9fdd-3d4e8a119e2f",
			"volatile.eth0.host_name": "tap85f85c3d",
			"volatile.eth0.hwaddr": "00:16:3e:74:7b:c6",
			"volatile.eth0.last_state.ip_addresses": "10.184.221.2,fd42:f747:9125:2059:216:3eff:fe74:7bc6",
			"volatile.last_state.power": "RUNNING",
			"volatile.uuid": "48e445d1-9101-4e0b-b677-e745c40409a4",
			"volatile.uuid.generation": "48e445d1-9101-4e0b-b677-e745c40409a4",
			"volatile.vsock_id": "3792657189"
		},
		"devices": {
			"eth0": {
				"name": "eth0",
				"network": "my-ovn",
				"type": "nic"
			}
		},
		"ephemeral": false,
		"profiles": [
			"default"
		],
		"stateful": false,
		"description": "",
		"created_at": "2024-11-13T01:27:14.915086181Z",
		"expanded_config": {
			"image.architecture": "amd64",
			"image.description": "Ubuntu noble amd64 (20241112_07:42)",
			"image.os": "Ubuntu",
			"image.release": "noble",
			"image.requirements.cgroup": "v2",
			"image.serial": "20241112_07:42",
			"image.type": "disk-kvm.img",
			"image.variant": "desktop",
			"volatile.base_image": "1d36502dd7f849e2f44b1ff65a6bb63ddf435a40c044899d349384c739908ce4",
			"volatile.cloud-init.instance-id": "085aa2bd-710a-45bf-9fdd-3d4e8a119e2f",
			"volatile.eth0.host_name": "tap85f85c3d",
			"volatile.eth0.hwaddr": "00:16:3e:74:7b:c6",
			"volatile.eth0.last_state.ip_addresses": "10.184.221.2,fd42:f747:9125:2059:216:3eff:fe74:7bc6",
			"volatile.last_state.power": "RUNNING",
			"volatile.uuid": "48e445d1-9101-4e0b-b677-e745c40409a4",
			"volatile.uuid.generation": "48e445d1-9101-4e0b-b677-e745c40409a4",
			"volatile.vsock_id": "3792657189"
		},
		"expanded_devices": {
			"eth0": {
				"name": "eth0",
				"network": "my-ovn",
				"type": "nic"
			},
			"root": {
				"path": "/",
				"pool": "remote",
				"type": "disk"
			}
		},
		"name": "test-vm",
		"status": "Running",
		"status_code": 103,
		"last_used_at": "2024-11-13T01:27:23.597071346Z",
		"location": "gigabyte",
		"type": "virtual-machine",
		"project": "default"
	}
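
The responses above are long, so it can help to pull out only the fields that matter for this report. The sketch below takes the JSON body of the `/1.0` response as a string and summarizes the driver versions, kernel and storage driver. The field names are taken from the output above; the function name is illustrative.

```python
import json


def summarize_server(info_json: str) -> dict:
    """Extract the driver, kernel and storage details from a /1.0 response body."""
    env = json.loads(info_json)["environment"]
    # "driver" and "driver_version" are parallel, pipe-separated lists.
    drivers = [d.strip() for d in env["driver"].split("|")]
    versions = [v.strip() for v in env["driver_version"].split("|")]
    return {
        "drivers": dict(zip(drivers, versions)),
        "kernel_version": env["kernel_version"],
        "server_version": env["server_version"],
        "storage": env["storage"],
    }
```

Applied to the response in this report, the `drivers` entry would pair `lxc` with `6.0.2` and `qemu` with `9.0.2`.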

Incus info output

config:
  cluster.https_address: 10.40.10.2:8443
  core.https_address: :8443
  network.ovn.northbound_connection: tcp:10.30.10.2:6641,tcp:10.30.10.3:6641,tcp:10.30.10.4:6641
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
- instance_memory_swap_bytes
- network_bridge_external_create
- network_zones_all_projects
- storage_zfs_vdev
- container_migration_stateful
- profiles_all_projects
- instances_scriptlet_get_instances
- instances_scriptlet_get_cluster_members
- instances_scriptlet_get_project
- network_acl_stateless
- instance_state_started_at
- networks_all_projects
- network_acls_all_projects
- storage_buckets_all_projects
- resources_load
- instance_access
- project_access
- projects_force_delete
- resources_cpu_flags
- disk_io_bus_cache_filesystem
- instances_lxcfs_per_instance
- disk_volume_subpath
- projects_limits_disk_pool
- network_ovn_isolated
- qemu_raw_qmp
- network_load_balancer_health_check
- oidc_scopes
- network_integrations_peer_name
- qemu_scriptlet
- instance_auto_restart
- storage_lvm_metadatasize
- ovn_nic_promiscuous
- ovn_nic_ip_address_none
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: root
auth_user_method: unix
environment:
  addresses:
  - 10.10.20.2:8443
  - 10.10.20.153:8443
  - 10.20.10.2:8443
  - 10.30.10.2:8443
  - 10.40.10.2:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MII.......
    -----END CERTIFICATE-----
  certificate_fingerprint: c35463830c308.........
  driver: lxc | qemu
  driver_version: 6.0.2 | 9.0.2
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "false"
    unpriv_fscaps: "true"
  kernel_version: 6.1.0-27-amd64
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Debian GNU/Linux
  os_version: "12"
  project: default
  server: incus
  server_clustered: true
  server_event_mode: full-mesh
  server_name: gigabyte
  server_pid: 5432
  server_version: 6.0.2
  storage: lvmcluster
  storage_version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.47.0
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.47.0
    remote: false
  - name: lvmcluster
    version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.47.0
    remote: true
@stgraber stgraber added the Bug Confirmed to be a bug label Nov 14, 2024
@stgraber stgraber added this to the incus-6.8 milestone Nov 14, 2024
@stgraber stgraber self-assigned this Nov 26, 2024
@stgraber
Member

I spent a couple of hours trying to find a good way to handle this. In short, there isn't one...
Our images are all built using a 512-byte physical block size. When those are written to a device with a 4k physical block size, the partition table makes no sense and leads to an unbootable VM.

I tried a variety of QEMU options to have it consume a 4k block device on the host and pretend that it's 512-byte aligned in the guest, but that doesn't appear to work, possibly because of DirectIO.

So I ended up having to go for the big hammer and put in a check which will cause image unpacks on non-512-byte devices to fail.
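
A check along those lines could look roughly like the sketch below. It is not the actual Incus change: `require_512_sectors` is a hypothetical helper name, and the sketch assumes the value that matters is the logical sector size that `blockdev --getss` reports for the target device.

```shell
#!/bin/sh
# Hypothetical helper (not Incus code): refuse to unpack a VM image
# unless the target reports 512-byte logical sectors.
# $1 is a sector size in bytes, e.g. the output of `blockdev --getss DEV`.
require_512_sectors() {
    size="$1"
    if [ "$size" != "512" ]; then
        echo "refusing to unpack VM image: sector size is ${size} bytes, expected 512" >&2
        return 1
    fi
    return 0
}
```

It would be called as `require_512_sectors "$(blockdev --getss /dev/vg_name/lv_name)"`, where the device path is a placeholder for the volume being written.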

@glingy

glingy commented Dec 17, 2024

@stgraber I happen to be trying to run Incus on an NVMe drive that only offers a 4096-byte sector size. 512 isn't listed by the nvme list command from the forum link above, so I can't easily configure the drive to use 512-byte sectors. I went down several rabbit holes and considered putting my LVM physical volume on a loop device pointing at the physical partition, since I can set the sector size on a loop device to whatever I want, but I wanted a real solution. I finally discovered that if I delete /usr/bin/sgdisk from my host system, grab a fresh image directly from the Linux Containers image registry, and spin up the VM, it just magically works.

Running gdisk inside the VM shows that even though the physical disks have a 4096-byte sector size, the virtual disk (virtio-scsi at least) appears in QEMU with a 512-byte sector size and all's happy:
[Screenshot, 2024-12-16: gdisk output inside the VM]

After more digging, it appears that this line is the culprit:

_, err = subprocess.RunCommand(path, "--move-second-header", devPath)
The images are meant for 512-byte sectors, and QEMU emulates 512, but the backing storage is 4096 according to the host. So when sgdisk is run on the host (where the block device is 4096) to move the second GPT header, it finds a valid protective MBR with no GPT header and corrupts the entire partition table.
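
The mismatch can be seen directly on an image or device. A GPT header starts with the 8-byte signature `EFI PART` at LBA 1, which is byte offset 512 with 512-byte sectors but byte offset 4096 with 4096-byte sectors. A minimal sketch that reports which layout a disk was written with (`gpt_sector_size` is a made-up helper name, and only the two sector sizes discussed here are probed):

```shell
#!/bin/sh
# Print the sector size (512 or 4096) for which a GPT header signature
# is found at LBA 1 of the given image or device; return 1 if neither matches.
gpt_sector_size() {
    img="$1"
    for ss in 512 4096; do
        # LBA 1 starts at byte offset $ss; read the 8 signature bytes there.
        # tr strips NUL bytes so the shell does not complain about them.
        sig=$(dd if="$img" bs=1 skip="$ss" count=8 2>/dev/null | tr -d '\0')
        if [ "$sig" = "EFI PART" ]; then
            echo "$ss"
            return 0
        fi
    done
    return 1
}
```

A tool that assumes 4096-byte sectors looks only at byte offset 4096, so on an image laid out for 512-byte sectors it would see the protective MBR at LBA 0 but no GPT header, which matches the behaviour described above.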

I haven't tried it yet, but one solution might be to use losetup to create a temporary loop device with a 512-byte sector size, have sgdisk act on the loop device, and then delete the loop device.

@glingy

glingy commented Dec 17, 2024

Update: yep, this is definitely it. With sgdisk deleted completely, there is a warning during boot that the GPT partition table isn't quite right. If I instead move /usr/bin/sgdisk to /root/sgdisk (for testing) and put this shell script in /usr/bin/sgdisk:

#!/bin/sh
# Expose the target ($2) through a temporary loop device with 512-byte sectors,
# run the real sgdisk with the original option ($1) on it, then detach the loop device.
LOOP_DEV=$(losetup -P -b 512 -f --show "$2") || exit 1
/root/sgdisk "$1" "$LOOP_DEV"
RET=$?
losetup -d "$LOOP_DEV"
exit "$RET"

and then create an instance from an image I do not currently have cached or instantiated, I get a perfect boot with no GPT errors with LVM on a 4096-byte-sector disk. Now, I have absolutely no clue how to go about patching this into Incus. Unfortunately, as far as I can find, sgdisk does not have an option to forcibly override the sector size read from the disk for a single operation.

@stgraber stgraber reopened this Dec 17, 2024
@stgraber
Member

Ah, interesting, we should investigate if we can't just force sgdisk into 512-byte mode somehow.

@stgraber stgraber modified the milestones: incus-6.8, incus-6.9 Dec 17, 2024