Devise a test to force failure to destroy one of several devices in a pool #232

Open
mulkieran opened this issue Dec 17, 2020 · 7 comments

@mulkieran
Member

We want to engineer a situation where, when we instruct stratisd to destroy a pool, it fails to wipe the data from at least one, but not all, of the block devices that the pool owns. Any cause of this failure will do.

See https://bugzilla.redhat.com/show_bug.cgi?id=1908333 for the motivation.

@mulkieran mulkieran assigned mulkieran and bgurney-rh and unassigned mulkieran Dec 17, 2020
@bgurney-rh
Member

I have an idea that involves creating a pool on a "linear" device-mapper device, then reloading that device with an "error" table, and watching the "pool destroy" command fail.

(My example backing device for the device-mapper linear test device is /dev/vdb1, or 252:17, which is 16775168 sectors in size.)

# dmsetup create removetest --table '0 16775168 linear 252:17 0'
# stratis pool create spool1 /dev/mapper/removetest
# dmsetup suspend removetest && dmsetup reload removetest --table '0 16775168 error' && dmsetup resume removetest

This is the point at which the error appears:

# stratis --propagate pool destroy spool1
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/stratis_cli/_main.py", line 43, in the_func
    result.func(result)
  File "/usr/lib/python3.8/site-packages/stratis_cli/_actions/_top.py", line 299, in destroy_pool
    raise StratisCliEngineError(rc, message)
stratis_cli._errors.StratisCliEngineError: ERROR: Failed to wipe already initialized devnodes: ["/dev/mapper/removetest"]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/stratis", line 36, in <module>
    main()
  File "/usr/bin/stratis", line 32, in main
    return run()(sys.argv[1:])
  File "/usr/lib/python3.8/site-packages/stratis_cli/_main.py", line 60, in the_func
    raise StratisCliActionError(command_line_args, result) from err
stratis_cli._errors.StratisCliActionError: Action selected by command-line arguments ['--propagate', 'pool', 'destroy', 'spool1'] which were parsed to Namespace(func=<function TopActions.destroy_pool at 0x7f30e2c61af0>, pool_name='spool1', propagate=True) failed

In this state, the stratis pool device stack disappears, because all I/O to the block device is failing. It stays that way until you reload the table with its original "linear" setting:

# dmsetup suspend removetest && dmsetup reload removetest --table '0 16775168 linear 252:17 0' && dmsetup resume removetest

...and you then have to restart stratisd to allow it to start the pool.
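
The table swap used in both directions above could be wrapped in a few small helpers. This is only a sketch: the function names `linear_table`, `error_table`, and `swap_table` are made up for illustration and are not part of dmsetup or stratis, and `swap_table` needs root and an existing device-mapper device.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print a "linear" table that maps the whole backing device.
# $1 = size in 512-byte sectors, $2 = backing device (path or major:minor)
linear_table() {
    printf '0 %s linear %s 0\n' "$1" "$2"
}

# Print an "error" table of the same size; all I/O to it fails.
# $1 = size in 512-byte sectors
error_table() {
    printf '0 %s error\n' "$1"
}

# Replace the live table of an existing dm device (requires root).
# $1 = dm device name, $2 = new table line
swap_table() {
    dmsetup suspend "$1" &&
        dmsetup reload "$1" --table "$2" &&
        dmsetup resume "$1"
}
```

With these, `swap_table removetest "$(error_table 16775168)"` would inject the failure and `swap_table removetest "$(linear_table 16775168 252:17)"` would restore the device. The sector count could also be read with `blockdev --getsz /dev/vdb1` instead of being typed by hand.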

@mulkieran
Member Author

We might want something less permanent than "error"; we can't rule out that the original bz was a transient problem.

@bgurney-rh
Member

I can get it to fail with a dm-dust device. My test virtual machine has a test device /dev/vdb1, which is 16775168 sectors in size. Assuming that a key has already been set with the key description "testkey":

# modprobe dm_dust
# dmsetup create dust1 --table '0 16775168 dust /dev/vdb1 0 512'
# stratis pool create --key-desc testkey spool1 /dev/mapper/dust1
# dmsetup message dust1 0 addbadblock 0 16

kernel: device-mapper: dust: dust_add_block: badblock added at block 0 with write fail count 16

# dmsetup message dust1 0 enable

kernel: device-mapper: dust: enabling read failures on bad sectors

# stratis --propagate pool destroy spool1
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/stratis_cli/_main.py", line 43, in the_func
    result.func(result)
  File "/usr/lib/python3.8/site-packages/stratis_cli/_parser/_parser.py", line 87, in wrapped_func
    func(*args)
  File "/usr/lib/python3.8/site-packages/stratis_cli/_actions/_top.py", line 486, in destroy_pool
    raise StratisCliEngineError(return_code, message)
stratis_cli._errors.StratisCliEngineError: ERROR: Engine error: Failed to wipe already initialized devnodes; Failed to wipe blockdev /dev/mapper/dust1: Cryptsetup error: IO error occurred: Input/output error (os error 5)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/stratis", line 35, in <module>
    main()
  File "/usr/bin/stratis", line 32, in main
    return run()(sys.argv[1:])
  File "/usr/lib/python3.8/site-packages/stratis_cli/_main.py", line 60, in the_func
    raise StratisCliActionError(command_line_args, result) from err
stratis_cli._errors.StratisCliActionError: Action selected by command-line arguments ['--propagate', 'pool', 'destroy', 'spool1'] which were parsed to Namespace(func=<function add_subcommand.<locals>.wrap_func.<locals>.wrapped_func at 0x7fddfc88be50>, pool_name='spool1', propagate=True) failed

There are also some errors from the kernel, since there was a buffered I/O error on "logical block 0" of the device.

I had to add more than just 1 write failure to get the pool destroy to fail. I think it took more than 4; 16 seemed to work.

(With dm-dust, using the default of "0 write failures" for a bad block, the first write succeeds and subsequent reads of that block succeed too. In this case, we want the write to fail a limited number of times, not every time.)
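
As a rough illustration of the write fail count, here is a toy model of a single dm-dust bad block, based on my reading of the dm-dust documentation: each write fails while the count is above zero and decrements it, and a write succeeds once the count reaches zero. This is not kernel code, and `dust_write` and `attempts_until_success` are hypothetical names.

```shell
# Toy model of one dm-dust bad block; not kernel code.
wr_fail_cnt=0

# Return 1 (I/O error) while failures remain, 0 once the count is used up.
dust_write() {
    if [ "$wr_fail_cnt" -gt 0 ]; then
        wr_fail_cnt=$((wr_fail_cnt - 1))
        return 1
    fi
    return 0
}

# Print how many writes fail before one succeeds.
# $1 = initial write fail count
attempts_until_success() {
    wr_fail_cnt=$1
    n=0
    until dust_write; do
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}
```

Under this model, a count of 16 means the first 16 writes to block 0 fail. If the wipe path issues or retries several writes to that block, a small count could be used up before the destroy gives up, which might be why 1 or 4 was not enough; that part is a guess.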

@mulkieran
Member Author

After this failure, we would currently expect to still see the pool when listing pools in the CLI (we plan to fix that). We shouldn't see the pool stack, though, as stratisd should have destroyed all the upper layers.

@bgurney-rh
Member

Confirmed; in this state, I can still see the pool when running "stratis pool list", but I don't see the device stack.

However, I do see the LUKS superblocks on the device /dev/mapper/dust1. (One of them is at sector 0, which is what I set to fail, so perhaps it wasn't able to wipe the sectors at 0x0000 bytes and 0x4000 bytes.)

# hexdump -C -s 0 -n 32 /dev/mapper/dust1
00000000  4c 55 4b 53 ba be 00 02  00 00 00 00 00 00 40 00  |LUKS..........@.|
00000010  00 00 00 00 00 00 00 06  00 00 00 00 00 00 00 00  |................|
00000020

# hexdump -C -s 16384 -n 32 /dev/mapper/dust1
00004000  53 4b 55 4c ba be 00 02  00 00 00 00 00 00 40 00  |SKUL..........@.|
00004010  00 00 00 00 00 00 00 07  00 00 00 00 00 00 00 00  |................|
00004020

If I restart stratisd, the pool will not appear in "stratis pool list".
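
The hexdump check above could be scripted along these lines. This is a sketch only: `magic_at` and `luks2_headers_present` are hypothetical names, and the secondary header offset of 16384 bytes is taken from the dump above and may differ for other LUKS2 header sizes.

```shell
# Hypothetical helper: print the 6 bytes at a byte offset as a hex string.
# $1 = device or file, $2 = byte offset
magic_at() {
    dd if="$1" bs=1 skip="$2" count=6 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Succeed if either LUKS2 header magic is still present.
# Primary magic is "LUKS" 0xba 0xbe; secondary is "SKUL" 0xba 0xbe.
# The offset 16384 is an assumption taken from the hexdump output.
# $1 = device or file
luks2_headers_present() {
    primary=$(magic_at "$1" 0)
    secondary=$(magic_at "$1" 16384)
    [ "$primary" = "4c554b53babe" ] || [ "$secondary" = "534b554cbabe" ]
}
```

A test could then assert that `luks2_headers_present /dev/mapper/dust1` succeeds after the failed destroy, and fails on devices that were wiped.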

@mulkieran
Member Author

Good!

@mulkieran mulkieran transferred this issue from stratis-storage/project Oct 13, 2021
@mulkieran
Member Author

@bgurney-rh Can you include this setup in your testing notebook when you get the chance?
