DVM fails if the project contains a PVC that consumes more than 50% of the project quota #1215

Open
JoaoBraveCoding opened this issue Oct 12, 2021 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@JoaoBraveCoding

Describe the bug

DVM fails if we try, for instance, to stage a project a second time when it contains a PVC whose size consumes more than 50% of the project quota.

To Reproduce
Steps to reproduce the behavior:

  1. Create a project with a storage quota of 100Gb
  2. Create a PVC of 60Gb
  3. Stage the project for migration
  4. Stage the project for migration a second time

Expected behavior

Staging should succeed a second time without DVM failing.

Additional context
We are running MIG operator version 1.5.0. Since I cannot find release notes, I'm not sure if the problem has been addressed in more recent releases.

Log line obtained with the following command:

oc logs migration-log-reader-657486d85d-mbtd9 -c plain -n openshift-migration | grep '"dvm":"edms-search-dev-staging-23788-8qfgf"'

openshift-migration migration-controller-5ffdb47b68-w9gc2 mtc {"level":"info","ts":1634046512.042396,"logger":"directvolume","msg":"","dvm":"edms-search-dev-staging-23788-8qfgf","migMigration":"edms-search-dev-staging-23788","error":"persistentvolumeclaims \"records-files-claim\" is forbidden: exceeded quota: edms-search-dev, requested: requests.storage=2Ti, used: requests.storage=2Ti, limited: requests.storage=2Ti","stacktrace":"\ngithub.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*Task).Run()\n\t/opt/app-root/src/github.com/konveyor/mig-controller/pkg/controller/directvolumemigration/task.go:249\ngithub.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).migrate()\n\t/opt/app-root/src/github.com/konveyor/mig-controller/pkg/controller/directvolumemigration/migrate.go:39\ngithub.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).Reconcile()\n\t/opt/app-root/src/github.com/konveyor/mig-controller/pkg/controller/directvolumemigration/directvolumemigration_controller.go:144\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler()\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1()\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:198\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()\n\t/opt/app-root/src/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()\n\t/opt/app-root/src/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil()\n\t/opt/app-root/src/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil()\n\t/opt/app-root/src/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext()\n\t/opt/app-root/src/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext()\n\t/opt/app-root/src/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1373"}
JoaoBraveCoding added the kind/bug label Oct 12, 2021
@alaypatel07
Contributor

Thanks for the report. This is an open issue that we need to handle.

For anyone curious about the root cause, here is why this is happening:

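  1. The DVM controller tries to create the PVC object on the destination cluster.
  2. If the PVC object already exists, the API server would normally return an AlreadyExists error.
  3. However, when a quota is in place, the API server validates the incoming create request against the quota first. The quota check fails before the existence check is reached, so instead of AlreadyExists the API server returns forbidden: exceeded quota. Because the PVC from the first stage already counts against the quota, the duplicate request is charged its full size again, so any PVC larger than 50% of the quota pushes the total over the limit.
  4. The DVM controller only tolerates the AlreadyExists error; any other error, including this quota error, makes it fail as reported here.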

Workaround:

  1. Lift the quota temporarily.
  2. Delete the PVCs in the destination. If the quota is large enough for all the PVCs, new PVCs will be created and the data copied over. This re-copies all the data, but it is an alternative when lifting the quota is not an option.

Bugfix proposal:

Instead of relying on the API server's error to determine whether the PVC exists, the DVM controller should make an explicit GET call to check whether the PVC exists before creating it. The error handling that leads to this failure is here
