Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change log generator misses some changes #107

Open
jgleissner opened this issue Sep 28, 2022 · 10 comments
Open

Change log generator misses some changes #107

jgleissner opened this issue Sep 28, 2022 · 10 comments
Assignees

Comments

@jgleissner
Copy link
Collaborator

Keg source tracking stores the source information of each key in the internal dictionary which can be used to generated a change log from the commit info. Not all changes are caught though. If a key was changed from e.g.:

some_key:
  - foo
  - bar
  - baz

to

some_key:
  - foo
  - bar

the source info generated by keg for some_key would be line 1 to 3 for whatever file some_key was imported from. The only line that changed was line 4 though, so querying the git commit history for changes in line 1-3 would not return the commit, and hence no change log entry is generated for that change.

@jgleissner jgleissner self-assigned this Sep 28, 2022
@gmoro
Copy link

gmoro commented Feb 10, 2023

This is an important bug to have fixed as it is a bit disrupting for the workflow we are envisaging for ALP and Minimal-VM.
As a work around we can use "force" in the OBS service to regenerate the kiwi files, but this still will live the changelog with the wrong entries, and this will impact the future feature that we had planned of generating .changes files from the keg changelogs.

@jgleissner
Copy link
Collaborator Author

There shouldn't be any wrong entries, just missing ones. Since the composed image description may have many different source fragments from different source files, keg tracks from which source files and lines keys where imported that end up in the final dictionary and writes them to a file. The change log generator uses that information to generate the log (or logs specific to profiles in case of multi-build). Simply using the commit message of all involved files may include unrelated commits, hence the source tracking by keg.

However, as stated, there are some cases that slip through, i.e. when a key value was truncated as in the above sample, or removed entirely. Keg cannot know from looking at the current state that there used to be a key in a previous state. You could work around this by replacing removed trailing lines in keys with empty lines instead of completely cutting them. Removed keys could be replaced by some_key: Null instead of cutting them. It's not great, admittedly.

Fixing this for good is tricky. Essentially we would need to check out the previous version from the recipes repo, build the image dictionary with keg and dump it somehow, then generate the image dictionary of the current version and look for deletions and somehow figure out which commits removed them. This would make change log generation significantly more complicated than it already is. I'll try to look into this some time, but I don't see a quick solution.

@jgleissner
Copy link
Collaborator Author

I've spent some time pondering a few approaches and I don't see a good path forward to dealing with those missed changes. To reiterate, the change log generator does not detect removed entries at the end of a key, e.g. the removal of the last package from a list of packages in a namespace, or the removal of a key. We could rebuild the previous image dictionary and find drop entries and keys, but finding the commits in the commit history that deleted them is tricky. You can ask git for commits matching a string or a regex in a file, but this could return false positives or only part of the relevant commit history, and as such couldn't be considered reliable. I don't think it's worth spending the effort to implement it.

When the keg compose_kiwi_description service thinks a newly generated image description has no changes compared to the existing one even though it may have some is a bit of a nuisance to deal with manually, but once we have an automated way to refresh image descriptions in the build service this becomes a more serious issue.

So I suggest the following approach:

We generally try not to make changes in keg-recipes that would slip by the change log generator. That means when removing a list entry at the end of a key, we replace it with a newline or better a comment. That means

_namespace_foo:
  package:
    - pkg1
    - pkg2
    - pkg_to_remove

becomes

_namespace_foo:
  package:
    - pkg1
    - pkg2
    # removed; comment for source tracking

And instead of simply removing keys, they get set to Null:

_namespace_foo: Null  # removed; comment for source tracking

This way the removed lines are still captured by the source tracker and the change log generator will find the appropriate commits.

There may still be commits slipping through that don't adhere to this. To address this, I suggest to implement additional checks in the compose_kiwi_description service that compare the generated files with the existing versions in case no changes were detected, to avoid running into the situation where we miss a description refresh.

@rjschwei What do you think?

@rjschwei
Copy link
Contributor

I am not a fan of such conventions. In #148 a different solution emerged :D

@jgleissner
Copy link
Collaborator Author

#148 only makes sure that the image description is regenerated if there are any changes, but it still may miss change log entries as described above. Those conventions would be in place to prevent that. It's not ideal because it is easily overlooked, but I think it's the most pragmatic approach.

@rjschwei
Copy link
Contributor

I don't quite understand. We look for file changes to generate the image description. Once we have the files that actually change we can also know what changed, comprehending a + or - in a diff should not put us in front of insurmountable challenges.

@jgleissner
Copy link
Collaborator Author

It's trivial to figure out what has changed from comparing the generated to the existing files, but that doesn't give you the information which commits caused the changes for generating the change log. The change log generator operates on the source tracking information that keg logs. Keg keeps track of where each line in the dictionary it constructs originated from. With that information the change log generator can collect all commits that affected all source lines of a profile. But as said, in case a key was removed or truncated, there is nothing in the source log that would point to the commit that removed it, as it's simply not there. So unless that commit also changed another part of the profile, the generated change log will lack that commit.

It's a bit of a corner case and can be avoided as described, and with the latest update compose_kiwi_description doesn't rely on the change log generator anymore to determine whether the image description has changed, so it should always refresh the image description if it has changed. There may be situations where the change log is incomplete.

@rjschwei
Copy link
Contributor

Sorry for being dense about this, but I still do not understand. Let's construct a hypothetical example in the hopes it will help me along.

Lets say I have a yaml file that contains

some_key:
    - foo
    - bar
    - baz

Form your original example and that was in the commit with sha 12345. This is used in the image generation. now in the next commit we end up with

some_key:
    - foo
    - bar

so there's a new sha for this commit, say sha 12346. Since we already know that the file that contained this information has changed I can git diff the commits. In the diff baz will be marked as removed. And if there is another commit that turns it into

some_key:
    - foo
    - bar
    - beer
    - and
    - wine

then the git diff will show me that baz has been removed and beer, and and wine have been added. So based on the commit IDs I can relate commit logs and diffs and with the diff I know whether something has been added or removed.

What am I missing?

@jgleissner
Copy link
Collaborator Author

Keg doesn't inspect differences between yaml files. To generate a change log for a given image description, or rather profile of an image description, as profiles are separate images in the end, keg keeps track of the files and line numbers each key in the image dictionary originated from and writes that to a log. The change log generator reads that information and queries git for all commits that affected each dictionary key since the last image description refresh. So if we have a file data/some_file.yaml with the content

some_key:
   - foo
   - bar

it will ask git for commits that affected lines 1-3 in data/some_file.yaml. This works well for changed and added lines, but not for removed lines, not all of them anyway (as described above).

Keg could theoretically look at differences between the old and new image dictionary, and sift through the commit diffs to figure out which commits match the changes, but I think this would be tricky to do, possibly error prone, and IMHO not really worth the effort considering that the missed deletion situation is somewhat of a corner case, and can be avoided.

@jgleissner
Copy link
Collaborator Author

I've bashed my head against this for a while and tried a few things. Diffing the image dictionaries is tricky, because there are keys within lists and comparing lists doesn't really work well.

However, there might be another approach. Keg produces logs of all source lines used in the dictionary for each profile. The source logs could be recreated by compose_image_description (as we capture the commit ID that was used for the previous image description). The changelog generator could use that information to find lines that were included in the old image description but are absent in the new image description. 'git blame --reverse' can be used to find the commit when a given line was present last. From there it should be possible to work out the next commit, which naturally should be the one that removed the line, and include the commit message in the change log. It may be a bit cumbersome but I think it should work.

What do you think @rjschwei ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants