Use dependency chain depth as a secondary topological sort key #293
This PR adds heuristics to speed up catkin builds.
The background: our code has several dependency chains, and because packages are sorted alphabetically, some of the prereqs don't get built until late in the list of packages. This often leads to the case where these prereqs are the only builds running: unrelated packages are done, and the remaining packages all depend on the one package still building. This hurts parallelism and slows down our builds.
Our case might be unusual since we're building natively on embedded hardware. With this PR we save about 1.5 minutes on a clean build, or ~15%. The PR should also help when running on faster hardware, even if the magnitude of the savings won't be as impressive.
The PR adds additional sort criteria to the topological sort. The primary criterion is still correctness w.r.t. dependencies. The new criteria are used on each iteration of the sort to pick from the list of packages whose dependencies are already satisfied.
First off, it calculates, for each package, the length of the longest dependency chain from that package to any leaf node in the tree (a leaf being a package which is not a prereq for anything). The package with the longest chain of dependent packages between itself and its leaf nodes is picked as the highest priority to run next. The idea here is that it makes sense to build a package at the start of a long chain of packages as soon as possible.
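As a rough illustration of the depth calculation (not the code in this PR), here is a minimal sketch. It assumes the dependency graph is given as a plain dict mapping each package name to the set of packages it depends on, that the graph is acyclic, and that a leaf has depth 0. The name `chain_depths` is hypothetical.

```python
def chain_depths(deps):
    """Return {package: longest chain of dependents down to a leaf}.

    `deps` maps package name -> set of prerequisite package names.
    Hypothetical helper for illustration; assumes no dependency cycles.
    """
    # Invert the graph: dependents[p] = packages that directly depend on p.
    dependents = {name: set() for name in deps}
    for name, prereqs in deps.items():
        for prereq in prereqs:
            dependents.setdefault(prereq, set()).add(name)

    depths = {}

    def depth(name):
        # Memoized: a leaf (no dependents) has depth 0, otherwise
        # one more than the deepest package that depends on it.
        if name not in depths:
            depths[name] = max(
                (1 + depth(d) for d in dependents[name]), default=0
            )
        return depths[name]

    for name in dependents:
        depth(name)
    return depths


example = {"a": set(), "b": {"a"}, "c": {"b"}, "d": set()}
print(chain_depths(example))
```

In this example `a` starts the chain `a -> b -> c`, so it gets depth 2 and would be picked ahead of the unrelated package `d`, which has depth 0.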
The tiebreaker among packages with equal max dependency chain depth is a count of how many packages depend on the current package. The reasoning is that finishing a package which many others depend on unblocks more of them, which adds more opportunity for parallelism in the build.
Neither of these is guaranteed to be optimal, but they seem like reasonable heuristics.
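To show how the two keys combine, here is a self-contained sketch of a prioritized topological sort. It is an illustration under assumptions, not the PR's implementation: the graph is a dict of package name to prerequisite set, every prerequisite is itself a key, the graph is acyclic, "how many packages depend on the current package" is taken to mean direct dependents, and the package name is the final alphabetical tiebreaker. The function name `prioritized_order` is hypothetical.

```python
import heapq


def prioritized_order(deps):
    """Topologically sort `deps`, preferring deep chains, then fan-out.

    `deps` maps package name -> set of prerequisite names (all of which
    must also be keys). Hypothetical sketch; assumes an acyclic graph.
    """
    dependents = {name: set() for name in deps}
    for name, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].add(name)

    depth_cache = {}

    def chain_depth(name):
        # Longest chain of dependents from `name` down to a leaf.
        if name not in depth_cache:
            depth_cache[name] = max(
                (1 + chain_depth(d) for d in dependents[name]), default=0
            )
        return depth_cache[name]

    def key(name):
        # heapq is a min-heap, so negate the values we want maximized.
        return (-chain_depth(name), -len(dependents[name]), name)

    # Number of prerequisites each package is still waiting on.
    waiting = {name: len(prereqs) for name, prereqs in deps.items()}
    ready = [key(name) for name, count in waiting.items() if count == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, _, name = heapq.heappop(ready)
        order.append(name)
        for dependent in dependents[name]:
            waiting[dependent] -= 1
            if waiting[dependent] == 0:
                heapq.heappush(ready, key(dependent))
    return order


example = {
    "alpha": set(),
    "base": set(), "mid": {"base"}, "top": {"mid"},
    "core": set(), "lib": set(),
    "app1": {"core"}, "app2": {"core", "lib"},
}
print(prioritized_order(example))
```

With a purely alphabetical choice among ready packages, `alpha` would go first here. With the two heuristics, `base` (start of the longest chain) goes first, and `core` is picked ahead of `lib` because both have depth 1 but `core` has two direct dependents.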
I looked and didn't see any other users of the topological sort functions so I don't think this will change the behavior of anything besides catkin build. Let me know if that's wrong.
Also, apologies in advance for the "translating C++ code into Python in my head" approach to coding; feedback there is welcome.