-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create research-branch-policy.md #112
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,57 @@ | ||||||
# Research Branch Policy for InstructLab | ||||||
|
||||||
This document proposes a research branch policy for InstructLab repositories (sdg, eval, training) in the InstructLab organization. | ||||||
|
||||||
## Background | ||||||
|
||||||
Research engineers contributing to InstructLab have expressed the need for the ability to make rapid, experimental changes in a space where they can collaborate and move quickly without working through robust, production-level code review. | ||||||
Check failure on line 7 in docs/research-branch-policy.md GitHub Actions / markdown-lintTrailing spaces
|
||||||
Software engineers contributing to InstructLab have expressed concern about allowing code into the `main` branch that does not meet the production-level quality standards enforced on main. We need to minimize the time from code merged into main to shipping in a release, and automating confidence in that shipment. | ||||||
|
||||||
One of the more unique considerations InstructLab has as a project in contrast to other open source projects is the tight collaboration between research engineers and software engineers. There are cultural differences and different best practices in each discipline, and in order to enable this partnership we may need to take some new or less standard approaches to open source project management. | ||||||
|
||||||
### Current temporary solution | ||||||
|
||||||
A temporary working solution to enable research engineers to make forward progress is creating a `research` branch in a fork of the InstructLab repository under a user account. This enables the researchers to move forward rapidly and experiment in a collaborative manner. This will also necessitate a more involved merge process once the branch is ready to merge into the `main` branch of the `instructlab` repo under the InstructLab GitHub organization. That is a known and accepted tradeoff of not doing continuous production-level code review. | ||||||
|
||||||
### Proposed long-term solution | ||||||
|
||||||
Longer-term, however, the researchers have expressed concern that having this `research` repository in a user's fork makes it more difficult for would-be research-oriented contributors to discover. They would like to have a `research` branch (akin to the common practice of having a `devel` branch) on the main `instructlab` repositories such that their work is discoverable in addition to not negatively impacting the corresponding production-quality `main` branch. | ||||||
Check failure on line 18 in docs/research-branch-policy.md GitHub Actions / markdown-lintTrailing spaces
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you please share one or more projects that use a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. https://nvie.com/posts/a-successful-git-branching-model/ any project that uses the gitflow branching strategy? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps we need an Having "research" branches in the main repos does not make them discoverable. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @bjhargrave I actually really like this idea! what are the main downsides? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The only downside I can think of is having to manage another GH organization. But that is also a feature. We probably need to have different maintainers for production work than for research work. Using branches in a single repo makes things complex (probably requires CODEOWNERS which vary by branch which is a PITA). With a separate GH org, you can manage the teams and their application to research repos in a more straightforward and common manner perhaps with more relaxed rules for who can be a committer in research repos than you would want for production repos. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having a separate branch (e.g. This is a good alternative @bjhargrave from using |
||||||
|
||||||
### Concern about a specialized branch in main | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What concerns me most about this proposal is that while researchers work on the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right now we have a lot of pain the researchers are taking on with continuous code reviews causing friction and delay. The status quo isn't ok. Any ideas on alternatives to address? Noting that researchers' role is not to produce production-ready code? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps we can pair up researchers with software engs so they can embrace software eng practice a little more? I don't think both practices (research and soft eng) are mutually exclusive. Additionally, pushing small changes instead of a large amount of code helps get faster reviews and merges. Rapid iterations do not mean low-quality code. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sdg pr #72 is an example of where this isn't happening as advertised There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That’s not a fair response, @mairin. That review was given priority attention and good faith responses trying to find the best solution quickly. By the end of the day, I took on that role of helping get it shape for main by filing two new PRs incorporating the important fix in one, and the desired feature in another incorporating review feedback and adding automated testing. This all happened within a day. What more do you want? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The feedback received regarding the cited PR is valid as well as what you are saying is valid. The problem is we have two different ways of working that we need to accomodate, and we need to find an aproach that accomodates all. Thus I filed this devdoc in an attempt to follow the process for working out such processes, so it could be discussed. I do think my response was fair. It was not made in bad faith. More, to point out, there are two proposed approaches - the research approach and the SWE approach. Neither taken as typically practiced is going to address the concerns of the other. So I am trying to counter the notion that following the standard SWE PR review process on small PRs is not going to address the concerns of the research engineers and I am citing a specific PR for more information. It was not meant to be unfair or a point of criticism. It was meant to point out the problem that needs to be solved. |
||||||
|
||||||
The software engineers have expressed concern that having a non-standard branch like this on the main upstream repository and not in a fork is against standard upstream open source practice: | ||||||
|
||||||
- It creates social strata between contributors who have the permissions to create a branch in the main repo vs contributors who can only create a branch in a user fork. "Official branches in the repo, all development in forks." | ||||||
- PRs for both the `main` (production-quality) and `research` (rapid and experimental) will live in the same repository. This might cause confusion for PR review workflows - it may not be clear to the PR reviewer which branch the PR is destined for clearly depending on the tooling being used. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a valid concern that can be reduced by applying labels to PRs once they are pushed to |
||||||
|
||||||
# Git branch strategies | ||||||
Check failure on line 27 in docs/research-branch-policy.md GitHub Actions / markdown-lintMultiple top-level headings in the same document
|
||||||
|
||||||
There are different strategies around git branching: | ||||||
|
||||||
- [Release-based branching](https://medium.com/@pooyagohardani/git-release-candidate-branches-strategy-balancing-quality-and-agility-in-software-development-21cc00842e03) | ||||||
- [Feature branching](https://martinfowler.com/articles/branching-patterns.html) | ||||||
- [Git flow](https://nvie.com/posts/a-successful-git-branching-model/) | ||||||
- [GitHub flow](https://docs.github.com/en/get-started/using-github/github-flow) | ||||||
- [Task branching](https://unity.com/how-to/devops-task-branch-workflow) | ||||||
- [Trunk-based development](https://trunkbaseddevelopment.com/) | ||||||
- Probably others! | ||||||
|
||||||
(More information on some of these including a comparison chart is available in [Git Branching Strategies for DevOps: Best Practices for Collaboration](https://dev.to/angelotheman/git-branching-strategies-for-devops-best-practices-for-collaboration-35l8) by Angel Oduro-Temeng Twumasi on dev.to.) | ||||||
|
||||||
The CLI repository [instructlab/instructlab](https://github.com/instructlab/instructlab) has a [Release Strategy devdoc](https://github.com/instructlab/instructlab/blob/main/docs/release-strategy.md) that dictates a release-based branching strategy. We do not have defined branching strategies for the other repos. | ||||||
|
||||||
We should probably decide on explicit branching policies for each of the core repositories in the `instructlab.` This will set clear expectations for our ways of working as a community project. | ||||||
|
||||||
# Research branch policy proposal | ||||||
Check failure on line 45 in docs/research-branch-policy.md GitHub Actions / markdown-lintMultiple top-level headings in the same document
|
||||||
|
||||||
Regardless of the branching strategy of the repo, we propose allowing a `release` branch on each of the core `instructlab`. This branch would not be subject to the same PR review guidelines as the rest of the repository. The goal of this branch is to enable rapid, multi-person collaborative development to get the code to a state that can then go through a more stringent code review process that adheres to the project's standards: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Did you mean |
||||||
|
||||||
- The `research` branch in each repository that has one will have named maintainers. Maintainer membership will be based on [the InstructLab maintainer policy already in place](https://github.com/instructlab/community/blob/main/GOVERNANCE.md#project-maintainers-overview). | ||||||
- Research-oriented PRs must be made against the `research` branch. | ||||||
- `research` branch maintainers will have the freedom to determine their PR merge policy. It will be documented in their branch's README or another easily-discoverable location of their choosing. | ||||||
- (Need to address CI in this policy. Not sure how or what is fair.) | ||||||
- When the `research` branch is in a state the maintainers are confident is good enough to attempt merging to main, they will coordinate with the repository core maintainers to start a review process. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This makes me very nervous. I'm concerned that we'll end up with an enormous amount of low-quality code being proposed to main through PRs. How can we ensure this doesn't happen? Ideally, we will need small batches instead of trying to merge the entire I think this is a very tedious process because we will likely not take every commit from I'd like to avoid a situation where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If i said s/research/devel on the name of the branch, would that change anything for you? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, same problem. |
||||||
- The responsibility for completing the merge process by responding to review feedback is on the software engineer maintainers of the `research` branch and is not the responsibility of the research engineer maintainers. The research engineer maintainers will assist the software engineers responsible for the merge, however. | ||||||
- The merge code review for the `research` branch will adhere to the review requirements of the repository and follow the standard PR process already documented. | ||||||
|
||||||
The creation of this policy is in acknowledgement of the different ways of working inherent to the research engineer role in contrast to the production-level software engineer role. As a project, InstructLab has a goal of enabling upstream, open source collaboration between AI researchers, and this is one of the ways we can help achieve that goal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the current proposal "minimizes the time from code merged into main to shipping in a release"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't actually impact it since the proposal only impacts the process previous t the merge to main. Just trying to fairly reflect the various positions and values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If thats the case then probably should remove the sentence: "We need to minimize the time from code merged into main to shipping in a release, and automating confidence in that shipment.".