-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create research-branch-policy.md #112
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Máirín Duffy <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there any alternative considered other than using a research
branch?
|
||
# Research branch policy proposal | ||
|
||
Regardless of the branching strategy of the repo, we propose allowing a `release` branch on each of the core `instructlab`. This branch would not be subject to the same PR review guidelines as the rest of the repository. The goal of this branch is to enable rapid, multi-person collaborative development to get the code to a state that can then go through a more stringent code review process that adheres to the project's standards: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of the branching strategy of the repo, we propose allowing a `release` branch on each of the core `instructlab`. This branch would not be subject to the same PR review guidelines as the rest of the repository. The goal of this branch is to enable rapid, multi-person collaborative development to get the code to a state that can then go through a more stringent code review process that adheres to the project's standards: | |
Regardless of the branching strategy of the repo, we propose allowing a `research` branch on each of the core `instructlab`. This branch would not be subject to the same PR review guidelines as the rest of the repository. The goal of this branch is to enable rapid, multi-person collaborative development to get the code to a state that can then go through a more stringent code review process that adheres to the project's standards: |
Did you mean research
?
- Research-oriented PRs must be made against the `research` branch. | ||
- `research` branch maintainers will have the freedom to determine their PR merge policy. It will be documented in their branch's README or another easily-discoverable location of their choosing. | ||
- (Need to address CI in this policy. Not sure how or what is fair.) | ||
- When the `research` branch is in a state the maintainers are confident is good enough to attempt merging to main, they will coordinate with the repository core maintainers to start a review process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me very nervous. I'm concerned that we'll end up with an enormous amount of low-quality code being proposed to main through PRs. How can we ensure this doesn't happen? Ideally, we will need small batches instead of trying to merge the entire research
branch in main
.
I think this is a very tedious process because we will likely not take every commit from research
and push it into main
. Some of that history will need to be rewritten to meet main
quality standards. This sounds like a significant amount of effort for whoever will be responsible for pushing code from research
into main
.
I'd like to avoid a situation where research
moves so quickly that merging into main
cannot keep up without lowering the quality standards of main
, thereby putting pressure on main
to merge more rapidly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i said s/research/devel on the name of the branch, would that change anything for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, same problem.
## Background | ||
|
||
Research engineers contributing to InstructLab have expressed the need for the ability to make rapid, experimental changes in a space where they can collaborate and move quickly without working through robust, production-level code review. | ||
Software engineers contributing to InstructLab have expressed concern about allowing code into the `main` branch that does not meet the production-level quality standards enforced on main. We need to minimize the time from code merged into main to shipping in a release, and automating confidence in that shipment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the current proposal "minimizes the time from code merged into main to shipping in a release"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't actually impact it since the proposal only impacts the process previous t the merge to main. Just trying to fairly reflect the various positions and values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If thats the case then probably should remove the sentence: "We need to minimize the time from code merged into main to shipping in a release, and automating confidence in that shipment.".
The software engineers have expressed concern that having a non-standard branch like this on the main upstream repository and not in a fork is against standard upstream open source practice: | ||
|
||
- It creates social strata between contributors who have the permissions to create a branch in the main repo vs contributors who can only create a branch in a user fork. "Official branches in the repo, all development in forks." | ||
- PRs for both the `main` (production-quality) and `research` (rapid and experimental) will live in the same repository. This might cause confusion for PR review workflows - it may not be clear to the PR reviewer which branch the PR is destined for clearly depending on the tooling being used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a valid concern that can be reduced by applying labels to PRs once they are pushed to research
.
|
||
Longer-term, however, the researchers have expressed concern that having this `research` repository in a user's fork makes it more difficult for would-be research-oriented contributors to discover. They would like to have a `research` branch (akin to the common practice of having a `devel` branch) on the main `instructlab` repositories such that their work is discoverable in addition to not negatively impacting the corresponding production-quality `main` branch. | ||
|
||
### Concern about a specialized branch in main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What concerns me most about this proposal is that while researchers work on the research
branch, main
will continue to evolve. This will lead to constant rebasing and potential breakages from main
into research
. We need to keep research
in sync with main
somehow, but the constant back and forth will cause a lot of pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now we have a lot of pain the researchers are taking on with continuous code reviews causing friction and delay. The status quo isn't ok. Any ideas on alternatives to address? Noting that researchers' role is not to produce production-ready code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can pair up researchers with software engs so they can embrace software eng practice a little more? I don't think both practices (research and soft eng) are mutually exclusive. Additionally, pushing small changes instead of a large amount of code helps get faster reviews and merges. Rapid iterations do not mean low-quality code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sdg pr #72 is an example of where this isn't happening as advertised
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s not a fair response, @mairin. That review was given priority attention and good faith responses trying to find the best solution quickly. By the end of the day, I took on that role of helping get it shape for main by filing two new PRs incorporating the important fix in one, and the desired feature in another incorporating review feedback and adding automated testing. This all happened within a day. What more do you want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feedback received regarding the cited PR is valid as well as what you are saying is valid. The problem is we have two different ways of working that we need to accomodate, and we need to find an aproach that accomodates all. Thus I filed this devdoc in an attempt to follow the process for working out such processes, so it could be discussed. I do think my response was fair. It was not made in bad faith. More, to point out, there are two proposed approaches - the research approach and the SWE approach. Neither taken as typically practiced is going to address the concerns of the other. So I am trying to counter the notion that following the standard SWE PR review process on small PRs is not going to address the concerns of the research engineers and I am citing a specific PR for more information. It was not meant to be unfair or a point of criticism. It was meant to point out the problem that needs to be solved.
Nope. Open to suggestions! |
I’d like to see it weighed against keeping all active development not ready for main in repo forks. We talked about this a bunch just today so surprised to see this answer? to be fair I haven’t had a chance to read the docs itself yet given the holiday |
yes, that is the temporary solution marked in the document, which i thought we were aligned on |
|
||
### Proposed long-term solution | ||
|
||
Longer-term, however, the researchers have expressed concern that having this `research` repository in a user's fork makes it more difficult for would-be research-oriented contributors to discover. They would like to have a `research` branch (akin to the common practice of having a `devel` branch) on the main `instructlab` repositories such that their work is discoverable in addition to not negatively impacting the corresponding production-quality `main` branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please share one or more projects that use a devel
branch besides their main
branch? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://nvie.com/posts/a-successful-git-branching-model/
any project that uses the gitflow branching strategy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we need an instructlab-research
GitHub organization where research forks of the main instructlab
repos are kept. This would be a central place for people to discover ongoing InstructLab research efforts and using forks there will facilitate making pull requests back to the main repos when ready to drive the research work back into production code. This would also allow for many forks of a given repo (e.g instructlab-research/sdg-idea1
, instructlab-research/sdg-idea2
), where multiple disjoint research efforts do not step on each other.
Having "research" branches in the main repos does not make them discoverable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bjhargrave I actually really like this idea! what are the main downsides?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only downside I can think of is having to manage another GH organization. But that is also a feature. We probably need to have different maintainers for production work than for research work. Using branches in a single repo makes things complex (probably requires CODEOWNERS which vary by branch which is a PITA). With a separate GH org, you can manage the teams and their application to research repos in a more straightforward and common manner perhaps with more relaxed rules for who can be a committer in research repos than you would want for production repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a separate branch (e.g. research
) is doable and some organizations like Helm did it when developing the v3 release. They used a dev-v3
branch and developed it in parallel with the main branch which was v2
. Once v3 was ready to be released, the main moved to dev-v2
and v3 became main
. During development of v3, any fixes or features that were required were pulled in the dev-v3
branch. Labeling was used for PRs and issues because they are common for both branches. In this case, the research
branch should be kept in sync with main
so that they don't diverge.
This is a good alternative @bjhargrave from using research
branch. Its more self contained and mirrors usual best GH practice of users forking a repo. In this case also, the fork should be kept in sync with thr main repo so that they don't diverge. There is however an overhead in maintaining 2 different organizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mairin for pushing this proposal.
Before we go down the route of implementing a solution for working around this issue - can we discuss if we can bridge the gap of researches being able to make contributions to the main repositories regularly? Can we list the current blockages to see if we can find a way to make this work? My reasoning is that separate development branches and separate GH organizations are expensive to maintain and manage, and should be a last resort in my opinion.
## Background | ||
|
||
Research engineers contributing to InstructLab have expressed the need for the ability to make rapid, experimental changes in a space where they can collaborate and move quickly without working through robust, production-level code review. | ||
Software engineers contributing to InstructLab have expressed concern about allowing code into the `main` branch that does not meet the production-level quality standards enforced on main. We need to minimize the time from code merged into main to shipping in a release, and automating confidence in that shipment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If thats the case then probably should remove the sentence: "We need to minimize the time from code merged into main to shipping in a release, and automating confidence in that shipment.".
|
||
### Proposed long-term solution | ||
|
||
Longer-term, however, the researchers have expressed concern that having this `research` repository in a user's fork makes it more difficult for would-be research-oriented contributors to discover. They would like to have a `research` branch (akin to the common practice of having a `devel` branch) on the main `instructlab` repositories such that their work is discoverable in addition to not negatively impacting the corresponding production-quality `main` branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a separate branch (e.g. research
) is doable and some organizations like Helm did it when developing the v3 release. They used a dev-v3
branch and developed it in parallel with the main branch which was v2
. Once v3 was ready to be released, the main moved to dev-v2
and v3 became main
. During development of v3, any fixes or features that were required were pulled in the dev-v3
branch. Labeling was used for PRs and issues because they are common for both branches. In this case, the research
branch should be kept in sync with main
so that they don't diverge.
This is a good alternative @bjhargrave from using research
branch. Its more self contained and mirrors usual best GH practice of users forking a repo. In this case also, the fork should be kept in sync with thr main repo so that they don't diverge. There is however an overhead in maintaining 2 different organizations.
@mairin what is going on with this? |
Research engineers contributing to InstructLab have expressed the need for the ability to make rapid, experimental changes in a space where they can collaborate and move quickly without working through robust, production-level code review.
Software engineers contributing to InstructLab have expressed concern about allowing code into the
main
branch that does not meet the production-level quality standards enforced on main. We need to minimize the time from code merged into main to shipping in a release, and automating confidence in that shipment.One of the more unique considerations InstructLab has as a project in contrast to other open source projects is the tight collaboration between research engineers and software engineers. There are cultural differences and different best practices in each discipline, and in order to enable this partnership we may need to take some new or less standard approaches to open source project management.
This devdoc proposes the creation of a policy in acknowledgement of the different ways of working inherent to the research engineer role in contrast to the production-level software engineer role. As a project, InstructLab has a goal of enabling upstream, open source collaboration between AI researchers, and this is one of the ways we can help achieve that goal.