add copybara file and the associated workflow #100
18399b0 to a9ca92e
Let's run this at 6 or 7 am, or after work hours, so that folks don't have to rebase onto the incoming changes in the middle of the day.
So after looking over how this is working and also examining the PR / commit that copybara is creating, I am not sure why we would want this over just doing a git pull / push to sync the history from upstream `main` to `rocm/jax:main`.
It appears that copybara is creating a new commit from all the file changes from the "origin" branch, which is suboptimal for a lot of reasons. Mainly, I don't think it's very useful if it is going to create a new "history" when it copies code from upstream commits into new commits on our main branch. This would pretty much always make it much more difficult for Git rebase to calculate the correct set of commits to move from our internal branches to upstream, as the new copybara commits will show up as "new" from the merge-base. That makes it much harder to port our internal changes to upstream when they are ready, and harder to calculate the differences between upstream and downstream.
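To make the merge-base concern concrete, here is a toy model of the two histories. It is only a sketch: the commit names (`u1`, `c2`, `r1`, and so on) are hypothetical, and the real `git merge-base` handles more general graphs than this simplified version does.

```python
# Toy commit graphs: each commit id maps to a list of its parent ids.
# All commit names are hypothetical and exist only to illustrate the argument.

def ancestors(graph, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen

def merge_base(graph, a, b):
    """Pick the common ancestor with the longest history (simplified merge-base)."""
    common = ancestors(graph, a) & ancestors(graph, b)
    if not common:
        return None
    return max(common, key=lambda c: len(ancestors(graph, c)))

def commits_to_replay(graph, branch, upstream):
    """Commits on branch that upstream does not have (what a rebase must move)."""
    return sorted(ancestors(graph, branch) - ancestors(graph, upstream))

# Case 1: downstream main shares history with upstream (plain pull/push sync).
# r1 is our local change on top of the synced upstream commit u3.
shared = {"u1": [], "u2": ["u1"], "u3": ["u2"], "r1": ["u3"]}

# Case 2: upstream changes are re-created as new commits c2 and c3,
# so they no longer have the same identity as u2 and u3.
copied = {"u1": [], "u2": ["u1"], "u3": ["u2"],
          "c2": ["u1"], "c3": ["c2"], "r1": ["c3"]}

print(merge_base(shared, "r1", "u3"), commits_to_replay(shared, "r1", "u3"))
print(merge_base(copied, "r1", "u3"), commits_to_replay(copied, "r1", "u3"))
```

In the shared-history case only `r1` needs to move. In the copied case the merge-base falls back to `u1`, and the re-created commits `c2` and `c3` are counted as ours even though their content came from upstream.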
Did you try just doing a straight git pull/push using an external script? You already have the credential management in place for doing these kinds of operations on repos external to the one running GitHub Actions, so it would be easy to drop a new workflow into a different repo to do the sync if you still want to use GitHub Actions.
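A rough sketch of what such an external script could look like, assuming it runs inside a checkout of the downstream branch and that a fast-forward is possible. The upstream URL, the remote name `origin`, and the function names are placeholders, not something taken from this PR.

```python
import subprocess

def sync_commands(upstream_url, upstream_branch, target_remote, target_branch):
    """Build the git commands for a fast-forward-only sync (no new commits created)."""
    return [
        # Bring upstream commits into the current checkout, keeping their hashes.
        ["git", "pull", "--ff-only", upstream_url, upstream_branch],
        # Publish the resulting HEAD to the downstream branch.
        ["git", "push", target_remote, f"HEAD:refs/heads/{target_branch}"],
    ]

def run_sync(commands, cwd="."):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)

# Hypothetical values; replace with the real upstream URL and remote names.
commands = sync_commands(
    "https://example.com/upstream/jax.git", "main", "origin", "main"
)
for argv in commands:
    print(" ".join(argv))
```

With `--ff-only` the pull fails instead of creating a merge commit when the downstream branch has diverged, so a scheduled job would surface the divergence rather than hide it.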
workflow_dispatch:
pull_request:
schedule:
  - cron: '0 12 * * *' # Runs every day
I think it would be best to run at 6 or 7 am.
I think that makes sense.
Is there a particular reason? I have seen them upstreaming stuff at any time of the day.
From your comment above, I see two primary issues that need addressing:
Essentially, there will always be some diff with upstream that we need to maintain and manage for one reason or another. This means that we need to copy code from upstream and apply our code transformations on top of those changes when necessary. Currently, the approach to handling the above issues is to maintain branches with diffs from upstream until upstream merges those changes, at which point they flow down automatically. However, this is error-prone and unreliable unless there is some automation in place. Finally, one might ask why we should go through a new tool when all we need is some Python wrapped around a

The history of our downstream repo is no longer relevant, since we always merge stuff directly into upstream. This assumes that you use copybara again to automatically create a PR in upstream; since that PR is based on a diff, GitHub's history is immaterial. Also, this is how upstream manages their repos as well, with changes flowing from their internal repo to the external one. For us the repos are reversed, with changes flowing from upstream to downstream. Please let me know your thoughts.
What do you envision us adding to that
Given that we merge this and don't share a commit history with upstream, what're the
LGTM, although some of @charleshofer's questions are definitely valid.
Features such as
I am hoping we can use copybara to automate that process, so that when we approve a PR on our fork, copybara can cherry-pick the diff to upstream and we would not have to do those steps manually. Creating a new PR automatically using copybara is trivial, whereas rolling our own tool to do the same would need some work and maintenance on our part.
Talked with Jehandad this morning at the standup. This all makes sense to me. I know that on Thursday Matt said they had this pretty much figured out at IBM and that there's a better way than Copybara, so I'd still be curious to know how they were doing it.
Notes from my discussion with Matt on CI: At IBM, there were several projects that were open source, but where IBM maintained a downstream repo similar to ours. IBM would put some changes onto upstream releases, have long-running features that were specific to IBM code, and would push fixes to upstream. This was all doable with regular

JAX has some weird dependencies that are tricky to manage. These will pretty much always be ahead of upstream:
Features nice to have:
Solutions for the JAX team:
Going forward:
We decided today to not use Copybara and to follow an alternate plan.
The cron for the workflow fires every day at `0 12 * * *` to fetch changes from upstream and create a PR in this repo. That PR automatically fires off the CI tests and can then be merged manually.
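Since GitHub Actions evaluates `schedule` cron expressions in UTC, the "6 or 7 am" request from the review depends on the team's UTC offset. A small sketch of the conversion follows; the offsets used are assumptions for illustration (the thread does not state a timezone), and daylight saving time is not handled.

```python
def cron_for_local_hour(local_hour, utc_offset_hours):
    """Return a daily cron expression (in UTC) that fires at the given local hour."""
    utc_hour = (local_hour - utc_offset_hours) % 24
    return f"0 {utc_hour} * * *"

def local_hour_for_cron(cron, utc_offset_hours):
    """Return the local hour at which a daily 'minute hour * * *' cron fires."""
    _minute, hour, *_rest = cron.split()
    return (int(hour) + utc_offset_hours) % 24

# Assumed offset of UTC-6; substitute the team's actual offset.
print(cron_for_local_hour(6, -6))
print(local_hour_for_cron("0 12 * * *", -6))
```

Under that assumed offset the existing `0 12 * * *` schedule already corresponds to 6 am local time; with a different offset the hour field would need to change.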