Internal build vs OSS misaligment #122

Open
bhack opened this issue Jul 12, 2022 · 7 comments

Comments

@bhack
Contributor

bhack commented Jul 12, 2022

We had some C++ flag misalignment between the OSS build and the internal (Copybara) build. See:
tensorflow/tensorflow#56276 (comment)

We also had many problems reproducing internal test failures in OSS; they only reproduced after forcing the OSS build with this change: tensorflow/tensorflow@11dc383

We also had multiple rollbacks after merge.

/cc @cheshire any other notes?

/cc @learning-to-play

@bhack
Contributor Author

bhack commented Jul 12, 2022

P.S. Nit: we need to enforce:
tensorflow/tensorflow#56276 (comment)

P.P.S. Linking required many CPU cores at every iteration because we had >50 targets to link (very resource intensive) just to run a single test after an edit. So we needed to link in parallel on many cores, with the speedup from LD Gold (#110).
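
For illustration only, a minimal `.bazelrc` sketch of the kind of setup described above, assuming a GCC or Clang toolchain with the gold linker installed. The config name `fastlink` and the job count are hypothetical and are not taken from #110:

```
# Hypothetical opt-in config: bazel test --config=fastlink //some:target
# Ask the compiler driver to link with gold instead of the default linker.
build:fastlink --linkopt=-fuse-ld=gold
build:fastlink --host_linkopt=-fuse-ld=gold
# Cap the number of parallel actions; 32 is a placeholder, tune it to the machine.
build:fastlink --jobs=32
```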

Also, I needed to keep the PR branch frozen for months, without rebasing or merging, because a rebase would eventually invalidate the Bazel cache and require a monster build time again.
This is another problem for PRs that stay open for weeks or months, where you may need to rebase to solve conflicts.

@bhack
Contributor Author

bhack commented Jul 12, 2022

We also had a quite tricky issue filtering a single test during the development cycle.

You were already notified at:
tensorflow/tensorflow@0f9af91

@mihaimaruseac changed the title from "Copybara vs OSS misaligment" to "Internal build vs OSS misaligment" on Jul 13, 2022
@bhack
Contributor Author

bhack commented Aug 31, 2022

@mihaimaruseac I don't know if you could add something here for tensorflow/tensorflow#57468 (comment)

Thanks

@cheshire

I believe tensorflow/tensorflow@96b26a2 is a huge step in fixing this, fixing most (if not all) such issues.

@bhack
Contributor Author

bhack commented Aug 31, 2022

@cheshire Thanks, do you have a full list of the warnings currently enabled as errors in the Copybara-related jobs?

@bhack
Contributor Author

bhack commented Aug 31, 2022

> I believe tensorflow/tensorflow@96b26a2 is a huge step in fixing this, fixing most (if not all) such issues.

We are still suppressing all the warnings:
https://github.com/tensorflow/tensorflow/blob/master/.bazelrc#L295-L301

# Suppress all C++ compiler warnings, otherwise build logs become 10s of MBs.
build:android --copt=-w
build:ios --copt=-w
build:linux --host_copt=-w
build:macos --copt=-w
build:windows --copt=/W0
build:windows --host_copt=/W0

With the mentioned commit we are only enabling the specific unused-result warning as an error, but if we check this table, that flag is one of the default errors we also get with -Werror.

So the point here is to understand which flags we have in the Copybara builds, because if we are using -Werror there, it covers more than 50 warning types.
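
A minimal sketch of what closer alignment could look like in the OSS `.bazelrc`, assuming the internal builds do treat these warnings as errors (which this thread has not confirmed). The config name `strict_warnings` and the choice of `-Wreturn-type` are illustrative assumptions; only `unused-result` comes from the commit mentioned above:

```
# Hypothetical opt-in config: bazel build --config=strict_warnings //some:target
# Promote selected warnings to errors one by one instead of enabling -Werror globally.
build:strict_warnings --copt=-Werror=unused-result
build:strict_warnings --copt=-Werror=return-type
```

Keeping the list explicit would make it easy to compare against whatever set the internal jobs enforce, once that list is known.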

@mihaimaruseac
Collaborator

> @mihaimaruseac I don't know if you could add something here for tensorflow/tensorflow#57468 (comment)

Sadly, I don't think we can do much here. The issue there was that an internal file with internal code also needed to be updated. Using copybara instead of the internal file would have been too cumbersome.

Though, I also think that this type of breakage is rare. It should only occur when you are adding new defines.
