handle automatic caching for RETURNN configs #349
Comments
Keep in mind, though, that the cache manager is an i6-specific thing, and we should try to keep our recipes generic enough that they are also applicable on other clusters (ITC, Paderborn, AppTek).
Yes sure!
Related: #310
I'm not sure whether there is a good and generic way to do this automatically. Also, e.g. in #310, for Instead of using
Or you can move this logic to the |
I will close this for now. We have the caching in RETURNN itself for heavy data like HDFs or ogg-zips, and there are sufficient options in the serialization helpers to wrap paths with caching.
So far we relied on the RETURNN-internal cache manager access that is implemented e.g. for `HDFDataset` or `OggZipDataset`. Now, when training LMs, I added a caching function manually to the config and added the `cf` call directly via `CodeWrapper` and `DelayedFormat`.

While this was the fastest approach to getting the training to work, I do not really like it. My preferred approach would be that the `ReturnnConfig` itself can handle this, meaning that it will write the `def cf` definition and update the paths accordingly without any potential hash influence and completely independent of the setup pipeline (legacy, returnn_common, etc.).

The question is whether we want to rely on the internal `cached` marking or find another way. We could e.g. simply force this for all Paths we find in the config. I am open to suggestions.