-
Notifications
You must be signed in to change notification settings - Fork 425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEZ-4479: Eagerly Init/Load FileSystem In Tez Task Containers #274
base: master
Are you sure you want to change the base?
Conversation
💔 -1 overall
This message was automatically generated. |
@abstractdog Could you please review the changes? |
* String value. Comma seperated list of FileSystem paths which needs to be eagerly initialized. | ||
* For example s3://bucket/,file://,hdfs://localhost:8020/ | ||
*/ | ||
@ConfigurationScope(Scope.AM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Scope is VERTEX as we're doing this in TezChild
- constant name should contain TEZ_ prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack.
public void run() { | ||
try { | ||
new Path(path).getFileSystem(conf); | ||
LOG.info("Eagerly initiated FileSystem at path {}", path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice to measure the time spent with initialization and print it to logs too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack.
if (eagerInitFsPool == null && !eagerInitPaths.isEmpty()) { | ||
eagerInitFsPool = Executors.newCachedThreadPool(new ThreadFactoryBuilder() | ||
.setDaemon(true) | ||
.setPriority(Thread.MAX_PRIORITY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the code, I can see we tend not to set priority in thread pools...I guess we can remove this to simplify this code further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack.
@Override | ||
public void run() { | ||
try { | ||
new Path(path).getFileSystem(conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this filesystem be closed? does it hold any resources when it's open?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be gc'ed eventually? I can see any instances in the code were we are not explicitly closing the filesystem object. For example : https://github.com/apache/tez/blob/master/tez-plugins/tez-protobuf-history-plugin/src/main/java/org/apache/tez/dag/history/logging/proto/DatePartitionedLogger.java#L136
💔 -1 overall
This message was automatically generated. |
@@ -503,6 +506,33 @@ public static TezChild newTezChild(Configuration conf, String host, int port, St | |||
hadoopShim); | |||
} | |||
|
|||
private static void eagerInitFileSystemPaths(Configuration conf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be good to measure the time spent on FS init (even for cloudstores) and share the details, before trying out this patch.
Reason: This get inited in TezChild, but for running container, it closes the FileSystem explicitly via "FileSystem.closeAllForUGI(childUGI);". Refer "public ContainerExecutionResult run()" method.
Even if this gets early inited, it will not have major impact in container reuse scenario. It will be good to measure and find out the timing spent in FS init.
.build()); | ||
} | ||
for (String path : eagerInitPaths) { | ||
eagerInitFsPool.execute(new Runnable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before rushing to create lots of fs instances in parallel, look at HADOOP-17313 and why we actually added semaphores to stop apps like tez creating too many at the same time. this code may cause overload problems, or the fs semaphore will hold you back for safety.
best to look at why its taking so long; if s3a bucket existence checks aren't involved, then it'll be whatever auth mechanism is plugged in. same for abfs
Initing/Loading FileSystem such as S3 can take ~10s - ~20s when called for the first time and the time taken for subsequent calls are negligable. If we can load the FileSystem much before it is used can help us to save some time. It can be especially useful in case of pre-warm Tez containers where the Tez task containers comes up when the Application Master (AM) is launched and not on-demand which is the default behavior. It can be also useful in cases where the Mapper tasks spends considerable time consuming the upstream shuffle data and then heads to process some FileSystem operations, in all such cases we have few FileSystem load up time.