[INLONG-7056][Sort] Adjust sort resources according to data scale #10915
Fixes #7056
Motivation
Currently, the total amount of resources for a Flink Sort job comes from the configuration file flink-sort-plugin.properties, meaning that all submitted Sort jobs use the same amount of resources. When the data scale is large, the resources may be insufficient; when the data scale is small, resources may be wasted. Dynamically adjusting resources according to the data volume is therefore a much-needed feature.
Modifications
Before a job is submitted to Flink via org.apache.inlong.manager.plugin.flink.FlinkService#submitJobBySavepoint, org.apache.inlong.manager.plugin.flink.FlinkParallelismOptimizer first queries the average data volume over the past hour and adjusts the job's parallelism based on that volume. This feature can be switched on or off, and the maximum number of messages per core can be configured, both in flink-sort-plugin.properties.
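As a rough illustration of the idea, the sketch below derives a parallelism value from the average hourly message volume. Only FlinkParallelismOptimizer and flink-sort-plugin.properties come from this PR; the formula, the on/off flag, the per-core cap, and all other names here are assumptions for illustration, and the actual implementation may differ.

```java
// Minimal sketch of dynamic parallelism, assuming a simple
// "one core per N messages per hour" rule. All names except
// FlinkParallelismOptimizer's general role are hypothetical.
public class ParallelismSketch {

    // Hypothetical values that would be read from flink-sort-plugin.properties.
    private static final boolean DYNAMIC_PARALLELISM_ENABLED = true; // on/off switch
    private static final long MAX_MESSAGES_PER_CORE_PER_HOUR = 2000L; // per-core capacity

    private static final int DEFAULT_PARALLELISM = 1;

    /**
     * Derive parallelism from the average message count over the past hour:
     * one core per MAX_MESSAGES_PER_CORE_PER_HOUR messages, never below
     * the default of 1.
     */
    static int calculateParallelism(long avgMessagesLastHour) {
        if (!DYNAMIC_PARALLELISM_ENABLED || avgMessagesLastHour <= 0) {
            return DEFAULT_PARALLELISM;
        }
        int parallelism = (int) Math.ceil(
                (double) avgMessagesLastHour / MAX_MESSAGES_PER_CORE_PER_HOUR);
        return Math.max(parallelism, DEFAULT_PARALLELISM);
    }

    public static void main(String[] args) {
        // e.g. 9500 messages/hour with 2000 per core -> parallelism 5
        System.out.println(calculateParallelism(9500L));
    }
}
```

With this kind of rule, a low-traffic stream falls back to the default parallelism of 1, while a high-traffic stream scales roughly linearly with its message rate.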
Verifying this change
- [ ] This change is a trivial rework/code cleanup without any test coverage.
- [ ] This change is already covered by existing tests, such as:

  When creating a stream in Data Ingestion, make the source data increase steadily until it reaches a significant volume (roughly more than 2,000 messages per hour), then resubmit the job. The parallelism of the Flink job corresponding to the stream should then be larger than the default value of 1, and the change is also reflected in the manager logs.

- [ ] This change added tests and can be verified as follows: