Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Raptor transfer cache in parallel #6326

Open
wants to merge 7 commits into
base: dev-2.x
Choose a base branch
from

Conversation

miklcct
Copy link
Contributor

@miklcct miklcct commented Dec 9, 2024

Summary

This makes the Raptor cache generating process run in parallel.

Our GB-wide deployment pre-caches 4 configurations on startup. Before applying this fix, it takes 16 minutes to cache 4 configurations for the whole GB on a 16-core machine:

Dec 09 17:54:06 lemon java[1497422]: 17:54:06.305 INFO [main]  (ConstructApplication.java:242) Creating initial raptor transfer cache progress tracking started.
Dec 09 17:58:31 lemon java[1497422]: 17:58:31.315 INFO [main]  (RaptorRequestTransferCache.java:44) Initializing cache with request: StreetRelevantOptions{transferMode: WALK, walk: WalkPreferences{reluctance: 1.68, boardCost: $300}, street: StreetPreferences{drivingDirection: LEFT, accessEgress: AccessEgressPreferences{maxDuration: DurationForStreetMode{default:2h}}}}
Dec 09 17:58:31 lemon java[1497422]: 17:58:31.323 INFO [main]  (ConstructApplication.java:248) Creating initial raptor transfer cache progress: 1 of 4 (25%)
Dec 09 18:02:42 lemon java[1497422]: 18:02:42.391 INFO [main]  (RaptorRequestTransferCache.java:44) Initializing cache with request: StreetRelevantOptions{transferMode: WALK, wheelchair, walk: WalkPreferences{reluctance: 1.68, boardCost: $300}, street: StreetPreferences{drivingDirection: LEFT, accessEgress: AccessEgressPreferences{maxDuration: DurationForStreetMode{default:2h}}}, wheelchairPreferences: WheelchairPreferences{trip: AccessibilityPreferences{}, stop: AccessibilityPreferences{}, slopeExceededReluctance: 50.0, stairsReluctance: 25.0}}
Dec 09 18:02:42 lemon java[1497422]: 18:02:42.394 INFO [main]  (ConstructApplication.java:248) Creating initial raptor transfer cache progress: 2 of 4 (50%)
Dec 09 18:06:41 lemon java[1497422]: 18:06:41.853 INFO [main]  (RaptorRequestTransferCache.java:44) Initializing cache with request: StreetRelevantOptions{transferMode: WALK, walk: WalkPreferences{reluctance: 1.0, boardCost: $0}, street: StreetPreferences{drivingDirection: LEFT, accessEgress: AccessEgressPreferences{maxDuration: DurationForStreetMode{default:2h}}}}
Dec 09 18:06:41 lemon java[1497422]: 18:06:41.855 INFO [main]  (ConstructApplication.java:248) Creating initial raptor transfer cache progress: 3 of 4 (75%)
Dec 09 18:10:50 lemon java[1497422]: 18:10:50.696 INFO [main]  (RaptorRequestTransferCache.java:44) Initializing cache with request: StreetRelevantOptions{transferMode: WALK, wheelchair, walk: WalkPreferences{reluctance: 1.0, boardCost: $0}, street: StreetPreferences{drivingDirection: LEFT, accessEgress: AccessEgressPreferences{maxDuration: DurationForStreetMode{default:2h}}}, wheelchairPreferences: WheelchairPreferences{trip: AccessibilityPreferences{}, stop: AccessibilityPreferences{}, slopeExceededReluctance: 50.0, stairsReluctance: 25.0}}
Dec 09 18:10:50 lemon java[1497422]: 18:10:50.698 INFO [main]  (ConstructApplication.java:248) Creating initial raptor transfer cache progress: 4 of 4 (100%)
Dec 09 18:10:50 lemon java[1497422]: 18:10:50.698 INFO [main]  (ConstructApplication.java:251) Creating initial raptor transfer cache progress tracking complete. 4 done in 16m44s (0 per second).

After applying this patch, it only takes 5 minutes:

Dec 09 18:46:09 lemon java[1500378]: 18:46:09.979 INFO [main]  (ConstructApplication.java:242) Creating initial raptor transfer cache progress tracking started.
Dec 09 18:47:31 lemon java[1500378]: 18:47:31.036 INFO [main]  (RaptorRequestTransferCache.java:44) Initializing cache with request: StreetRelevantOptions{transferMode: WALK, walk: WalkPreferences{reluctance: 1.68, boardCost: $300}, street: StreetPreferences{drivingDirection: LEFT, accessEgress: AccessEgressPreferences{maxDuration: DurationForStreetMode{default:2h}}}}
Dec 09 18:47:31 lemon java[1500378]: 18:47:31.044 INFO [main]  (ConstructApplication.java:248) Creating initial raptor transfer cache progress: 1 of 4 (25%)
Dec 09 18:48:46 lemon java[1500378]: 18:48:46.023 INFO [main]  (RaptorRequestTransferCache.java:44) Initializing cache with request: StreetRelevantOptions{transferMode: WALK, wheelchair, walk: WalkPreferences{reluctance: 1.68, boardCost: $300}, street: StreetPreferences{drivingDirection: LEFT, accessEgress: AccessEgressPreferences{maxDuration: DurationForStreetMode{default:2h}}}, wheelchairPreferences: WheelchairPreferences{trip: AccessibilityPreferences{}, stop: AccessibilityPreferences{}, slopeExceededReluctance: 50.0, stairsReluctance: 25.0}}
Dec 09 18:48:46 lemon java[1500378]: 18:48:46.026 INFO [main]  (ConstructApplication.java:248) Creating initial raptor transfer cache progress: 2 of 4 (50%)
Dec 09 18:50:02 lemon java[1500378]: 18:50:02.286 INFO [main]  (RaptorRequestTransferCache.java:44) Initializing cache with request: StreetRelevantOptions{transferMode: WALK, walk: WalkPreferences{reluctance: 1.0, boardCost: $0}, street: StreetPreferences{drivingDirection: LEFT, accessEgress: AccessEgressPreferences{maxDuration: DurationForStreetMode{default:2h}}}}
Dec 09 18:50:02 lemon java[1500378]: 18:50:02.287 INFO [main]  (ConstructApplication.java:248) Creating initial raptor transfer cache progress: 3 of 4 (75%)
Dec 09 18:51:18 lemon java[1500378]: 18:51:18.349 INFO [main]  (RaptorRequestTransferCache.java:44) Initializing cache with request: StreetRelevantOptions{transferMode: WALK, wheelchair, walk: WalkPreferences{reluctance: 1.0, boardCost: $0}, street: StreetPreferences{drivingDirection: LEFT, accessEgress: AccessEgressPreferences{maxDuration: DurationForStreetMode{default:2h}}}, wheelchairPreferences: WheelchairPreferences{trip: AccessibilityPreferences{}, stop: AccessibilityPreferences{}, slopeExceededReluctance: 50.0, stairsReluctance: 25.0}}
Dec 09 18:51:18 lemon java[1500378]: 18:51:18.352 INFO [main]  (ConstructApplication.java:248) Creating initial raptor transfer cache progress: 4 of 4 (100%)
Dec 09 18:51:18 lemon java[1500378]: 18:51:18.353 INFO [main]  (ConstructApplication.java:251) Creating initial raptor transfer cache progress tracking complete. 4 done in 5m8s (0 per second).

Also, the journey planning response time for a new configuration has been reduced correspondingly from more than 4 minutes to around 1.5 minutes.

Issue

#6312

Unit tests

None. This is a performance improvement only with no externally visible change.

Documentation

N/A

Changelog

Bumping the serialization version id

Not needed

@miklcct miklcct requested a review from a team as a code owner December 9, 2024 19:18
Copy link

codecov bot commented Dec 9, 2024

Codecov Report

Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 69.85%. Comparing base (5f9b448) to head (2ae8816).
Report is 27 commits behind head on dev-2.x.

Files with missing lines Patch % Lines
...thm/raptoradapter/transit/RaptorTransferIndex.java 92.85% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             dev-2.x    #6326   +/-   ##
==========================================
  Coverage      69.85%   69.85%           
- Complexity     17921    17931   +10     
==========================================
  Files           2035     2035           
  Lines          76495    76520   +25     
  Branches        7824     7826    +2     
==========================================
+ Hits           53434    53454   +20     
- Misses         20324    20326    +2     
- Partials        2737     2740    +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@t2gran
Copy link
Member

t2gran commented Dec 11, 2024

I think this is going to reduce the throughput slightly. The issue is that one request that require new transfers to be generated will steel processor time from other requests. I am not sure how the affect memory fetches, but It might have a negative effect on running trip searches - at least if a planning request is swapped out in favour of calculating transfers.

The threads also loose log-trace-parameters-propagation and graceful timeout handling.

The parrallel procecing at least need to be feature enabled using OTPFeature.ParallelRouting.isOn().

@optionsome
Copy link
Member

There is also a possibility that these are only computed in parallel before start up but not after server is running. I don't know whether this code is used for both cases or not.

@miklcct
Copy link
Contributor Author

miklcct commented Dec 12, 2024

There is also a possibility that these are only computed in parallel before start up but not after server is running. I don't know whether this code is used for both cases or not.

I specifically need it to compute in parallel in order to make our response time down from 4 minute to 1 minute.

@optionsome optionsome self-requested a review December 12, 2024 15:00
@optionsome
Copy link
Member

Only check the feature flag during run time, not during start-up.

@optionsome optionsome added Optimization The feature is to improve performance. Digitransit Test Feature is under testing in Digitransit environment(s) labels Dec 12, 2024
@optionsome optionsome added this to the 2.7 (next release) milestone Dec 12, 2024
@optionsome
Copy link
Member

optionsome commented Dec 13, 2024

I tested this in our dev environment (without parallel routing, so just for start-up). I did not witness the transfer cache processing being faster, it was maybe even slightly slowed on the machines we use. I benchmarked on D4ads v5 machines (4 vcpu) in Azure.

@miklcct
Copy link
Contributor Author

miklcct commented Dec 13, 2024

How many transfers do you have, and what's the load of your machine?

We are running on our own hardware (we have just placed a new physical server into the data centre a few weeks ago).

@optionsome
Copy link
Member

optionsome commented Dec 13, 2024

How many transfers do you have, and what's the load of your machine?

I tried it with few different instances with slightly different configurations but 3-7 cache requests but I'm not sure about the total number of transfers but I tried with a countrywide deployment, for example. There shouldn't be much other load on the machines but I tested these in kubernetes environments so there are some kube services running on the machines and some of the deployments had some real-time updaters configured also.

I wouldn't mind if someone else could also do some benchmarking if these changes have a positive effect or not.

@optionsome optionsome removed the Digitransit Test Feature is under testing in Digitransit environment(s) label Dec 16, 2024
# Conflicts:
#	application/src/main/java/org/opentripplanner/routing/algorithm/raptoradapter/transit/RaptorTransferIndex.java
@optionsome optionsome requested review from leonardehrenfried and removed request for t2gran December 19, 2024 14:42
@leonardehrenfried
Copy link
Member

I deployed this to one of my environments and I did see a slight speed up. Definitely not as much as @miklcct is seeing but still a bit faster.

@t2gran
Copy link
Member

t2gran commented Jan 3, 2025

OTP is designed to run one request in one thread - this optimize throughput, but have the disadvantage that long lasting request use more time. When testing this it is important to have enough requests to "activate" all threads. This can be done using a test client which run N threads and immediately send a new request when getting a result back. Then run the same test with N = 1, .., num-of-cores. Another factor is the number of requests that trigger a new transfer generation for the cashe. It is hard to reason about the performance numbers posted above without knowing how the tests are performed. One of the key things we would like to know is the effect a "long lasting request" has on other request. Not, just avg. response times.

I am ok with this after the change where we can turn off this feature in REQUEST_SCOPE.

@optionsome
Copy link
Member

I deployed this to one of my environments and I did see a slight speed up. Definitely not as much as @miklcct is seeing but still a bit faster.

Ok, I will retest this in our environment at some point to re-validate how it affects the performance in our setup. If we can still observe slightly negative effect, perhaps the start up parallelism should be put behind another feature flag.

@miklcct
Copy link
Contributor Author

miklcct commented Jan 3, 2025

I deployed this to one of my environments and I did see a slight speed up. Definitely not as much as @miklcct is seeing but still a bit faster.

How many transfers do you have? I have now got 5.7M WALK transfers and 16.1M BIKE transfers in my whole graph on the test server configured with 20 minutes transfers for the whole GB, which is 10.3 GB in size.

@leonardehrenfried
Copy link
Member

I don't know the exact number but it's much much smaller. About 50k stops.

@t2gran
Copy link
Member

t2gran commented Jan 3, 2025

Related to this, at Entur we will roll-back PR #6238 - it degrade the performance al lot.

The real solution to this problem is to optimize the transfer calculations - it should not touch the AStar paths at all. As part of this I would also like to see what would happen if we calculated the transfers in every request, using a lazy approach. This would possibly have a good effect on large graphs.

@optionsome
Copy link
Member

optionsome commented Jan 3, 2025

Related to this, at Entur we will roll-back PR #6238 - it degrade the performance al lot.

Why aren't the performance problems visible in the speed tests?

@miklcct
Copy link
Contributor Author

miklcct commented Jan 3, 2025

As part of this I would also like to see what would happen if we calculated the transfers in every request, using a lazy approach. This would possibly have a good effect on large graphs.

I am looking forward to this and this may probably be our long-term solution. If it means a local journey with a new configuration can be done in 15 seconds instead of 2 minutes (even if a long-distance journey still requires 2 minutes) we can more easily explain this to our customers.

@miklcct
Copy link
Contributor Author

miklcct commented Jan 4, 2025

Related to this, at Entur we will roll-back PR #6238 - it degrade the performance al lot.

Why aren't the performance problems visible in the speed tests?

Do the speed test have requests which generate the run time raptor cache on a large transfer graph? It should show immediately.

This PR combined with the roll back of #6238 has finally make reasonable bike transfers on the whole GB realistic, even for a long ride between London Terminals.

@habrahamsson-skanetrafiken
Copy link
Contributor

We decided to

  1. Revert Allow multiple states during transfer edge traversals #6238
  2. Merge Leonards pr adding speed tests
  3. Look at the performance of this PR

@optionsome
Copy link
Member

I retested this now this does indeed seem to cut off 50%+ from the computation time. My previous attempt to test this failed due to #6238 affecting results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Optimization The feature is to improve performance.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants