Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/nav 91 create raptor cache to prevent building raptor everytime for same date queries #52

Conversation

munterfi
Copy link
Member

@munterfi munterfi commented Jun 5, 2024

Cache raptor instances per day using LRU strategy.

@clukas1 considering your description in the Jira ticket:

  • Store built raptors so that they can be re-used again if a second query for the same date is requested.
  • Ideally add logic, that raptor only get’s rebuilt if the service calendar type is different. I.e. if Wednesday and Thursday schedule is identical, use the same raptor instance for both.

The second point won't work, since there are special calendar date in the GTFS, e.g. Christmas 1. May, ... So we have to cache raptor instances per date not weekday.

But: I think I got a solution to mask trips inside Raptor. This will also eradicate the need for the stop validations (except departure time), since we have all stops from the GTFS in the raptor, but some stops will just have no active trips.

I can prepare everything around it, but would be nice to build this together into the mega loop of the Raptor, maybe Friday evening or at the weekend?

Unfortunately we still have some strange errors when stops are not present in the Raptor:

Screenshot 2024-06-05 at 17 31 07

@munterfi munterfi requested a review from clukas1 June 5, 2024 15:54
@munterfi munterfi self-assigned this Jun 5, 2024
Copy link
Member

@clukas1 clukas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, suggestions to improve tests.

@clukas1
Copy link
Member

clukas1 commented Jun 5, 2024

Cache raptor instances per day using LRU strategy.

@clukas1 considering your description in the Jira ticket:

  • Store built raptors so that they can be re-used again if a second query for the same date is requested.
  • Ideally add logic, that raptor only get’s rebuilt if the service calendar type is different. I.e. if Wednesday and Thursday schedule is identical, use the same raptor instance for both.

The second point won't work, since there are special calendar date in the GTFS, e.g. Christmas 1. May, ... So we have to cache raptor instances per date not weekday.

But: I think I got a solution to mask trips inside Raptor. This will also eradicate the need for the stop validations (except departure time), since we have all stops from the GTFS in the raptor, but some stops will just have no active trips.

I can prepare everything around it, but would be nice to build this together into the mega loop of the Raptor, maybe Friday evening or at the weekend?

Unfortunately we still have some strange errors when stops are not present in the Raptor:

Screenshot 2024-06-05 at 17 31 07

Regarding point two in the Jira Issue, I'm aware not every Thursday etc. is the same. But I think if we combine all active service ids, we can generate a meaningful cache key. Because I expect most of the regular weekdays etc will have the same service ids. Might require adding a function to create this key.

Regarding mask, sure we can huddle up and integrate it into the raptor algorithm. I'm busy on Friday, but should have some time on Saturday.

munterfi added 4 commits June 6, 2024 10:28
- The only compatible version of log4j with our project, seems to be 3.0.0-beta1.
- Versions below 3.0.0 and also 3.0.0-beta2 are ignoring the log4j2-test.properties file, which sets the default log level for tests to INFO.
…date

- Since many days have the same active trip set, this prevents from recomputing the raptor for each date.
- This prevents from recomputing the active trips for a day, that has recently been queried.
- Ensure values are not recomputed when cached.
Copy link
Member

@clukas1 clukas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again some small comments

pom.xml Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird that only beta works, but good if that works

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have somehow a conflict between the logging of spring and the log4j which we pull directly in the pom.xml.
Logging works for testing and production, but in production the formatting is not consistent anymore, so it uses a mix of spring and log4j2.properties:

image

I gues we should open a new ticket in JIRA to fix this. At the moment at least all logs we want are shown, although not in the format we want. For now I can live with that...

@munterfi
Copy link
Member Author

munterfi commented Jun 6, 2024

Regarding point two in the Jira Issue, I'm aware not every Thursday etc. is the same. But I think if we combine all active service ids, we can generate a meaningful cache key. Because I expect most of the regular weekdays etc will have the same service ids. Might require adding a function to create this key.

Yes you are right, in theorie this should work, and it is now implemented in abacfbe. On the integration test data sample this works well, but on the GTFS of Switzerland, almost every day has its own active trip set. So even if we query normal weekdays, most of the time a new raptor will be created.

Here also an example how the equals on Sets works in Java. I was not sure, if the set instance itself matters or not: It does not, just its content is considered in equals:

package ch.naviqore.example;

import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.RequiredArgsConstructor;

import java.util.HashSet;
import java.util.Set;

public class Test {

    @RequiredArgsConstructor
    @Getter
    @EqualsAndHashCode
    static class Instance {
        private final String id;
    }

    public static void main(String[] args) {
        Instance instanceA = new Instance("a");
        Instance instanceA2 = new Instance("a");
        Instance instanceB = new Instance("b");
        Instance instanceC = new Instance("c");

        Set<Instance> set1 = new HashSet<>();
        Set<Instance> set2 = new HashSet<>();

        set1.add(instanceA);
        set1.add(instanceA2);
        set1.add(instanceB);
        set1.add(instanceC);

        set2.add(instanceA);
        set2.add(instanceB);
        set2.add(instanceC);

        System.out.println(set1.equals(set2));
        // true

        set2.add(instanceA);
        System.out.println(set1.equals(set2));
        // true

        set1.remove(instanceA);
        System.out.println(set1.equals(set2));
        // false
    }

}

@munterfi munterfi requested a review from clukas1 June 6, 2024 16:54
@clukas1
Copy link
Member

clukas1 commented Jun 6, 2024

Review already done :) See my comments before you requested the review. Only thing I would consider to reduce the set size, is using sets of calendars instead of trips to identify similar days.

@clukas1
Copy link
Member

clukas1 commented Jun 6, 2024

Review already done :) See my comments before you requested the review. Only thing I would consider to reduce the set size, is using sets of calendars instead of trips to identify similar days.

But sadly, you're right for a large dataset (like switzerland) every day will be a special case.

Copy link
Member

@clukas1 clukas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, have to leave another message

munterfi added 2 commits June 7, 2024 09:47
… instead of active trips

- Needs less memory in the cache and is more efficiently computed.
@munterfi munterfi requested a review from clukas1 June 7, 2024 08:14
@clukas1 clukas1 merged commit 168591c into main Jun 7, 2024
2 checks passed
@clukas1 clukas1 deleted the feature/NAV-91-create-raptor-cache-to-prevent-building-raptor-everytime-for-same-date-queries branch June 7, 2024 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants