Rate limits dbnav #10

Open
McToel opened this issue Jan 11, 2025 · 12 comments

@McToel
Contributor

McToel commented Jan 11, 2025

I'm having some issues with rate limits on trips. So far, I cannot tell how often these rate limits occur. According to my logs, I might have made up to 150 req/minute, 3,000 req/hour and 30,000 req/day.

Since then, I have added more caching, and I am now waiting for further rate limits to occur so I can inspect them more closely.
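As an illustration of the kind of caching meant here, a minimal sketch that assumes trip responses can be reused for a short time. The `TtlCache` class, the `cached` wrapper, and whatever TTL you choose are hypothetical, not part of db-rest or db-vendo-client. Repeated lookups within the TTL are answered from memory instead of hitting the API.

```typescript
// Hypothetical TTL cache: an entry is served until `ttlMs` milliseconds after it was stored.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired: drop it so the next lookup goes to the API again.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Wraps an async fetcher so repeated keys within the TTL are served from the cache.
function cached<V>(cache: TtlCache<V>, fetcher: (key: string) => Promise<V>): (key: string) => Promise<V> {
  return async (key) => {
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const value = await fetcher(key);
    cache.set(key, value);
    return value;
  };
}
```

This sketch does not deduplicate concurrent requests for the same key, so two simultaneous cache misses still cause two API calls.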

I opened this issue so that we can track how high the rate limits are and maybe stretch them a bit. For example, for int.bahn.de I saw that marudor does some user-agent randomization that might be related to rate limits.

@traines-source
Member

As you may know, I'm not a huge fan of bombarding APIs, so I think that if there is a rate limit, we should respect it. (The DB API allegedly handles about 20 million requests per day in total, so while your 30k are not a considerable portion of that, they are also not completely negligible.)
That said, there is the randomizeUserAgent flag in the profile config that you could try turning on. I don't really think it will help for dbnav, though, since DB Navigator always sends the same user agent as well. Maybe they rate-limit based on the X-Correlation-ID or some other header. That header, as can be seen in the dumps, changes with every request (and is probably used for some kind of tracking).

@McToel
Contributor Author

McToel commented Jan 11, 2025

I've now investigated it a little more and created a plot from db-rest's logs with 5-minute bins. It seems that sending 100 req/5 min is totally fine; maybe the limit is around 150 req/5 min.

[Plot: db-rest requests per 5-minute bin]

Looking at req/min suggests there might be a limit around 60 req/min.

[Plot: db-rest requests per minute]
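The binning behind plots like these can be sketched as follows, assuming the request timestamps (in milliseconds) have already been parsed from the db-rest logs. The log format and the function names here are assumptions.

```typescript
// Counts requests per fixed-size time bin (e.g. 5 minutes or 1 minute),
// given request timestamps in milliseconds since the epoch.
function binCounts(timestampsMs: number[], binMs: number): Map<number, number> {
  const counts = new Map<number, number>();
  for (const ts of timestampsMs) {
    // Start of the bin this timestamp falls into.
    const binStart = Math.floor(ts / binMs) * binMs;
    counts.set(binStart, (counts.get(binStart) ?? 0) + 1);
  }
  return counts;
}

// Largest per-bin count, i.e. the peak request rate at that resolution.
function peak(counts: Map<number, number>): number {
  let max = 0;
  for (const n of counts.values()) max = Math.max(max, n);
  return max;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
const ONE_MINUTE_MS = 60 * 1000;
```

Computing the peak at both resolutions shows why the two plots can suggest different limits: a burst that stands out per minute gets averaged out within a 5-minute bin.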

For my use case, this means my website cannot handle more than roughly 6 requests per minute.

@derhuerst
Member

> I have made this issue so that we are able to track how high the rate limits are and maybe stretch them a bit. For example, for int.bahn.de I saw that marudor does some user agent randomization that might be related to rate limits.

I'm probably confusing int.bahn.de and app.vendo.noncd.db.de, but are you referring to this?

@traines-source
Member

traines-source commented Jan 11, 2025

Ah, interesting. Maybe we should randomly generate UUIDs for X-Correlation-ID, then. (It would be really bad, though, if they based it only on that, because in public-transport-enabler I've also just set it to "null".) @McToel, have you already found out whether you can still access it from a different IP while you're blocked?
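A minimal sketch of that idea, assuming a Node.js client that sets its own request headers. The `requestHeaders` helper is hypothetical, not part of db-vendo-client: every request gets a fresh random UUID instead of a constant value such as "null".

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: returns the given headers plus a fresh random
// X-Correlation-ID, so no two requests share the same value.
function requestHeaders(base: Record<string, string> = {}): Record<string, string> {
  return { ...base, "X-Correlation-ID": randomUUID() };
}
```

A caller would then pass something like `requestHeaders({ Accept: "application/json" })` as the headers of each outgoing request.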

(And int.bahn.de is what bahn.de and the db profile use (for journeys), while app.vendo.noncd.db.de is what DB Navigator and the dbnav profile use.)

By the way, 60 requests/minute might also explain some errors I may have encountered with https://tespace.traines.eu ^^, because there I have the same problem: one user can potentially generate dozens of requests.

@McToel
Contributor Author

McToel commented Jan 11, 2025

I did not check different IPs. As the issue is only temporary, I cannot simply retry the same request from another IP. I now have more logs and looked more closely at how the rate limits occur. It seems that there is always a traffic spike on my side before the rate limit kicks in, which at least suggests that the rate limit is triggered by my own traffic and not by other people's.

However, the rate limit might still be related to the correlation ID, as I think not many others are using this API extensively without setting a correlation ID.

I would guess the rate limit would need to be higher than 60 req/min/IP, because I think a full ICE train with 600 DB-Nav users might exceed that.
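For scale, assuming all 600 passengers reach the API through the train Wi-Fi's single public IP and the limit were 60 req/min per IP, each passenger would get:

```latex
\frac{60\ \text{req/min}}{600\ \text{users}} = 0.1\ \text{req/min per user} = 1\ \text{request per user every 10 minutes}
```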

@traines-source if you want to share the logs of your db-rest, I could investigate whether our rate limits are in sync.

@derhuerst I did not confuse them; I thought he might do something similar for vendo, but I did not check, so thanks for the hint.

@traines-source
Member

> However, the rate limit might still be related to the correlation ID, as I think not many others are using this API extensively without setting a correlation ID.

Yes, that's why I was alarmed (in case it's based only on the correlation ID) and thought that maybe we should change the correlation ID in public-transport-enabler before @schildbach pushes Oeffi to a couple of million users :D

> I would guess the rate limit would need to be higher than 60 req/min/IP, because I think a full ICE train with 600 DB-Nav users might exceed that.

Yes, IP-based limiting would be difficult, but I would hope it's a combination of IP and correlation ID or something else.
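Purely to illustrate what a limit keyed on "a combination of IP and correlation ID" could mean, here is a fixed-window counter. Nothing is known here about how DB's servers actually key their limits, and `FixedWindowLimiter` is a hypothetical name.

```typescript
// Purely hypothetical server-side limiter: at most `limit` requests per window,
// counted separately for each (IP, correlation ID) pair. A window starts with
// the first request seen for a key and lasts `windowMs` milliseconds.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should be rejected.
  allow(ip: string, correlationId: string, nowMs: number): boolean {
    const key = `${ip}|${correlationId}`;
    const w = this.windows.get(key);
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      // No window yet, or the old one has expired: start a new one.
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}
```

With a key like this, passengers sharing a train's IP would each get their own budget as long as their correlation IDs differ, while a server that always sends the same correlation ID would share a single budget.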

> if you want to share the logs of your db-rest

This occurred weeks ago, so it's probably not worth the effort.

@traines-source
Copy link
Member

I have now published a release that, among other things, uses random correlation IDs. If marudor does it, it must be good :)

@schildbach
Copy link

Does anyone have an idea why the "correlation ID" seems to consist of two concatenated UUIDs?

@traines-source
Copy link
Member

Not sure; I guess they are different tracking IDs. As can be seen from the dumps, the second one changes less frequently, while the first one might be something like a session ID. But I guess we can't do more than generate them randomly anyway, just as marudor does and as @derhuerst found above.
And we currently don't know whether that changes anything about the rate limits; it's just a hunch (unless @McToel has deployed the new db-vendo-client version and not faced any rate limits since :)
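A sketch of generating such a two-part ID randomly, assuming (from the observation that the second UUID changes less frequently) that the first part is regenerated per request and the second stays fixed for the lifetime of the generator. The `_` separator and the class name are assumptions.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical two-part correlation ID: the first UUID is fresh for every
// request, the second stays fixed for the lifetime of this generator.
// The "_" separator is an assumption.
class CorrelationIdGenerator {
  private readonly longLivedPart: string;

  constructor() {
    this.longLivedPart = randomUUID();
  }

  next(): string {
    return `${randomUUID()}_${this.longLivedPart}`;
  }
}
```

Creating one generator per client instance (or per simulated app session) controls how often the long-lived part changes.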

@McToel
Copy link
Contributor Author

McToel commented Jan 12, 2025

Unfortunately, I cannot tell whether it made a big difference, because I had less traffic today. I ran into fewer rate limits, but that might just be due to the lower traffic.

Request and error counts per minute with the new vendo client:
[Plot: requests and errors per minute]

There are definitely still rate limits, even with the change.

@derhuerst
Copy link
Member

We could set up a VPS with a static IP and test this so that we have more reliable data.
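On such a VPS, a step-up probe could look roughly like this. It assumes the API signals rate limiting with HTTP 429, which this thread does not confirm, and it takes a hypothetical `send` function that performs one request and resolves to its status code.

```typescript
// Hypothetical sender: performs one API request and resolves to its HTTP status.
type SendFn = () => Promise<number>;

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Steps through increasing request rates. Each step sends `rate` requests spread
// evenly over `windowMs`; the first 429 ends the probe. Returns the highest rate
// whose step completed without a 429, or null if the first step was limited.
async function probeRateLimit(
  send: SendFn,
  ratesPerWindow: number[],
  windowMs: number = 60000,
  cooldownMs: number = 5 * 60000,
): Promise<number | null> {
  let highestOk: number | null = null;
  for (const rate of ratesPerWindow) {
    const gapMs = windowMs / rate;
    for (let i = 0; i < rate; i++) {
      if ((await send()) === 429) return highestOk;
      await sleep(gapMs);
    }
    highestOk = rate;
    // Pause before the next, faster step in case the limit has a memory.
    await sleep(cooldownMs);
  }
  return highestOk;
}
```

A real run might use steps such as `[30, 60, 90, 120]` requests per minute. Because the requests are spread evenly, this probe does not test short bursts, which may matter given the traffic spikes seen before the limits kicked in.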

@envake
Copy link
Member

envake commented Jan 24, 2025

Hey, I just tested it on a cloud server and got rate-limited whenever I set the throttle to more than 2 requests per 1000 ms. 😦
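Reading "2 in 1000" as 2 requests per 1000 ms, a generic sliding-window throttle with those parameters might look like this. It is a sketch for illustration, not db-vendo-client's own throttling code; `SlidingWindowThrottle` and `throttled` are hypothetical names.

```typescript
// Generic sliding-window throttle: at most `limit` calls may start within any
// `intervalMs` window (e.g. limit 2, intervalMs 1000).
class SlidingWindowThrottle {
  private readonly starts: number[] = [];
  private readonly limit: number;
  private readonly intervalMs: number;

  constructor(limit: number, intervalMs: number) {
    this.limit = limit;
    this.intervalMs = intervalMs;
  }

  // Returns 0 and records the start if a call may begin at `nowMs`;
  // otherwise returns how many milliseconds to wait before trying again.
  tryAcquire(nowMs: number): number {
    // Forget starts that have slid out of the window.
    while (this.starts.length > 0 && nowMs - this.starts[0] >= this.intervalMs) {
      this.starts.shift();
    }
    if (this.starts.length < this.limit) {
      this.starts.push(nowMs);
      return 0;
    }
    return this.starts[0] + this.intervalMs - nowMs;
  }
}

// Runs `fn` once the throttle allows it, sleeping as long as necessary.
async function throttled<T>(throttle: SlidingWindowThrottle, fn: () => Promise<T>): Promise<T> {
  for (;;) {
    const waitMs = throttle.tryAcquire(Date.now());
    if (waitMs === 0) return fn();
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  }
}
```

All API calls would need to share one throttle instance, for example `throttled(sharedThrottle, () => fetchTrip(id))`, where `fetchTrip` is a placeholder for the actual request.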

Labels
None yet
Development

No branches or pull requests

5 participants