6 scheduling observer extraction #38
Merged
6 commits
fdc0a6d vlerkin: extracted resource watching logic into a separate class; implemented …
768757b vlerkin: finish the observer logic extraction; merge changes from main; add re…
8fec9a7 vlerkin: add number of retry attempts and exponential backoff time to the reco…
393a0d8 vlerkin: add logic to reset number of reconnect attempts and backoff time when…
515642f vlerkin: added a CONFIG.md file with detailed explanations about parameters us…
8fdcb60 vlerkin: move section about config file from README.md to CONFIG.md; add a lin…
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
## About | ||
This file provides a detailed description of the parameters listed in the config file, explaining why they are used | ||
and when you are expected to provide or change them. | ||
|
||
## Configuration file | ||
|
||
* `http_port` - defaults to `6800` ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#http-port)) | ||
* `bind_address` - defaults to `127.0.0.1` ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#bind-address)) | ||
* `max_proc` - _(implementation pending)_, if unset or `0` it will use the number of nodes in the cluster, defaults to `0` ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#max-proc)) | ||
* `repository` - Python class for accessing the image repository, defaults to `scrapyd_k8s.repository.Remote` | ||
* `launcher` - Python class for managing jobs on the cluster, defaults to `scrapyd_k8s.launcher.K8s` | ||
* `username` - Set this and `password` to enable basic authentication ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#username)) | ||
* `password` - Set this and `username` to enable basic authentication ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#password)) | ||
|
||
The Docker and Kubernetes launchers have their own additional options. | ||
|
||
## [scrapyd] section, reconnection_attempts, backoff_time, backoff_coefficient | ||
|
||
### Context | ||
The Kubernetes event watcher is used by the joblogs feature and also for limiting the | ||
number of jobs running in parallel on the cluster. Neither feature is enabled by default; you can activate either one if you | ||
choose to use it. | ||
|
||
The event watcher establishes a connection to the Kubernetes API and receives a stream of events from it. However, this | ||
long-lived connection is inherently unstable; it can be interrupted by network issues, proxies configured to terminate | ||
long-lived connections, and other factors. For this reason, a mechanism was implemented to re-establish the long-lived | ||
connection to the Kubernetes API. To achieve this, three parameters were introduced: `reconnection_attempts`, | ||
`backoff_time` and `backoff_coefficient`. | ||
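
The reconnection mechanism described above can be illustrated without a cluster. The following is a minimal sketch, not the code from this PR: `watch_with_retries`, `flaky_stream` and the injected `sleep` argument are hypothetical names, and the built-in `ConnectionError` stands in for the network errors a real watch would raise. Unlike the real watcher, which restarts the watch when a stream ends, the sketch returns so that the example terminates.

```python
import time


def watch_with_retries(open_stream, handle, attempts=5, backoff=5,
                       coefficient=2, sleep=time.sleep):
    """Consume events from open_stream(), reconnecting with exponential backoff."""
    remaining, wait = attempts, backoff
    while remaining > 0:
        try:
            for index, event in enumerate(open_stream()):
                if index == 0:
                    # A working connection resets the retry budget and the wait.
                    remaining, wait = attempts, backoff
                handle(event)
            return  # stream ended normally; the sketch stops here
        except ConnectionError:
            remaining -= 1
            sleep(wait)
            wait *= coefficient


# Hypothetical stream that fails twice before delivering two events.
calls = {"n": 0}


def flaky_stream():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("connection reset")
    yield "ADDED"
    yield "MODIFIED"


waits, received = [], []
# Record the waits instead of actually sleeping.
watch_with_retries(flaky_stream, received.append, sleep=waits.append)
print(waits)     # [5, 10]
print(received)  # ['ADDED', 'MODIFIED']
```

Passing `sleep` as an argument is only a convenience for the demonstration; it lets the waits be inspected without blocking.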
|
||
### What are these parameters about? | ||
- `reconnection_attempts` - defines how many consecutive attempts will be made to reconnect if the connection fails; the | ||
counter is reset as soon as a re-established connection delivers its first event; | ||
- `backoff_time` and `backoff_coefficient` - are used to gradually slow down each subsequent attempt to establish a | ||
connection with the Kubernetes API, preventing the API from becoming overloaded with requests. `backoff_time` is the | ||
initial wait in seconds; after every failed attempt it is multiplied by the coefficient | ||
(`backoff_time *= self.backoff_coefficient`), so the wait grows exponentially. | ||
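
To make the effect of these values concrete, the sketch below parses a `[scrapyd]` section and lists the waits between consecutive failed attempts. It is an illustration under assumptions: `backoff_schedule` and `SAMPLE_CONF` are hypothetical, the standard-library `configparser` is used instead of the project's own config class, and the fallbacks (5, 5, 2) and the `int` casts mirror the defaults visible in the `ResourceWatcher` code of this PR.

```python
import configparser

# Hypothetical config text; only the three key names come from this PR.
SAMPLE_CONF = """
[scrapyd]
reconnection_attempts = 4
backoff_time = 3
backoff_coefficient = 2
"""


def backoff_schedule(conf_text):
    """Return the waits (in seconds) that follow each consecutive failed attempt."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["scrapyd"] if parser.has_section("scrapyd") else {}
    attempts = int(section.get("reconnection_attempts", 5))
    wait = int(section.get("backoff_time", 5))
    coefficient = int(section.get("backoff_coefficient", 2))
    waits = []
    for _ in range(attempts):
        waits.append(wait)
        wait *= coefficient  # same update rule as backoff_time *= backoff_coefficient
    return waits


print(backoff_schedule(SAMPLE_CONF))    # [3, 6, 12, 24]
print(backoff_schedule("[scrapyd]\n"))  # [5, 10, 20, 40, 80]
```

With the fallback values, five consecutive failures add up to 155 seconds of waiting before the watcher gives up. Because the values are cast with `int`, a fractional setting such as `1.5` would not be accepted by this sketch.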
|
||
### When do I need to change it in the config file? | ||
Default values for these parameters are provided in the code (`reconnection_attempts = 5`, `backoff_time = 5`, | ||
`backoff_coefficient = 2`) and are tuned to an "average" cluster setting. If your network | ||
requirements or other conditions are unusual, you may need to adjust these values to better suit your specific setup. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1 @@ | ||
import logging | ||
from scrapyd_k8s.joblogs.log_handler_k8s import KubernetesJobLogHandler | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
def joblogs_init(config): | ||
    """ | ||
    Initializes job logs handling by starting the Kubernetes job log handler. | ||
|
||
    Parameters | ||
    ---------- | ||
    config : Config | ||
        Configuration object containing settings for job logs and storage. | ||
|
||
    Returns | ||
    ------- | ||
    None | ||
    """ | ||
    joblogs_config = config.joblogs() | ||
    if joblogs_config and joblogs_config.get('storage_provider') is not None: | ||
        log_handler = KubernetesJobLogHandler(config) | ||
        log_handler.start() | ||
        logger.info("Job logs handler started.") | ||
    else: | ||
        logger.warning("No storage provider configured; job logs will not be uploaded.") | ||
from scrapyd_k8s.joblogs.log_handler_k8s import KubernetesJobLogHandler |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,151 @@ | ||
import threading | ||
import logging | ||
import time | ||
from kubernetes import client, watch | ||
from typing import Callable, List | ||
import urllib3 | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
class ResourceWatcher: | ||
    """ | ||
    Watches Kubernetes pod events and notifies subscribers about relevant events. | ||
|
||
    Attributes | ||
    ---------- | ||
    namespace : str | ||
        Kubernetes namespace to watch pods in. | ||
    subscribers : List[Callable] | ||
        List of subscriber callback functions to notify on events. | ||
    """ | ||
|
||
    def __init__(self, namespace, config): | ||
        """ | ||
        Initializes the ResourceWatcher. | ||
|
||
        Parameters | ||
        ---------- | ||
        namespace : str | ||
            Kubernetes namespace to watch pods in. | ||
        """ | ||
        self.namespace = namespace | ||
        self.reconnection_attempts = int(config.scrapyd().get('reconnection_attempts', 5)) | ||
        self.backoff_time = int(config.scrapyd().get('backoff_time', 5)) | ||
        self.backoff_coefficient = int(config.scrapyd().get('backoff_coefficient', 2)) | ||
        self.subscribers: List[Callable] = [] | ||
        self._stop_event = threading.Event() | ||
        self.watcher_thread = threading.Thread(target=self.watch_pods, daemon=True) | ||
        self.watcher_thread.start() | ||
        logger.info(f"ResourceWatcher thread started for namespace '{self.namespace}'.") | ||
|
||
    def subscribe(self, callback: Callable): | ||
        """ | ||
        Adds a subscriber callback to be notified on events. | ||
|
||
        Parameters | ||
        ---------- | ||
        callback : Callable | ||
            A function to call when an event is received. | ||
        """ | ||
        if callback not in self.subscribers: | ||
            self.subscribers.append(callback) | ||
            logger.debug(f"Subscriber {callback.__name__} added.") | ||
|
||
    def unsubscribe(self, callback: Callable): | ||
        """ | ||
        Removes a subscriber callback. | ||
|
||
        Parameters | ||
        ---------- | ||
        callback : Callable | ||
            The subscriber function to remove. | ||
        """ | ||
        if callback in self.subscribers: | ||
            self.subscribers.remove(callback) | ||
            logger.debug(f"Subscriber {callback.__name__} removed.") | ||
|
||
    def notify_subscribers(self, event: dict): | ||
        """ | ||
        Notifies all subscribers about an event. | ||
|
||
        Parameters | ||
        ---------- | ||
        event : dict | ||
            The Kubernetes event data. | ||
        """ | ||
        for subscriber in self.subscribers: | ||
            try: | ||
                subscriber(event) | ||
            except Exception as e: | ||
                logger.exception(f"Error notifying subscriber {subscriber.__name__}: {e}") | ||
|
||
    def watch_pods(self): | ||
        """ | ||
        Watches Kubernetes pod events and notifies subscribers. | ||
        Runs in a separate thread. | ||
        """ | ||
        v1 = client.CoreV1Api() | ||
        w = watch.Watch() | ||
        resource_version = None | ||
|
||
        logger.info(f"Started watching pods in namespace '{self.namespace}'.") | ||
        backoff_time = self.backoff_time | ||
        reconnection_attempts = self.reconnection_attempts | ||
        while not self._stop_event.is_set() and reconnection_attempts > 0: | ||
            try: | ||
                kwargs = { | ||
                    'namespace': self.namespace, | ||
                    'timeout_seconds': 0, | ||
                } | ||
                if resource_version: | ||
                    kwargs['resource_version'] = resource_version | ||
                first_event = True | ||
                for event in w.stream(v1.list_namespaced_pod, **kwargs): | ||
                    if first_event: | ||
                        # Reset reconnection attempts and backoff time upon successful reconnection | ||
                        reconnection_attempts = self.reconnection_attempts | ||
                        backoff_time = self.backoff_time | ||
                        first_event = False # Ensure this only happens once per connection | ||
                    pod_name = event['object'].metadata.name | ||
                    resource_version = event['object'].metadata.resource_version | ||
                    event_type = event['type'] | ||
                    logger.debug(f"Received event: {event_type} for pod: {pod_name}") | ||
                    self.notify_subscribers(event) | ||
            except (urllib3.exceptions.ProtocolError, | ||
                    urllib3.exceptions.ReadTimeoutError, | ||
                    urllib3.exceptions.ConnectionError) as e: | ||
                reconnection_attempts -= 1 | ||
                logger.exception(f"Encountered network error: {e}") | ||
                logger.info(f"Retrying to watch pods after {backoff_time} seconds...") | ||
                time.sleep(backoff_time) | ||
                backoff_time *= self.backoff_coefficient | ||
            except client.ApiException as e: | ||
                # Resource version is too old and cannot be accessed anymore | ||
                if e.status == 410: | ||
                    logger.error("Received 410 Gone error, resetting resource_version and restarting watch.") | ||
                    resource_version = None | ||
                    continue | ||
                else: | ||
                    reconnection_attempts -= 1 | ||
                    logger.exception(f"Encountered ApiException: {e}") | ||
                    logger.info(f"Retrying to watch pods after {backoff_time} seconds...") | ||
                    time.sleep(backoff_time) | ||
                    backoff_time *= self.backoff_coefficient | ||
            except StopIteration: | ||
                logger.info("Watch stream ended, restarting watch.") | ||
                continue | ||
            except Exception as e: | ||
                reconnection_attempts -= 1 | ||
                logger.exception(f"Watcher encountered exception: {e}") | ||
                logger.info(f"Retrying to watch pods after {backoff_time} seconds...") | ||
                time.sleep(backoff_time) | ||
                backoff_time *= self.backoff_coefficient | ||
|
||
|
||
    def stop(self): | ||
        """ | ||
        Stops the watcher thread gracefully. | ||
        """ | ||
        self._stop_event.set() | ||
        self.watcher_thread.join() | ||
        logger.info(f"ResourceWatcher thread stopped for namespace '{self.namespace}'.") |
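
The subscriber handling of `ResourceWatcher` can be tried in isolation. The following is a rough sketch under assumptions: `EventPublisher`, `record_event` and `broken_subscriber` are hypothetical stand-ins, no Kubernetes client is involved, and the event payloads are plain strings rather than the pod objects a real watch delivers. It shows that a duplicate subscription is ignored and that one failing subscriber does not stop the others from being notified.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


class EventPublisher:
    """Stand-in for the subscribe/notify part of ResourceWatcher (no Kubernetes needed)."""

    def __init__(self):
        self.subscribers: List[Callable] = []

    def subscribe(self, callback: Callable):
        if callback not in self.subscribers:
            self.subscribers.append(callback)

    def unsubscribe(self, callback: Callable):
        if callback in self.subscribers:
            self.subscribers.remove(callback)

    def notify_subscribers(self, event: dict):
        for subscriber in self.subscribers:
            try:
                subscriber(event)
            except Exception:
                # A failing subscriber is logged and skipped; the loop continues.
                logger.exception("Error notifying subscriber %s", subscriber.__name__)


seen = []


def record_event(event):
    seen.append((event["type"], event["object"]))


def broken_subscriber(event):
    raise RuntimeError("boom")


publisher = EventPublisher()
publisher.subscribe(broken_subscriber)
publisher.subscribe(record_event)
publisher.subscribe(record_event)  # duplicate, ignored
publisher.notify_subscribers({"type": "ADDED", "object": "pod-a"})
publisher.unsubscribe(record_event)
publisher.notify_subscribers({"type": "DELETED", "object": "pod-a"})
print(seen)  # [('ADDED', 'pod-a')]
```

In the PR itself the callbacks run on the watcher thread, so a slow subscriber would delay delivery to the ones registered after it.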
Why is `max_proc` here 2, and in `scrapyd_k8s.sample-k8s.conf` 10?