Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed changes to default storage adapter #159

Closed
sserrata opened this issue Apr 17, 2020 · 2 comments
Closed

Proposed changes to default storage adapter #159

sserrata opened this issue Apr 17, 2020 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@sserrata
Copy link
Member

sserrata commented Apr 17, 2020

Proposal to change default storage adapter

Introduction

The original idea behind storage adapters was to provide a way to adapt the underlying credentials storage layer in order to:

  • Read/write from/to a custom credentials store.
  • Maintain the same internal interface between the Credentials class and the storage adapter, i.e. maintain compatibility.

credentials.json

Since the early pancloud days, the default storage adapter class has been TinyDBStore, which currently reads/writes from/to the ~/.config/pan_cortex_data_lake/credentials.json file. The reasoning behind this was simple: provide a low-friction way for users to persist OAuth2 credentials during their exploration and proof-of-concept phases. With support for credentials profiles, the experience was intended to resemble that of other SDK/libraries, e.g. boto3 et al.

Beyond the exploratory/PoC phases, many developers are simply not interested in using the credentials.json file. Furthermore, there isn't an intuitive way to disable it, which can lead to some issues. For example, some serverless runtimes, e.g. AWS Lambda, implement read-only filesystems. Although the Credentials class and default TinyDBStore provide a way to specify a dbfile path via storage_params or the PAN_CREDENTIALS_DBFILE envar, this approach is neither intuitive or obvious (nor is it documented very well). It also doesn't "disable" the credentials.json file - it merely moves it to a different location (e.g. /tmp). If your production app/integration doesn't require the credentials.json file (and, arguably, it really shouldn't) then it seems counterproductive to not have an easy/obvious way to disable it.

Current Workaround(s)

Today, if one wanted to disable the use of the credentials.json file, you could:

  • Write and implement a custom storage adapter
  • Subclass Credentials, pass another storage class using self.storage and override the inherited class methods (using self.storage for invoking your read/write calls).

Proposals

  • Ship a second storage adapter class with the CDL Python SDK called MemoryStore. As the name implies, MemoryStore would read/write credentials from/to memory or in-memory. It just so happens that tinydb ships with a MemoryStorage class.
  • Make MemoryStore the new default storage adapter.
  • Use cortex.pan.dev to document this change and the full usage around Credentials and storage adapters, including examples and explanation of when developers should subclass vs write a new storage adapter.
  • Continue shipping only the TinyDBStore adapter but provide a way to activate "in-memory mode" using storage_params. For example:
from pan_cortex_data_lake import Credentials

c = Credentials(storage_params={"memory_storage": True})

The latter would be my preferred option as it introduces the fewest lines of code, by far.

Other ideas

Adopt the "credentials providers" approach used in the NodeJS and Java versions of the SDK. This would require the following:

  • Eliminate storage adapters altogether (see second bullet).
  • Simplify the Credentials class and API to become an interface only for accessing credentials. This means also moving the exiting OAuth2 helper methods, e.g. refresh(), fetch_credentials(), etc., to a future, separate provider module/class.

Reasons to avoid this path:

  • Shipping an SDK with no provider, i.e. requiring the provider to be installed and imported separately, introduces additional friction to the exploratory and PoC phases. The python community tends to appreciate a more "batteries included" approach.
  • Provider packages will need to be maintained and versioned separately, complete with testing. Although this might help simplify the CDL Python SDK code base, it will introduce additional maintenance overhead - actually, the maintenance/overhead would just get punted elsewhere.
  • The concept of "providers" could lead to an endless family of provider packages, contributed both internally and by the community. These packages would have to be maintained/updated/supported and documented, which could be difficult to keep up with.
  • The current approach already offers two, arguably cleaner, ways to customize behavior: 1) subclassing and overriding methods and 2) storage adapters. The patterns and 3rd-party storage adapter modules can still be shared throughout the community but users would be responsible for adding and maintaining these patterns/modules in their code base.
  • Shipping two storage adapters with the CDL Python SDK (and the code necessary to support them) occupies a relatively small footprint in the library. IMO, the overhead they introduce is small - small enough to suggest that maintaining them as separate packages might be unnecessary/overkill (and arguably more work/overhead).
  • If we adopt the storage_params approach to activating in-memory mode, then we can maintain backward compatibility and gain the functionality with only a few lines of code.

Similar/related issues

@sserrata sserrata added the enhancement New feature or request label Apr 17, 2020
@sserrata sserrata self-assigned this Apr 17, 2020
@sserrata
Copy link
Member Author

I will be shipping the storage_params suggestion since it would be a non-breaking change to the existing TinyDBStore class.

@sserrata
Copy link
Member Author

Observed differences between credentials.json and MemoryStorage:

  • credentials.json acts as a true, shared credentials store. This means clients are (potentially) able to read and write to this shared store from different runtimes/execution environments.
  • MemoryStorage mode is not a shared credentials store. This means clients are only able to read and write to this store within their respective runtime/execution environment.

Implications:

  • In serverless deployments, which implement distinct execution environments, it will be required to implement a Credentials subclass or custom storage adapter in order to share, reuse or cache credentials across function calls.

This was referenced Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant