Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor projectIdSingleton -> hostUrlSingleton #148

Merged
merged 7 commits into from
Oct 31, 2023

Conversation

austin-denoble
Copy link
Contributor

@austin-denoble austin-denoble commented Oct 25, 2023

Problem

In the future, we'll be using describeIndex to retrieve the specific host value for a given Index. Previously, this was handled internally by the client via configuration values and pinging whoami, and caching the results to prevent unneeded requests and latency.

We want to maintain this caching behavior for apiKey -> indexName while simplifying the client configuration and calling describeIndex to retrieve the host.

Solution

The new IndexHostSingleton replaces the existing ProjectIdSingleton. The functionality is very similar between the two singletons at a high level, but the code paths for making network calls are different. I've also gone ahead and pulled out projectId and environment as configuration values as they'll no longer be needed to generate the data plane address.

  • Delete ProjectIdSingleton and add new IndexHostSingleton.
  • Add unit tests for IndexHostSingleton.
  • Remove projectId and environment from PineconeConfiguration and update related areas, tests, documentation bits, etc. Add new PINECONE_CONTROLLER_HOST environment variable.
  • Add .then() callbacks to describeIndex and deleteIndex to both set and delete the host from the cache when needed.

Type of Change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Test Plan

Testing is a bit tricky at the moment, but you should be able to pull the branch down and test using the repl:

npm run repl

await init()
await client.listIndexes()

await client.index('my-index').describeIndexStats()

Make sure unit tests are passing in CI.

Integration tests are a bit borked right now, but there is a follow-up ticket to address those specifically.

@austin-denoble austin-denoble changed the title Refactor projectIdSingleton Refactor projectIdSingleton -> hostUrlSingleton Oct 26, 2023
@austin-denoble austin-denoble marked this pull request as ready for review October 26, 2023 21:26
@@ -14,48 +16,38 @@ jest.mock('../utils', () => {
});
jest.mock('../data/fetch');
jest.mock('../data/upsert');
jest.mock('../data/hostUrlSingleton');

jest.mock('../control', () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a lot of time trying to get the mocking for things working and it still feels like a mess. 😭

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this sucked. If you look at my comment on pinecone.ts, I think you could do this in a way that doesn't require any modifications to describeIndex.

@@ -6,7 +6,6 @@ describe('Error handling', () => {
test('calling control plane', async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file will need to be fixed up.

@@ -4,7 +4,6 @@ describe('Client initialization', () => {
test('can accept a config object', () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, just fixing up lint / typescript for now.

export const describeIndex = (api: IndexOperationsApi) => {
export const describeIndex = (
api: IndexOperationsApi,
callback?: (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on a few different means of allowing the describeIndex call to interact with HostUrlSingleton. Initially I wanted to use it within here, but then we didn't have access to the PineconeConfiguration for the cacheKey in set, so went with a callback.

const describeResponse = await describeIndex(indexOperationsApi)(indexName);
const host = describeResponse.status?.host;

if (!host) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to cover this case here because the type indicates it could occur (mostly because of loose typing in the openapi spec, I think), but in practice I have never seen the host value come back missing. I think the url is known immediately, whether or not the index is actually ready to receive data plane calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, primarily we're satisfying flow. Maybe we could opt to update the OpenAPI spec to make this a bit easier for us.

Copy link
Collaborator

@jhamon jhamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking over this diff really made me appreciate what a trooper you've been doing so many huge reviews for me lately.

To summarize my feedback:

  • Add an env var to override control plane url
  • Consider extracting a new class of some kind to handle config/initialization for IndexOperationsApi
  • You can simplify a lot of stuff by leaving describeIndex as is and making it the responsibility of the Pinecone class to inform the singleton of the host whenever pinecone.describeIndex() gets called
  • There's an edge case with pinecone.deleteIndex also. Probably things should be evicted from the singleton cache when an index is deleted. Otherwise repeated create/delete/create operations will leave you in a state where the cached host url is invalid.

apiKey: Type.String({ minLength: 1 }),
projectId: Type.Optional(Type.String({ minLength: 1 })),
hostUrl: Type.Optional(Type.String({ minLength: 1 })),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's a good idea to have this here, or maybe we need to create a distinction between configuration for the Pinecone class and the Index class. In the past, environment/apiKey/projectId were all global so it wasn't a significant difference and we could use the same config object everywhere, but now we have a need to have index-specific configuration since hostUrl will only apply to a single index. I need to go throught the rest of this PR still before I make up my mind, but it feels to me like maybe the time to setup a dedicated IndexConfiguration that extends/subclasses PineconeConfiguration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and split things apart. We now have:

  • PineconeConfiguration
  • IndexConfiguration (which extends PineconeConfiguration)

Let me know if this doesn't make sense.

src/pinecone.ts Outdated
const { apiKey, environment } = options;
const controllerPath = `https://controller.${environment}.pinecone.io`;
const { apiKey } = options;
const controllerPath = `https://api.pinecone.io`;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for backticks anymore since we're no longer interpolating vars.

Suggested change
const controllerPath = `https://api.pinecone.io`;
const controllerPath = 'https://api.pinecone.io';

Also, this is where I was thinking an environment variable like PINECONE_CONTROLLER_HOST could be useful to unblock our integration testing.

src/pinecone.ts Outdated
Comment on lines 143 to 153
/**
* @internal
* Used as a callback by {@link Pinecone.describeIndex} to cache any index host URLs that are returned.
* This bypasses the need to fetch the host on subsequent dataplane calls.
*/
_handleDescribeIndex = (response: IndexDescription, indexName: IndexName) => {
if (response.status?.host) {
HostUrlSingleton._set(this.config, indexName, response.status.host);
}
};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an alternative approach would be to invoke HostUrlSingleton from inside the describeIndex method defined further down this file, since that's the only place this._describeIndex gets used.

  describeIndex(indexName: IndexName) {
    const description = this._describeIndex(indexName);
    HostUrlSingleton._set(this.config, indexName, response.status.host);
    return description;
  }

If you do it this way, you shouldn't need any changes callback changes in src/data/describeIndex.ts and can simplify some of the testing gymnastics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, not really sure why I decided to go with a callback at this point. This would have been much cleaner. 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do remember the reason: we're returning promises and I went with passing an explicit callback rather than using .then() on the return. Using the promise itself and keeping things isolated to pinecone.ts.

@@ -14,48 +16,38 @@ jest.mock('../utils', () => {
});
jest.mock('../data/fetch');
jest.mock('../data/upsert');
jest.mock('../data/hostUrlSingleton');

jest.mock('../control', () => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this sucked. If you look at my comment on pinecone.ts, I think you could do this in a way that doesn't require any modifications to describeIndex.

Comment on lines 19 to 34
const _buildIndexOperationsApi = (config: PineconeConfiguration) => {
const { apiKey } = config;
const controllerPath = `https://api.pinecone.io`;
const apiConfig: IndexOperationsApiConfigurationParameters = {
basePath: controllerPath,
apiKey,
queryParamsStringify,
headers: {
'User-Agent': buildUserAgent(false),
},
fetchApi: getFetch(config),
middleware,
};

return new IndexOperationsApi(new ApiConfiguration(apiConfig));
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worthwhile to extract this configuration stuff into somewhere because we probably want it to be identical to the configs that are setup inside pinecone.ts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't seem like it necessitated a class as we're not doing the same sort of logic as in vectorOperationsProvider. We now have a centralized place for the index API configurations: indexOperationsBuilder.


const _buildIndexOperationsApi = (config: PineconeConfiguration) => {
const { apiKey } = config;
const controllerPath = `https://api.pinecone.io`;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some way of overriding this value using an env var like PINECONE_CONTROLLER_HOST would be useful in getting the integration tests working over the short-term while api.pinecone.io is still being worked on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. I almost felt like we could leave PINECONE_ENVIRONMENT as an optional env variable for now and then either use PINECONE_CONTROLLER_HOST if provided, or PINECONE_ENVIRONMENT to build it if provided, or default to https://api.pinecone.io, but that seemed like it would overly complicate things when we can just provide the URL anyways.

@@ -0,0 +1,27 @@
import {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed a bit silly to unit test, and the only thing we'd really care about checking is the proper controllerPath being set. Since the internals of the OpenAPI classes are protected, that's not necessarily easy unless we want to use a ts-ignore.

@@ -219,7 +200,17 @@ export class Pinecone {
* @returns A promise that resolves to {@link IndexMeta}
*/
describeIndex(indexName: IndexName) {
return this._describeIndex(indexName);
const describeIndexPromise = this._describeIndex(indexName);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapped from passing a callback to appending a .then() on the promise before returning.

const deleteIndexPromise = this._deleteIndex(indexName);

// When an index is deleted, we need to evict the host from the IndexHostSingleton cache.
deleteIndexPromise.then(() => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should work for evicting things from the cache if the index has been deleted.

@austin-denoble austin-denoble force-pushed the adenoble/refactor-project-id-singleton branch from 2b21bb4 to 5eaffe3 Compare October 31, 2023 19:13
@austin-denoble austin-denoble merged commit 1b06250 into spruce Oct 31, 2023
16 of 19 checks passed
@austin-denoble austin-denoble deleted the adenoble/refactor-project-id-singleton branch October 31, 2023 19:18
austin-denoble added a commit that referenced this pull request Dec 27, 2023
## Problem
In the future, we'll be using `describeIndex` to retrieve the specific
`host` value for a given Index. Previously, this was handled internally
by the client via configuration values and pinging `whoami`, and caching
the results to prevent unneeded requests and latency.

We want to maintain this caching behavior for `apiKey` -> `indexName`
while simplifying the client configuration and calling `describeIndex`
to retrieve the `host`.

## Solution
The new `IndexHostSingleton` replaces the existing `ProjectIdSingleton`.
The functionality is very similar between the two singletons at a high
level, but the code paths for making network calls are different. I've
also gone ahead and pulled out `projectId` and `environment` as
configuration values as they'll no longer be needed to generate the data
plane address.

- Delete `ProjectIdSingleton` and add new `IndexHostSingleton`.
- Add unit tests for `IndexHostSingleton`.
- Remove `projectId` and `environment` from `PineconeConfiguration` and
update related areas, tests, documentation bits, etc. Add new
`PINECONE_CONTROLLER_HOST` environment variable.
- Add `.then()` callbacks to `describeIndex` and `deleteIndex` to both
set and delete the host from the cache when needed.

## Type of Change
- [X] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [X] This change requires a documentation update

## Test Plan
Testing is a bit tricky at the moment, but you should be able to pull
the branch down and test using the repl:

```
npm run repl

await init()
await client.listIndexes()

await client.index('my-index').describeIndexStats()
```

Make sure unit tests are passing in CI.

Integration tests are a bit borked right now, but there is a follow-up
ticket to address those specifically.
austin-denoble added a commit that referenced this pull request Jan 13, 2024
## Problem
In the future, we'll be using `describeIndex` to retrieve the specific
`host` value for a given Index. Previously, this was handled internally
by the client via configuration values and pinging `whoami`, and caching
the results to prevent unneeded requests and latency.

We want to maintain this caching behavior for `apiKey` -> `indexName`
while simplifying the client configuration and calling `describeIndex`
to retrieve the `host`.

## Solution
The new `IndexHostSingleton` replaces the existing `ProjectIdSingleton`.
The functionality is very similar between the two singletons at a high
level, but the code paths for making network calls are different. I've
also gone ahead and pulled out `projectId` and `environment` as
configuration values as they'll no longer be needed to generate the data
plane address.

- Delete `ProjectIdSingleton` and add new `IndexHostSingleton`.
- Add unit tests for `IndexHostSingleton`.
- Remove `projectId` and `environment` from `PineconeConfiguration` and
update related areas, tests, documentation bits, etc. Add new
`PINECONE_CONTROLLER_HOST` environment variable.
- Add `.then()` callbacks to `describeIndex` and `deleteIndex` to both
set and delete the host from the cache when needed.

## Type of Change
- [X] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [X] This change requires a documentation update

## Test Plan
Testing is a bit tricky at the moment, but you should be able to pull
the branch down and test using the repl:

```
npm run repl

await init()
await client.listIndexes()

await client.index('my-index').describeIndexStats()
```

Make sure unit tests are passing in CI.

Integration tests are a bit borked right now, but there is a follow-up
ticket to address those specifically.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants