Include content length in the response of Get and GetRange #145

Merged
7 commits merged on Oct 15, 2024

Conversation

Contributor

@ashwanthgoli ashwanthgoli commented Oct 8, 2024

This PR implements an ObjectSize() method on the return values of Get and GetRange. This should help readers preallocate buffers based on the content size, reducing the number of allocations. They can call TryToGetSize on the reader to get the size; the existing comment already highlights that this is best effort.

I did not add it for swift since the Length call could potentially result in another request to the server
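
For illustration, a minimal sketch of how a caller might use a reported size to preallocate a buffer. Everything here is a local stand-in: `sizedReadCloser`, `objectSizer`, `newSized` and `readAllPrealloc` are hypothetical names, not objstore's API, and the sketch does not call TryToGetSize. The size is treated as a hint only, in line with the best-effort note above.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// sizedReadCloser is a local stand-in for the wrapper discussed in this PR.
// It is not imported from objstore.
type sizedReadCloser struct {
	io.ReadCloser
	Size func() (int64, error)
}

// ObjectSize reports the size if a Size function was supplied.
func (s sizedReadCloser) ObjectSize() (int64, error) {
	if s.Size == nil {
		return 0, errors.New("size unknown")
	}
	return s.Size()
}

// objectSizer is a hypothetical interface a caller can type-assert against.
type objectSizer interface {
	ObjectSize() (int64, error)
}

// newSized wraps an in-memory payload so the example needs no object store.
func newSized(payload string) sizedReadCloser {
	return sizedReadCloser{
		ReadCloser: io.NopCloser(strings.NewReader(payload)),
		Size:       func() (int64, error) { return int64(len(payload)), nil },
	}
}

// readAllPrealloc reads r fully. When r can report its size, the buffer is
// grown once up front; otherwise it grows on demand as usual.
func readAllPrealloc(r io.Reader) ([]byte, error) {
	var buf bytes.Buffer
	if s, ok := r.(objectSizer); ok {
		if n, err := s.ObjectSize(); err == nil && n > 0 {
			// ReadFrom wants bytes.MinRead spare bytes before each read,
			// so reserve them too to avoid a second allocation.
			buf.Grow(int(n) + bytes.MinRead)
		}
	}
	_, err := buf.ReadFrom(r)
	return buf.Bytes(), err
}

func main() {
	rc := newSized("hello objstore")
	defer rc.Close()

	data, err := readAllPrealloc(rc)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), string(data))
}
```

Readers that do not report a size still work; they simply skip the up-front growth.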

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

Updated the acceptance test to check for object size after Get and GetRange calls

@ashwanthgoli ashwanthgoli marked this pull request as ready for review October 8, 2024 14:17
		return r, err
	}

	return objstore.ObjectSizerReadCloser{ReadCloser: r, Size: r.Attrs.Size}, nil

Contributor

Should the size be set to length here?

Contributor Author

https://github.com/googleapis/google-cloud-go/blob/main/storage/reader.go#L39
this field holds the length according to the source

Contributor

This is where that value is set: https://github.com/googleapis/google-cloud-go/blob/main/storage/grpc_client.go#L1126.

According to the code, this is the size of the entire object, even if a range is requested: https://github.com/googleapis/google-cloud-go/blob/main/storage/grpc_client.go#L1109-L1110.

We might want to test this better to be sure we're returning the right value.
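
To make the distinction concrete, here is a small sketch of the value a range read would be expected to report, given the full object size. `rangeSize` is a hypothetical helper, not part of objstore or the GCS client, and treating a negative length as "read to the end" is an assumption of this sketch.

```go
package main

import "fmt"

// rangeSize returns how many bytes a range read should yield for an object of
// total bytes, starting at off and asking for length bytes.
// A negative length is treated as "read to the end" (an assumption here).
func rangeSize(total, off, length int64) int64 {
	if off < 0 || off >= total {
		return 0
	}
	remaining := total - off
	if length < 0 || length > remaining {
		return remaining
	}
	return length
}

func main() {
	const total = 100
	fmt.Println(rangeSize(total, 0, -1))  // whole object
	fmt.Println(rangeSize(total, 10, 20)) // range fully inside the object
	fmt.Println(rangeSize(total, 90, 50)) // range clipped at the end
}
```

A test for GetRange could compare the reported size against a value computed this way rather than against the full object size.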

Contributor Author

thanks! adding it for GetRange was an afterthought, i'll have another look

Contributor Author

Contributor

Yes, this could be the one that we need. We have e2e tests that run in the CI which could be useful for this: https://github.com/thanos-io/objstore/blob/main/testing.go#L83.

Contributor Author

other providers are returning content length which i am hoping reflects the requested range, I'll try to do a basic sanity check for a couple of providers tomorrow.

if this feels risky, i can limit the change to Get call alone

Contributor Author

@ashwanthgoli ashwanthgoli Oct 9, 2024

👍 i'll add something there :)

Contributor Author

@fpetkovski added the tests, it's looking good. can you take another look when you get a chance? Thanks!

		return o, err
	}

	return objstore.ObjectSizerReadCloser{ReadCloser: o, Size: stat.Size}, nil

Contributor Author

references:

size is taken from Content-Length header
https://github.com/minio/minio-go/blob/master/utils.go#L264
https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html

HeadObject returns only the metadata for an object. If the Range is satisfiable, only the ContentLength is affected in the response. If the Range is not satisfiable, S3 returns a 416 - Requested Range Not Satisfiable error.
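
The same Content-Length behaviour can be observed with plain HTTP range handling. The sketch below uses Go's standard library test server rather than S3 or minio-go, so it only illustrates the header semantics; `rangedContentLength` is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// rangedContentLength serves body from a throwaway local test server and
// returns the Content-Length and Content-Range seen for the given Range value.
func rangedContentLength(body, rangeHeader string) (int64, string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// http.ServeContent implements Range handling for us.
		http.ServeContent(w, r, "object", time.Time{}, strings.NewReader(body))
	}))
	defer srv.Close()

	req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
	if err != nil {
		return 0, "", err
	}
	req.Header.Set("Range", rangeHeader)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()

	return resp.ContentLength, resp.Header.Get("Content-Range"), nil
}

func main() {
	// Ask for bytes 2 through 5 (inclusive) of a 10-byte body.
	n, contentRange, err := rangedContentLength("0123456789", "bytes=2-5")
	if err != nil {
		panic(err)
	}
	fmt.Println("Content-Length:", n)
	fmt.Println("Content-Range:", contentRange)
}
```

Here Content-Length covers only the requested range, while the total object size appears after the slash in Content-Range.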

Contributor

Isn't this adding an additional request to the GetRange call?

Contributor Author

minio-go saves the objectInfo when we get the object. calling Stat() after a successful GetObject() call won't trigger the HEAD request i think.

this is very likely not going to cause a regression since this code executes only when someone asks for Size().
but let me know if you notice any problems :)

@ashwanthgoli ashwanthgoli force-pushed the get-include-size branch 2 times, most recently from 445546e to b7187b0 Compare October 10, 2024 06:40

	return objstore.ObjectSizerReadCloser{
		ReadCloser: file,
		Size:       file.Length,

Contributor Author

file.Length() returns the value of Content-Length response header if it is set, else it makes a HEAD call to the store

Contributor

@fpetkovski fpetkovski left a comment

Looks great, I just have one comment.


type ObjectSizerReadCloser struct {
	io.ReadCloser
	Size func() (int64, error)

Contributor

From what I see, we always return nil as the error, so I wonder why we need to have it as a return argument.

Contributor Author

swift could potentially return err if the HEAD call fails
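
A rough sketch of why a function returning an error can be useful here: the size lookup runs only when a caller asks for it, and a caller can fall back to a default when the lookup fails. `sizeFunc`, `fromHeader`, `bufferCap` and `errLookupFailed` are hypothetical names for this example; the failing closure merely stands in for something like a failed HEAD call.

```go
package main

import (
	"errors"
	"fmt"
)

// sizeFunc mirrors the signature of the Size field under review.
type sizeFunc func() (int64, error)

// errLookupFailed stands in for a failed size lookup against the store.
var errLookupFailed = errors.New("size lookup failed")

// fromHeader models a provider that already knows the size from the response
// it received, so the returned function never fails.
func fromHeader(contentLength int64) sizeFunc {
	return func() (int64, error) { return contentLength, nil }
}

// bufferCap picks an initial buffer capacity, falling back to a default when
// the size is unavailable.
func bufferCap(size sizeFunc, fallback int) int {
	n, err := size()
	if err != nil || n <= 0 {
		return fallback
	}
	return int(n)
}

func main() {
	calls := 0
	lazy := sizeFunc(func() (int64, error) {
		calls++ // stands in for an extra request to the store
		return 0, errLookupFailed
	})

	fmt.Println("calls before asking:", calls)
	fmt.Println("capacity with failing lookup:", bufferCap(lazy, 512))
	fmt.Println("calls after asking:", calls)
	fmt.Println("capacity with known size:", bufferCap(fromHeader(42), 512))
}
```

Callers that never ask for the size pay nothing for the lookup, which matches the lazy behaviour described earlier in the thread.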

Contributor

@fpetkovski fpetkovski left a comment

Thanks, LGTM

Signed-off-by: Ashwanth Goli <[email protected]>
@ashwanthgoli
Contributor Author

@fpetkovski can you help merge this change? :)

@fpetkovski fpetkovski merged commit 5f04b8b into thanos-io:main Oct 15, 2024
7 checks passed