feat: Add support for serializing/deserializing user-defined stream-level metadata. #133
Conversation
Walkthrough
This pull request introduces user-defined metadata functionality to the serialization and deserialization process. The changes span multiple files across the project, adding a new optional `user_defined_metadata` parameter.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- clp_ffi_py/ir/native.pyi (2 hunks)
- src/clp (1 hunks)
- src/clp_ffi_py/ir/native/PyDeserializer.cpp (5 hunks)
- src/clp_ffi_py/ir/native/PyDeserializer.hpp (2 hunks)
- src/clp_ffi_py/ir/native/PySerializer.cpp (6 hunks)
- tests/test_ir/test_serder.py (4 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/clp
🧰 Additional context used
📓 Path-based instructions (3)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/clp_ffi_py/ir/native/PyDeserializer.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/clp_ffi_py/ir/native/PySerializer.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
📓 Learnings (1)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.hpp:246-248
Timestamp: 2024-11-23T05:54:37.767Z
Learning: In the `src/clp_ffi_py/ir/native` module, raw pointers with `gsl::owner` are intentionally used instead of `std::unique_ptr` to comply with CPython's memory management requirements.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.cpp:218-242
Timestamp: 2024-11-23T05:55:57.106Z
Learning: In `src/clp_ffi_py/ir/native/PyDeserializer.cpp`, the `deserialize_log_event` method in the `PyDeserializer` class ensures proper loop termination and does not risk infinite loops, even when handling incomplete or corrupted streams.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.cpp:206-208
Timestamp: 2024-11-23T05:55:08.019Z
Learning: In code interfacing with CPython (e.g., `src/clp_ffi_py/ir/native/PyDeserializer.cpp`), using raw pointers for memory management is acceptable due to CPython's memory management requirements.
⏰ Context from checks skipped due to timeout of 90000ms (18)
- GitHub Check: Build cp310-win_amd64
- GitHub Check: Build cp311-musllinux_i686
- GitHub Check: Build cp311-musllinux_aarch64
- GitHub Check: Build cp311-manylinux_x86_64
- GitHub Check: Build cp311-manylinux_i686
- GitHub Check: Build cp311-manylinux_aarch64
- GitHub Check: Build cp310-musllinux_x86_64
- GitHub Check: Build cp310-musllinux_i686
- GitHub Check: Build cp310-musllinux_aarch64
- GitHub Check: Build cp310-manylinux_x86_64
- GitHub Check: Build cp310-manylinux_i686
- GitHub Check: Build cp310-manylinux_aarch64
- GitHub Check: Build cp311-macosx_arm64
- GitHub Check: Build cp310-macosx_x86_64
- GitHub Check: Build cp310-macosx_universal2
- GitHub Check: Build cp310-macosx_arm64
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (15)
clp_ffi_py/ir/native.pyi (2)
93-98: LGTM! The `__init__` signature is well-defined. The optional `user_defined_metadata` parameter is correctly typed and positioned with an appropriate default value.
121-121: LGTM! The method signature is well-defined. The return type `Optional[Dict[str, Any]]` is appropriate for JSON-like metadata.
tests/test_ir/test_serder.py (4)
1-1: LGTM! Required import is added. The `json` import is necessary for metadata serialization/deserialization.
20-35: LGTM! Comprehensive test data covering all JSON types. The test data includes:
- Primitive types (int, float, bool, string)
- Special values (null)
- Complex types (arrays, nested objects)
70-72: LGTM! Proper metadata serialization. The metadata is correctly converted to a JSON string before being passed to the serializer.
124-127: LGTM! Proper metadata verification. The test verifies that the deserialized metadata matches the original data using deep comparison.
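The round-trip being verified can be sketched with just the stdlib `json` module; this is a simplified stand-in for the Serializer/Deserializer pair, which transport the metadata as a JSON string:

```python
import json

# Hypothetical metadata covering the JSON types listed above.
metadata = {
    "int": 42,
    "float": 3.14,
    "bool": True,
    "string": "text",
    "null": None,
    "array": [1, 2, 3],
    "nested": {"key": "value"},
}

# Serialize to a JSON string (what the serializer receives) ...
json_str = json.dumps(metadata)

# ... parse it back (what the deserializer returns) and deep-compare.
assert json.loads(json_str) == metadata
```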
src/clp_ffi_py/ir/native/PyDeserializer.hpp (2)
15-15: LGTM! Required headers are added. The new headers support JSON handling and protocol constants for metadata keys.
Also applies to: 19-19
113-118: LGTM! Method declaration is well-defined. The method:
- Uses `[[nodiscard]]` to enforce return value handling
- Returns a const pointer to prevent metadata modification
- Has clear documentation
src/clp_ffi_py/ir/native/PyDeserializer.cpp (5)
6-6: LGTM! Required headers are added. The headers provide necessary support for string handling, protocol constants, and JSON operations.
Also applies to: 14-14, 19-19
75-90: LGTM! Documentation follows Python conventions. The docstring:
- Clearly describes the functionality
- Documents return types and values
- Uses proper reST formatting
104-108: LGTM! Method is properly exposed to Python. The method table entry correctly sets up the Python binding with appropriate flags and documentation.
183-223: LGTM! Robust Python method implementation. The implementation:
- Handles all error cases (type errors, serialization failures)
- Properly converts JSON to Python dictionary
- Follows Python reference counting rules
354-363: LGTM! Clean C++ implementation. The implementation:
- Uses protocol-defined constants for metadata key
- Safely handles missing metadata case
- Returns const pointer to prevent modification
src/clp_ffi_py/ir/native/PySerializer.cpp (2)
15-15: LGTM! Documentation and includes are well-structured. The docstring changes properly document the new parameter, and the necessary JSON library is included for metadata parsing.
Also applies to: 36-48
319-319: LGTM! Proper handling of optional metadata. The optional metadata is correctly passed to the serializer's create method.
    std::optional<nlohmann::json> optional_user_defined_metadata;
    if (Py_None != user_defined_metadata) {
        if (false == static_cast<bool>(PyUnicode_CheckExact(user_defined_metadata))) {
            PyErr_Format(
                    PyExc_TypeError,
                    "`%s` must be given as a `str` object.",
                    static_cast<char const*>(keyword_user_defined_metadata)
            );
            return -1;
        }
        Py_ssize_t json_str_size{};
        auto const* json_str_data{PyUnicode_AsUTF8AndSize(user_defined_metadata, &json_str_size)};
        if (nullptr == json_str_data) {
            return -1;
        }
        auto parsed_user_defined_metadata = nlohmann::json::parse(
                std::string_view{json_str_data, static_cast<size_t>(json_str_size)},
                nullptr,
                false
        );
        if (parsed_user_defined_metadata.is_discarded()) {
            PyErr_Format(
                    PyExc_RuntimeError,
                    "Failed to parse `%s`: %s",
                    static_cast<char const*>(keyword_user_defined_metadata),
                    json_str_data
            );
            return -1;
        }
        optional_user_defined_metadata = std::move(parsed_user_defined_metadata);
    }
🛠️ Refactor suggestion
Add validation to ensure parsed metadata is a JSON object.
While the JSON parsing is handled well, there's no validation to ensure the parsed result is specifically a JSON object. Other JSON types (arrays, primitives) should be rejected.
Add this validation after line 306:
if (parsed_user_defined_metadata.is_discarded()) {
PyErr_Format(
PyExc_RuntimeError,
"Failed to parse `%s`: %s",
static_cast<char const*>(keyword_user_defined_metadata),
json_str_data
);
return -1;
}
+ if (!parsed_user_defined_metadata.is_object()) {
+ PyErr_Format(
+ PyExc_ValueError,
+ "`%s` must be a JSON object, got %s",
+ static_cast<char const*>(keyword_user_defined_metadata),
+ parsed_user_defined_metadata.type_name()
+ );
+ return -1;
+ }
optional_user_defined_metadata = std::move(parsed_user_defined_metadata);
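The same top-level-object rule can be expressed on the Python side; this is a hypothetical stdlib-only helper for illustration, not code from the PR:

```python
import json
from typing import Any, Dict


def parse_metadata_object(json_str: str) -> Dict[str, Any]:
    """Parse a JSON string and require the top-level value to be an object."""
    parsed = json.loads(json_str)
    if not isinstance(parsed, dict):
        raise ValueError(
            f"metadata must be a JSON object, got {type(parsed).__name__}"
        )
    return parsed
```

Calling `parse_metadata_object('[1, 2]')` would raise `ValueError`, mirroring the suggested C++ check that rejects arrays and primitives.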
I agree with the rabbit if we're to accept `str`-typed metadata.
@junhaoliao Thanks for confirming! I'll add this to our knowledge base to ensure we maintain this validation requirement for JSON metadata.
✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-py#133
File: src/clp_ffi_py/ir/native/PySerializer.cpp:287-317
Timestamp: 2025-01-26T04:49:24.199Z
Learning: In clp-ffi-py, when accepting string input for JSON metadata, validation must be performed to ensure the parsed result is specifically a JSON object (not arrays or primitives) using the `is_object()` method from nlohmann::json.
This type validation will be handled by ffi core API, when we use this parsed object to create a serializer.
":return:\n"
"    - The deserialized user-defined stream-level metadata, loaded as a"
"      dictionary.\n"
"    - None if user-defined stream-level metadata was not given in the deserialized"
Would there be less handling for API users if we returned an empty dictionary when "user-defined stream-level metadata was not given in the deserialized IR stream"?
Though, it does seem that in the Serializer we do permit users to provide a `None` or `"{}"` value. I haven't traced down to the CLP code, but I think the given metadata string will be encoded as-is?
I would prefer to return `None`, indicating that the user-defined metadata wasn't specified in the source. It is possible to encode an empty dictionary as the metadata, though.
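Callers who prefer empty-dict semantics can normalize the `None` case themselves; a one-line adapter (hypothetical, not part of this PR):

```python
from typing import Any, Dict, Optional


def metadata_or_empty(metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Treat a stream without user-defined metadata as an empty mapping."""
    return {} if metadata is None else metadata
```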
    if (false == user_defined_metadata->is_object()) {
        PyErr_SetString(
                PyExc_TypeError,
                "User-defined stream-level metadata is not a JSON object."
If the type of this JSON value is not an object, it means either the file is not created using a Serializer we provide, or something went wrong in our code, right?
Though I imagine such cases are rare, would it be more helpful for debugging such a case if we include the JSON value type in the error message?
Yeah, the only possible case is that someone created a stream that follows our format but doesn't enforce the use of a JSON object as the metadata. Technically we should check this in the ffi core. Let me think about it...
As improved in y-scope/clp#695, we moved this check to the ffi core. The latest commit should have this check removed (however, we need to merge #134 first).
        return nullptr;
    }

    PyObjectPtr<PyObject> py_metadata_dict{py_utils_parse_json_str(json_str)};
Sorry for not being able to find any previous discussions on this: we have considered https://github.com/pybind/pybind11_json and the other alternatives discussed at https://www.github.com/nlohmann/json/issues/1014, right? Just curious, for more efficient conversions, do we have plans to make use of those or create our own bindings?
Since this isn't on any critical path anyway, I would probably defer it to a future release.
@@ -42,6 +43,9 @@ PyDoc_STRVAR(
         ":param buffer_size_limit: The maximum amount of serialized data to buffer before flushing"
         " it to `output_stream`. Defaults to 64 KiB.\n"
         ":type buffer_size_limit: int\n"
+        ":param user_defined_metadata: JSON-encoded string containing user-defined stream-level"
I think an easier-to-use interface is to accept a `dict` type. If we're time-constrained to implement that, that's fine.
That's true. I think we can fix it with a Python API binding.
Done.
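A Python-side binding that accepts a `dict` only needs to serialize it to the JSON string the native layer expects; a minimal sketch with a hypothetical helper name, using only the stdlib `json` module:

```python
import json
from typing import Any, Dict, Optional


def dict_to_metadata_json(metadata: Optional[Dict[str, Any]]) -> Optional[str]:
    """Convert an optional metadata dict to a JSON string, passing None through."""
    if metadata is None:
        return None
    if not isinstance(metadata, dict):
        raise TypeError("user_defined_metadata must be a dict or None")
    return json.dumps(metadata)
```

The native constructor can then keep its `str`-or-`None` contract while the Python wrapper exposes the friendlier `dict` interface.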
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/clp_ffi_py/Py_utils.cpp (1)
135-158
: Enhance the type error message.The error message could be more helpful by including the actual type received.
Apply this diff to improve the error message:
if (false == static_cast<bool>(PyUnicode_Check(result))) { PyErr_Format( PyExc_TypeError, - "`%s` is supposed to return a `str` object", - cPyFuncNameSerializeDictToJsonStr + "`%s` is supposed to return a `str` object, got `%s` instead", + cPyFuncNameSerializeDictToJsonStr, + Py_TYPE(result)->tp_name ); return nullptr; }tests/test_ir/test_serializer.py (1)
169-175
: Add more test cases for user-defined metadata.While the current test cases verify error handling for invalid types, consider adding the following test cases for better coverage:
- Valid dictionary with simple values
- None value (default case)
- Dictionary with nested structures
- Dictionary with invalid JSON values (e.g., custom objects)
Here's a suggested implementation:
def test_invalid_user_defined_metadata(self) -> None: byte_buffer: BytesIO = BytesIO() with self.assertRaises(TypeError): _ = Serializer(byte_buffer, user_defined_metadata="str") # type: ignore with self.assertRaises(TypeError): _ = Serializer(byte_buffer, user_defined_metadata=[1, "str"]) # type: ignore + + def test_valid_user_defined_metadata(self) -> None: + byte_buffer: BytesIO = BytesIO() + # Test None (default) + serializer = Serializer(byte_buffer) + serializer.close() + + # Test valid dictionary with simple values + metadata = {"key": "value", "number": 42} + serializer = Serializer(byte_buffer, user_defined_metadata=metadata) + serializer.close() + + # Test dictionary with nested structures + metadata = {"nested": {"key": "value"}, "array": [1, 2, 3]} + serializer = Serializer(byte_buffer, user_defined_metadata=metadata) + serializer.close() + + # Test dictionary with invalid JSON values + class CustomObject: + pass + with self.assertRaises(TypeError): + metadata = {"invalid": CustomObject()} + _ = Serializer(byte_buffer, user_defined_metadata=metadata)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- clp_ffi_py/ir/native.pyi (2 hunks)
- clp_ffi_py/utils.py (1 hunks)
- src/clp_ffi_py/PyObjectCast.hpp (1 hunks)
- src/clp_ffi_py/Py_utils.cpp (3 hunks)
- src/clp_ffi_py/Py_utils.hpp (1 hunks)
- src/clp_ffi_py/ir/native/PySerializer.cpp (6 hunks)
- tests/test_ir/test_serder.py (3 hunks)
- tests/test_ir/test_serializer.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/test_ir/test_serder.py
- clp_ffi_py/ir/native.pyi
🧰 Additional context used
📓 Path-based instructions (4)
src/clp_ffi_py/PyObjectCast.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/clp_ffi_py/Py_utils.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/clp_ffi_py/Py_utils.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/clp_ffi_py/ir/native/PySerializer.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
📓 Learnings (2)
src/clp_ffi_py/Py_utils.hpp (2)
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#80
File: src/clp_ffi_py/Py_utils.cpp:80-87
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/clp_ffi_py/Py_utils.cpp`, for functions like `py_utils_serialize_dict_to_msgpack`, type checking is handled by the Python utilities, and we avoid adding redundant type checks in the C++ code.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#83
File: src/clp_ffi_py/Py_utils.cpp:114-120
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the `src/clp_ffi_py/Py_utils.cpp` file, for the `py_utils_parse_json_str` function, explicit UTF-8 validation of the input string is unnecessary because Python will throw exceptions if the input string is not valid UTF-8.
src/clp_ffi_py/Py_utils.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#80
File: src/clp_ffi_py/Py_utils.cpp:80-87
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/clp_ffi_py/Py_utils.cpp`, for functions like `py_utils_serialize_dict_to_msgpack`, type checking is handled by the Python utilities, and we avoid adding redundant type checks in the C++ code.
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: Build cp311-win_amd64
- GitHub Check: Build cp310-win_amd64
- GitHub Check: Build cp312-manylinux_aarch64
- GitHub Check: Build cp311-musllinux_x86_64
- GitHub Check: Build cp311-musllinux_i686
- GitHub Check: Build cp311-musllinux_aarch64
- GitHub Check: Build cp311-manylinux_x86_64
- GitHub Check: Build cp311-manylinux_i686
- GitHub Check: Build cp311-manylinux_aarch64
- GitHub Check: Build cp310-musllinux_x86_64
- GitHub Check: Build cp310-musllinux_i686
- GitHub Check: Build cp310-musllinux_aarch64
- GitHub Check: Build cp310-manylinux_x86_64
- GitHub Check: Build cp310-manylinux_i686
- GitHub Check: Build cp310-manylinux_aarch64
- GitHub Check: Build cp311-macosx_universal2
- GitHub Check: Build cp311-macosx_arm64
- GitHub Check: Build cp310-macosx_x86_64
- GitHub Check: Build cp310-macosx_universal2
- GitHub Check: Build cp310-macosx_arm64
🔇 Additional comments (8)
src/clp_ffi_py/Py_utils.hpp (1)
48-55: LGTM! The function declaration is well-documented and follows the existing patterns in the codebase.
src/clp_ffi_py/PyObjectCast.hpp (1)
136-136: LGTM! The macro specialization for `PyUnicodeObject` is correctly added and follows the existing pattern.
src/clp_ffi_py/Py_utils.cpp (2)
19-19: LGTM! The constant and static pointer declarations follow the existing patterns.
Also applies to: 26-26
73-80: LGTM! The initialization code follows the existing patterns and includes proper error handling.
src/clp_ffi_py/ir/native/PySerializer.cpp (4)
Line range hint 37-51: LGTM! Clear and comprehensive documentation. The documentation for the new `user_defined_metadata` parameter is well-written and includes all necessary details about the parameter type and validation requirements.
Line range hint 228-251: LGTM! Parameter parsing is correctly implemented. The changes properly integrate the new optional parameter into the existing argument-parsing infrastructure.
331-331: LGTM! Metadata correctly passed to serializer. The optional metadata is properly passed to the serializer's create method.
290-329: Note regarding JSON object validation. Previous review discussion indicated that JSON object type validation will be handled by the FFI core API rather than at this layer.
Let's verify the FFI core API validation:
✅ Verification successful
JSON validation delegation to FFI core API confirmed. The code correctly delegates JSON validation to the FFI core API through `clp::ffi::ir_stream::Serializer`. This aligns with the previous review discussion.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for JSON object validation in FFI core API
rg -A 5 'create.*user_defined_metadata'
Length of output: 638
Script:
#!/bin/bash
# Search for ClpIrSerializer class and create method implementation
ast-grep --pattern 'class ClpIrSerializer { $$$ static $_ create($_) { $$$ } $$$ }'
Length of output: 97
Script:
#!/bin/bash
# Find files containing ClpIrSerializer definition
rg -l "class.*ClpIrSerializer" --type cpp --type hpp
Length of output: 82
Script:
#!/bin/bash
# Search for ClpIrSerializer class and create method
rg -A 10 "ClpIrSerializer"
Length of output: 4822
Script:
#!/bin/bash
# Search for the core Serializer implementation
rg -A 10 "namespace.*ir_stream" --type cpp --type hpp
rg -A 10 "class.*Serializer.*ir_stream" --type cpp --type hpp
Length of output: 175
clp_ffi_py/utils.py
Outdated
def serialize_dict_to_json_str(dictionary: Dict[Any, Any]) -> str:
    """
    Serializes the given dictionary into a JSON string.

    :param dictionary: The dictionary to serialize.
    :return: JSON string of serialized dictionary.
    :raises: TypeError The given input is not a dictionary.
    """
    if not isinstance(dictionary, dict):
        raise TypeError("The type of the input object must be a dictionary.")
    return json.dumps(dictionary)
🛠️ Refactor suggestion
Add error handling for JSON serialization failures.
The function should handle potential JSON serialization errors that could occur with complex Python objects or circular references.
Apply this diff to add error handling:
def serialize_dict_to_json_str(dictionary: Dict[Any, Any]) -> str:
"""
Serializes the given dictionary into a JSON string.
:param dictionary: The dictionary to serialize.
:return: JSON string of serialized dictionary.
:raises: TypeError The given input is not a dictionary.
+ :raises: TypeError The dictionary contains values that are not JSON serializable.
"""
if not isinstance(dictionary, dict):
raise TypeError("The type of the input object must be a dictionary.")
- return json.dumps(dictionary)
+ try:
+ return json.dumps(dictionary)
+ except TypeError as e:
+ raise TypeError(f"The dictionary contains values that are not JSON serializable: {e}")
Is re-throwing the same exception here making any difference?
@LinZhihao-723 You make a good point. Re-throwing the same TypeError doesn't add significant value here since json.dumps already provides a descriptive error message indicating which value couldn't be serialized. The extra try/except block would just add complexity without meaningful benefit. The original implementation letting json.dumps' TypeError propagate naturally is cleaner and sufficient.
✏️ Learnings added
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#133
File: clp_ffi_py/utils.py:51-62
Timestamp: 2025-01-26T10:21:10.011Z
Learning: For the serialize_dict_to_json_str function in clp_ffi_py/utils.py, let json.dumps' TypeError propagate naturally as it already provides sufficient error context, rather than catching and re-throwing with a custom message.
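The agreed-upon behaviour (let `json.dumps`'s own `TypeError` propagate rather than re-wrapping it) can be sketched as:

```python
import json
from typing import Any, Dict


def serialize_dict_to_json_str(dictionary: Dict[Any, Any]) -> str:
    """Serialize a dict to JSON; non-serializable values raise json.dumps' TypeError."""
    if not isinstance(dictionary, dict):
        raise TypeError("The type of the input object must be a dictionary.")
    # No try/except here: json.dumps already names the offending value
    # in its TypeError message, so re-throwing adds nothing.
    return json.dumps(dictionary)
```

Passing something like `{"bad": object()}` raises a `TypeError` straight from `json.dumps`, whose message already identifies the non-serializable value.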
The PR title is fine to be used as the final commit message.
Co-authored-by: Junhao Liao <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (1)
Lines 183-215: LGTM! Consider future performance optimization. The implementation is robust with proper error handling and type verification. As discussed in previous reviews, consider using pybind11_json in a future release for more efficient JSON conversions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- clp_ffi_py/utils.py (1 hunks)
- src/clp (1 hunks)
- src/clp_ffi_py/ir/native/PyDeserializer.cpp (5 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/clp
🚧 Files skipped from review as they are similar to previous changes (1)
- clp_ffi_py/utils.py
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: Prefer false == <expression> rather than !<expression>.
📓 Learnings (1)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.hpp:246-248
Timestamp: 2024-11-23T05:54:37.767Z
Learning: In the `src/clp_ffi_py/ir/native` module, raw pointers with `gsl::owner` are intentionally used instead of `std::unique_ptr` to comply with CPython's memory management requirements.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.cpp:218-242
Timestamp: 2024-11-23T05:55:57.106Z
Learning: In `src/clp_ffi_py/ir/native/PyDeserializer.cpp`, the `deserialize_log_event` method in the `PyDeserializer` class ensures proper loop termination and does not risk infinite loops, even when handling incomplete or corrupted streams.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.cpp:206-208
Timestamp: 2024-11-23T05:55:08.019Z
Learning: In code interfacing with CPython (e.g., `src/clp_ffi_py/ir/native/PyDeserializer.cpp`), using raw pointers for memory management is acceptable due to CPython's memory management requirements.
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: Build cp311-win_amd64
- GitHub Check: Build cp310-win_amd64
- GitHub Check: Build cp312-manylinux_aarch64
- GitHub Check: Build cp311-musllinux_x86_64
- GitHub Check: Build cp311-musllinux_i686
- GitHub Check: Build cp311-musllinux_aarch64
- GitHub Check: Build cp311-manylinux_x86_64
- GitHub Check: Build cp311-manylinux_i686
- GitHub Check: Build cp311-manylinux_aarch64
- GitHub Check: Build cp310-musllinux_x86_64
- GitHub Check: Build cp310-musllinux_i686
- GitHub Check: Build cp310-musllinux_aarch64
- GitHub Check: Build cp310-manylinux_x86_64
- GitHub Check: Build cp310-manylinux_i686
- GitHub Check: Build cp310-manylinux_aarch64
- GitHub Check: Build cp311-macosx_universal2
- GitHub Check: Build cp311-macosx_arm64
- GitHub Check: Build cp310-macosx_x86_64
- GitHub Check: Build cp310-macosx_universal2
- GitHub Check: Build cp310-macosx_arm64
🔇 Additional comments (5)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (5)
Lines 6, 14, 19: LGTM! Required headers added. The new headers are necessary for string handling, protocol constants, and JSON processing.
Lines 75-89: LGTM! Clear and comprehensive documentation. The documentation clearly specifies the return type and behaviour, aligning with the discussed approach of returning None for missing metadata.
Line 90: LGTM! Method declaration follows conventions. The declaration is consistent with the project's Python C API binding patterns.
Lines 104-107: LGTM! Method table entry is properly configured. The entry correctly sets up the Python binding with no arguments and links to the documentation.
Lines 346-355: LGTM! Safe and efficient implementation. The implementation safely checks for key existence before access and returns a pointer to avoid copying, aligning with CPython's memory management requirements.
Description
This PR depends on #132, #134.
This PR adds support for serializing/deserializing user-defined stream-level metadata. As discussed offline, we decided to constrain the metadata input to a JSON string to simplify type checking.
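As an illustration of this constraint, the caller is expected to round-trip its metadata through a JSON string. The sketch below uses only the standard json module; the actual Serializer/Deserializer parameter and method names from this PR are not shown here:

```python
import json

# Hypothetical caller-side round trip: the metadata dict is encoded to a
# JSON string before being handed to the serializer, and decoded back into
# a dict after the deserializer returns it.
user_metadata = {"source": "server-01", "schema_version": 2}

metadata_json = json.dumps(user_metadata)  # what the serializer would receive
recovered = json.loads(metadata_json)      # what the deserializer hands back

assert recovered == user_metadata
```

Constraining the input to a JSON string keeps the native-code type checking trivial: the C++ side only needs to validate a single string rather than an arbitrary Python object graph.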
Validation performed
Summary by CodeRabbit
New Features
Documentation
Tests
The changes enhance the serialization framework by allowing users to attach and retrieve custom metadata during stream operations, providing more flexibility in data processing.