feat: Add support for serializing/deserializing user-defined stream-level metadata. #133

Merged (9 commits) on Jan 26, 2025

Conversation

LinZhihao-723 (Member) commented on Jan 26, 2025

Description

This PR depends on #132 and #134.
This PR adds support for serializing/deserializing user-defined stream-level metadata. As discussed offline, we decided to constrain the metadata input to a JSON string to simplify type checking.
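As a rough illustration of that constraint, input is accepted only if it is a string that parses to a JSON object. The helper below is a hypothetical, pure-Python sketch of the check, not the PR's actual C++ implementation (which uses nlohmann::json on the native side):

```python
import json


def validate_user_defined_metadata(metadata_json: str) -> dict:
    """Sketch of the constraint: the input must be a `str` that encodes a JSON object."""
    if not isinstance(metadata_json, str):
        # Mirrors the PR's TypeError for non-string inputs.
        raise TypeError("user_defined_metadata must be given as a `str` object.")
    parsed = json.loads(metadata_json)
    if not isinstance(parsed, dict):
        # JSON arrays and primitives are rejected; only objects are allowed.
        raise TypeError("user_defined_metadata must encode a JSON object.")
    return parsed
```
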

Validation performed

  • Ensured all workflows passed.
  • Updated existing unit tests to ensure user-defined metadata can be successfully serialized and deserialized.

Summary by CodeRabbit

  • New Features

    • Added support for user-defined metadata in serialization and deserialization processes.
    • Introduced ability to pass and retrieve custom metadata during stream processing.
    • New function for serializing Python dictionaries to JSON strings.
  • Documentation

    • Updated method signatures and documentation to explain new metadata functionality.
  • Tests

    • Added test cases to validate user-defined metadata handling.
    • Implemented error checking for invalid metadata inputs.

The changes enhance the serialization framework by allowing users to attach and retrieve custom metadata during stream operations, providing more flexibility in data processing.


coderabbitai bot commented Jan 26, 2025

Walkthrough

This pull request introduces user-defined metadata functionality to the serialization and deserialization process. The changes span multiple files across the project, adding a new optional parameter user_defined_metadata to the Serializer class and a corresponding get_user_defined_metadata method to the Deserializer class. The implementation includes type checking, JSON serialization, and integration with existing serialization infrastructure, allowing users to attach custom metadata during the serialization process.
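From the walkthrough, the user-facing surface might look roughly like the stub below. This is a sketch reconstructed from the summary, not the actual contents of native.pyi; the parameter names follow the summary, and the 64 KiB default is taken from the docstring hunk quoted in this PR.

```python
# Hypothetical sketch of the API surface described in the walkthrough.
from typing import IO, Any, Dict, Optional


class Serializer:
    def __init__(
        self,
        output_stream: IO[bytes],
        buffer_size_limit: int = 64 * 1024,  # 64 KiB, per the quoted docstring
        user_defined_metadata: Optional[Dict[str, Any]] = None,
    ) -> None: ...


class Deserializer:
    def get_user_defined_metadata(self) -> Optional[Dict[str, Any]]: ...
```
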

Changes

File-by-file summary:

  • clp_ffi_py/ir/native.pyi: Updated the Serializer constructor with an optional user_defined_metadata parameter; added a get_user_defined_metadata method to Deserializer.
  • src/clp_ffi_py/ir/native/PyDeserializer.*: Added a get_user_defined_metadata method to retrieve stream-level metadata.
  • src/clp_ffi_py/ir/native/PySerializer.cpp: Modified __init__ to accept and process user_defined_metadata.
  • clp_ffi_py/utils.py: Added a serialize_dict_to_json_str utility function.
  • src/clp_ffi_py/Py_utils.*: Added JSON serialization support and related utility functions.
  • tests/test_ir/test_serder.py: Updated the test base class to include user-defined metadata testing.
  • tests/test_ir/test_serializer.py: Added a test for invalid user-defined metadata scenarios.
  • src/clp: Updated the subproject commit identifier.

Suggested Reviewers

  • junhaoliao

coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e7eff4 and f49a098.

📒 Files selected for processing (6)
  • clp_ffi_py/ir/native.pyi (2 hunks)
  • src/clp (1 hunks)
  • src/clp_ffi_py/ir/native/PyDeserializer.cpp (5 hunks)
  • src/clp_ffi_py/ir/native/PyDeserializer.hpp (2 hunks)
  • src/clp_ffi_py/ir/native/PySerializer.cpp (6 hunks)
  • tests/test_ir/test_serder.py (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/clp
🧰 Additional context used
📓 Path-based instructions (3)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PyDeserializer.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PySerializer.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.hpp:246-248
Timestamp: 2024-11-23T05:54:37.767Z
Learning: In the `src/clp_ffi_py/ir/native` module, raw pointers with `gsl::owner` are intentionally used instead of `std::unique_ptr` to comply with CPython's memory management requirements.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.cpp:218-242
Timestamp: 2024-11-23T05:55:57.106Z
Learning: In `src/clp_ffi_py/ir/native/PyDeserializer.cpp`, the `deserialize_log_event` method in the `PyDeserializer` class ensures proper loop termination and does not risk infinite loops, even when handling incomplete or corrupted streams.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.cpp:206-208
Timestamp: 2024-11-23T05:55:08.019Z
Learning: In code interfacing with CPython (e.g., `src/clp_ffi_py/ir/native/PyDeserializer.cpp`), using raw pointers for memory management is acceptable due to CPython's memory management requirements.
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: Build cp310-win_amd64
  • GitHub Check: Build cp311-musllinux_i686
  • GitHub Check: Build cp311-musllinux_aarch64
  • GitHub Check: Build cp311-manylinux_x86_64
  • GitHub Check: Build cp311-manylinux_i686
  • GitHub Check: Build cp311-manylinux_aarch64
  • GitHub Check: Build cp310-musllinux_x86_64
  • GitHub Check: Build cp310-musllinux_i686
  • GitHub Check: Build cp310-musllinux_aarch64
  • GitHub Check: Build cp310-manylinux_x86_64
  • GitHub Check: Build cp310-manylinux_i686
  • GitHub Check: Build cp310-manylinux_aarch64
  • GitHub Check: Build cp311-macosx_arm64
  • GitHub Check: Build cp310-macosx_x86_64
  • GitHub Check: Build cp310-macosx_universal2
  • GitHub Check: Build cp310-macosx_arm64
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (15)
clp_ffi_py/ir/native.pyi (2)

93-98: LGTM! The __init__ signature is well-defined.

The optional user_defined_metadata parameter is correctly typed and positioned with an appropriate default value.


121-121: LGTM! The method signature is well-defined.

The return type Optional[Dict[str, Any]] is appropriate for JSON-like metadata.

tests/test_ir/test_serder.py (4)

1-1: LGTM! Required import is added.

The json import is necessary for metadata serialization/deserialization.


20-35: LGTM! Comprehensive test data covering all JSON types.

The test data includes:

  • Primitive types (int, float, bool, string)
  • Special values (null)
  • Complex types (arrays, nested objects)

70-72: LGTM! Proper metadata serialization.

The metadata is correctly converted to a JSON string before being passed to the serializer.


124-127: LGTM! Proper metadata verification.

The test verifies that the deserialized metadata matches the original data using deep comparison.

src/clp_ffi_py/ir/native/PyDeserializer.hpp (2)

15-15: LGTM! Required headers are added.

The new headers support JSON handling and protocol constants for metadata keys.

Also applies to: 19-19


113-118: LGTM! Method declaration is well-defined.

The method:

  • Uses [[nodiscard]] to enforce return value handling
  • Returns a const pointer to prevent metadata modification
  • Has clear documentation
src/clp_ffi_py/ir/native/PyDeserializer.cpp (5)

6-6: LGTM! Required headers are added.

The headers provide necessary support for string handling, protocol constants, and JSON operations.

Also applies to: 14-14, 19-19


75-90: LGTM! Documentation follows Python conventions.

The docstring:

  • Clearly describes the functionality
  • Documents return types and values
  • Uses proper reST formatting

104-108: LGTM! Method is properly exposed to Python.

The method table entry correctly sets up the Python binding with appropriate flags and documentation.


183-223: LGTM! Robust Python method implementation.

The implementation:

  • Handles all error cases (type errors, serialization failures)
  • Properly converts JSON to Python dictionary
  • Follows Python reference counting rules

354-363: LGTM! Clean C++ implementation.

The implementation:

  • Uses protocol-defined constants for metadata key
  • Safely handles missing metadata case
  • Returns const pointer to prevent modification
src/clp_ffi_py/ir/native/PySerializer.cpp (2)

15-15: LGTM! Documentation and includes are well-structured.

The docstring changes properly document the new parameter, and the necessary JSON library is included for metadata parsing.

Also applies to: 36-48


319-319: LGTM! Proper handling of optional metadata.

The optional metadata is correctly passed to the serializer's create method.

Comment on lines 287 to 317
    std::optional<nlohmann::json> optional_user_defined_metadata;
    if (Py_None != user_defined_metadata) {
        if (false == static_cast<bool>(PyUnicode_CheckExact(user_defined_metadata))) {
            PyErr_Format(
                    PyExc_TypeError,
                    "`%s` must be given as a `str` object.",
                    static_cast<char const*>(keyword_user_defined_metadata)
            );
            return -1;
        }
        Py_ssize_t json_str_size{};
        auto const* json_str_data{PyUnicode_AsUTF8AndSize(user_defined_metadata, &json_str_size)};
        if (nullptr == json_str_data) {
            return -1;
        }
        auto parsed_user_defined_metadata = nlohmann::json::parse(
                std::string_view{json_str_data, static_cast<size_t>(json_str_size)},
                nullptr,
                false
        );
        if (parsed_user_defined_metadata.is_discarded()) {
            PyErr_Format(
                    PyExc_RuntimeError,
                    "Failed to parse `%s`: %s",
                    static_cast<char const*>(keyword_user_defined_metadata),
                    json_str_data
            );
            return -1;
        }
        optional_user_defined_metadata = std::move(parsed_user_defined_metadata);
    }
coderabbitai bot commented Jan 26, 2025

🛠️ Refactor suggestion

Add validation to ensure parsed metadata is a JSON object.

While the JSON parsing is handled well, there's no validation to ensure the parsed result is specifically a JSON object. Other JSON types (arrays, primitives) should be rejected.

Add this validation after line 306:

         if (parsed_user_defined_metadata.is_discarded()) {
             PyErr_Format(
                     PyExc_RuntimeError,
                     "Failed to parse `%s`: %s",
                     static_cast<char const*>(keyword_user_defined_metadata),
                     json_str_data
             );
             return -1;
         }
+        if (!parsed_user_defined_metadata.is_object()) {
+            PyErr_Format(
+                    PyExc_ValueError,
+                    "`%s` must be a JSON object, got %s",
+                    static_cast<char const*>(keyword_user_defined_metadata),
+                    parsed_user_defined_metadata.type_name()
+            );
+            return -1;
+        }
         optional_user_defined_metadata = std::move(parsed_user_defined_metadata);

junhaoliao (Member):

I agree with the rabbit if we're to accept a str type metadata.

coderabbitai bot:

@junhaoliao Thanks for confirming! I'll add this to our knowledge base to ensure we maintain this validation requirement for JSON metadata.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-py#133
File: src/clp_ffi_py/ir/native/PySerializer.cpp:287-317
Timestamp: 2025-01-26T04:49:24.199Z
Learning: In clp-ffi-py, when accepting string input for JSON metadata, validation must be performed to ensure the parsed result is specifically a JSON object (not arrays or primitives) using the `is_object()` method from nlohmann::json.


LinZhihao-723 (Member, Author):

This type validation will be handled by the FFI core API when we use this parsed object to create a serializer.

coderabbitai bot:

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

":return:\n"
" - The deserialized user-defined stream-level metadata, loaded as a"
" dictionary.\n"
" - None if user-defined stream-level metadata was not given in the deserialized"
Member:

Would there be less handling for API users if we return an empty dictionary if "user-defined stream-level metadata was not given in the deserialized IR stream"?

Member:

Though, it does seem in the Serializer we do permit users to provide a None or "{}" value. I haven't traced down to the CLP code but I think the given metadata string will be encoded as-is?

LinZhihao-723 (Member, Author):

I would prefer to return None indicating that the user-defined metadata wasn't specified in the source.
It is possible to encode an empty dictionary as the metadata though.
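A small sketch of what this design choice means on the caller side (hypothetical helper, not part of the PR): None distinguishes "metadata absent from the stream" from an explicitly serialized empty object.

```python
def describe_metadata(metadata):
    # `None` means the stream carried no user-defined metadata at all;
    # `{}` means an empty JSON object was explicitly serialized.
    if metadata is None:
        return "absent"
    return f"present ({len(metadata)} keys)"
```
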

    if (false == user_defined_metadata->is_object()) {
        PyErr_SetString(
                PyExc_TypeError,
                "User-defined stream-level metadata is not a JSON object."
Member:

If the type of this JSON value is not an object, it means either the file was not created using a Serializer we provide, or something went wrong in our code, right?

Though I imagine such cases are rare, would it be more helpful for debugging if we included the JSON value type in the error message?

LinZhihao-723 (Member, Author):

Yeah, the only possible case is that someone created a stream that follows our format but doesn't enforce the use of a JSON object as the metadata. Technically we should check this in the ffi core. Let me think about it...

LinZhihao-723 (Member, Author):

As improved in y-scope/clp#695, we moved this check to the ffi core. The latest commit should have this check removed (however, we need to merge #134 first).

return nullptr;
}

PyObjectPtr<PyObject> py_metadata_dict{py_utils_parse_json_str(json_str)};
Member:

Sorry for not being able to find any previous discussions on this. We have considered https://github.com/pybind/pybind11_json and the other alternatives discussed at https://www.github.com/nlohmann/json/issues/1014, right?

Just curious: for more efficient conversions, do we have plans to make use of those or to create our own bindings?

LinZhihao-723 (Member, Author):

Since this isn't on any critical path anyway, I would probably defer it to a future release.

@@ -42,6 +43,9 @@ PyDoc_STRVAR(
":param buffer_size_limit: The maximum amount of serialized data to buffer before flushing"
" it to `output_stream`. Defaults to 64 KiB.\n"
":type buffer_size_limit: int\n"
":param user_defined_metadata: JSON-encoded string containing user-defined stream-level"
Member:

I think an easier-to-use interface would be to accept a dict type. If we're too time-constrained to implement that, that's fine.

LinZhihao-723 (Member, Author):

That's true. I think we can fix it with a Python API binding.

LinZhihao-723 (Member, Author):

Done.


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/clp_ffi_py/Py_utils.cpp (1)

135-158: Enhance the type error message.

The error message could be more helpful by including the actual type received.

Apply this diff to improve the error message:

     if (false == static_cast<bool>(PyUnicode_Check(result))) {
         PyErr_Format(
                 PyExc_TypeError,
-                "`%s` is supposed to return a `str` object",
-                cPyFuncNameSerializeDictToJsonStr
+                "`%s` is supposed to return a `str` object, got `%s` instead",
+                cPyFuncNameSerializeDictToJsonStr,
+                Py_TYPE(result)->tp_name
         );
         return nullptr;
     }
tests/test_ir/test_serializer.py (1)

169-175: Add more test cases for user-defined metadata.

While the current test cases verify error handling for invalid types, consider adding the following test cases for better coverage:

  1. Valid dictionary with simple values
  2. None value (default case)
  3. Dictionary with nested structures
  4. Dictionary with invalid JSON values (e.g., custom objects)

Here's a suggested implementation:

     def test_invalid_user_defined_metadata(self) -> None:
         byte_buffer: BytesIO = BytesIO()
         with self.assertRaises(TypeError):
             _ = Serializer(byte_buffer, user_defined_metadata="str")  # type: ignore
         with self.assertRaises(TypeError):
             _ = Serializer(byte_buffer, user_defined_metadata=[1, "str"])  # type: ignore
+
+    def test_valid_user_defined_metadata(self) -> None:
+        byte_buffer: BytesIO = BytesIO()
+        # Test None (default)
+        serializer = Serializer(byte_buffer)
+        serializer.close()
+
+        # Test valid dictionary with simple values
+        metadata = {"key": "value", "number": 42}
+        serializer = Serializer(byte_buffer, user_defined_metadata=metadata)
+        serializer.close()
+
+        # Test dictionary with nested structures
+        metadata = {"nested": {"key": "value"}, "array": [1, 2, 3]}
+        serializer = Serializer(byte_buffer, user_defined_metadata=metadata)
+        serializer.close()
+
+        # Test dictionary with invalid JSON values
+        class CustomObject:
+            pass
+        with self.assertRaises(TypeError):
+            metadata = {"invalid": CustomObject()}
+            _ = Serializer(byte_buffer, user_defined_metadata=metadata)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f49a098 and d492645.

📒 Files selected for processing (8)
  • clp_ffi_py/ir/native.pyi (2 hunks)
  • clp_ffi_py/utils.py (1 hunks)
  • src/clp_ffi_py/PyObjectCast.hpp (1 hunks)
  • src/clp_ffi_py/Py_utils.cpp (3 hunks)
  • src/clp_ffi_py/Py_utils.hpp (1 hunks)
  • src/clp_ffi_py/ir/native/PySerializer.cpp (6 hunks)
  • tests/test_ir/test_serder.py (3 hunks)
  • tests/test_ir/test_serializer.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/test_ir/test_serder.py
  • clp_ffi_py/ir/native.pyi
🧰 Additional context used
📓 Path-based instructions (4)
src/clp_ffi_py/PyObjectCast.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/Py_utils.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/Py_utils.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PySerializer.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (2)
src/clp_ffi_py/Py_utils.hpp (2)
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#80
File: src/clp_ffi_py/Py_utils.cpp:80-87
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/clp_ffi_py/Py_utils.cpp`, for functions like `py_utils_serialize_dict_to_msgpack`, type checking is handled by the Python utilities, and we avoid adding redundant type checks in the C++ code.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#83
File: src/clp_ffi_py/Py_utils.cpp:114-120
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the `src/clp_ffi_py/Py_utils.cpp` file, for the `py_utils_parse_json_str` function, explicit UTF-8 validation of the input string is unnecessary because Python will throw exceptions if the input string is not valid UTF-8.
src/clp_ffi_py/Py_utils.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#80
File: src/clp_ffi_py/Py_utils.cpp:80-87
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/clp_ffi_py/Py_utils.cpp`, for functions like `py_utils_serialize_dict_to_msgpack`, type checking is handled by the Python utilities, and we avoid adding redundant type checks in the C++ code.
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: Build cp311-win_amd64
  • GitHub Check: Build cp310-win_amd64
  • GitHub Check: Build cp312-manylinux_aarch64
  • GitHub Check: Build cp311-musllinux_x86_64
  • GitHub Check: Build cp311-musllinux_i686
  • GitHub Check: Build cp311-musllinux_aarch64
  • GitHub Check: Build cp311-manylinux_x86_64
  • GitHub Check: Build cp311-manylinux_i686
  • GitHub Check: Build cp311-manylinux_aarch64
  • GitHub Check: Build cp310-musllinux_x86_64
  • GitHub Check: Build cp310-musllinux_i686
  • GitHub Check: Build cp310-musllinux_aarch64
  • GitHub Check: Build cp310-manylinux_x86_64
  • GitHub Check: Build cp310-manylinux_i686
  • GitHub Check: Build cp310-manylinux_aarch64
  • GitHub Check: Build cp311-macosx_universal2
  • GitHub Check: Build cp311-macosx_arm64
  • GitHub Check: Build cp310-macosx_x86_64
  • GitHub Check: Build cp310-macosx_universal2
  • GitHub Check: Build cp310-macosx_arm64
🔇 Additional comments (8)
src/clp_ffi_py/Py_utils.hpp (1)

48-55: LGTM!

The function declaration is well-documented and follows the existing patterns in the codebase.

src/clp_ffi_py/PyObjectCast.hpp (1)

136-136: LGTM!

The macro specialization for PyUnicodeObject is correctly added and follows the existing pattern.

src/clp_ffi_py/Py_utils.cpp (2)

19-19: LGTM!

The constant and static pointer declarations follow the existing patterns.

Also applies to: 26-26


73-80: LGTM!

The initialization code follows the existing patterns and includes proper error handling.

src/clp_ffi_py/ir/native/PySerializer.cpp (4)

Line range hint 37-51: LGTM! Clear and comprehensive documentation.

The documentation for the new user_defined_metadata parameter is well-written and includes all necessary details about the parameter type and validation requirements.


Line range hint 228-251: LGTM! Parameter parsing is correctly implemented.

The changes properly integrate the new optional parameter into the existing argument parsing infrastructure.


331-331: LGTM! Metadata correctly passed to serializer.

The optional metadata is properly passed to the serializer's create method.


290-329: Note regarding JSON object validation.

Previous review discussion indicated that JSON object type validation will be handled by the FFI core API rather than at this layer.

Let's verify the FFI core API validation:

✅ Verification successful

JSON validation delegation to FFI core API confirmed

The code correctly delegates JSON validation to the FFI core API through clp::ffi::ir_stream::Serializer. This aligns with the previous review discussion.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for JSON object validation in FFI core API
rg -A 5 'create.*user_defined_metadata' 

Length of output: 638


Script:

#!/bin/bash
# Search for ClpIrSerializer class and create method implementation
ast-grep --pattern 'class ClpIrSerializer {
  $$$
  static $_ create($_) {
    $$$
  }
  $$$
}'

Length of output: 97


Script:

#!/bin/bash
# Find files containing ClpIrSerializer definition
rg -l "class.*ClpIrSerializer" --type cpp --type hpp

Length of output: 82


Script:

#!/bin/bash
# Search for ClpIrSerializer class and create method
rg -A 10 "ClpIrSerializer" 

Length of output: 4822


Script:

#!/bin/bash
# Search for the core Serializer implementation
rg -A 10 "namespace.*ir_stream" --type cpp --type hpp
rg -A 10 "class.*Serializer.*ir_stream" --type cpp --type hpp

Length of output: 175

Comment on lines 51 to 62
def serialize_dict_to_json_str(dictionary: Dict[Any, Any]) -> str:
"""
Serializes the given dictionary into a JSON string.

:param dictionary: The dictionary to serialize.
:return: JSON string of serialized dictionary.
:raises: TypeError The given input is not a dictionary.
"""
if not isinstance(dictionary, dict):
raise TypeError("The type of the input object must be a dictionary.")
return json.dumps(dictionary)

@coderabbitai coderabbitai bot Jan 26, 2025

🛠️ Refactor suggestion

Add error handling for JSON serialization failures.

The function should handle potential JSON serialization errors that could occur with complex Python objects or circular references.

Apply this diff to add error handling:

 def serialize_dict_to_json_str(dictionary: Dict[Any, Any]) -> str:
     """
     Serializes the given dictionary into a JSON string.

     :param dictionary: The dictionary to serialize.
     :return: JSON string of serialized dictionary.
     :raises: TypeError The given input is not a dictionary.
+    :raises: TypeError The dictionary contains values that are not JSON serializable.
     """
     if not isinstance(dictionary, dict):
         raise TypeError("The type of the input object must be a dictionary.")
-    return json.dumps(dictionary)
+    try:
+        return json.dumps(dictionary)
+    except TypeError as e:
+        raise TypeError(f"The dictionary contains values that are not JSON serializable: {e}")

Member Author

Does re-throwing the same exception here make any difference?

@LinZhihao-723 You make a good point. Re-throwing the same TypeError doesn't add significant value here since json.dumps already provides a descriptive error message indicating which value couldn't be serialized. The extra try/except block would just add complexity without meaningful benefit. The original implementation letting json.dumps' TypeError propagate naturally is cleaner and sufficient.
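A quick stdlib check (not project code) supports this resolution: `json.dumps` already raises a `TypeError` whose message names the offending type, so wrapping it in a custom try/except would add little information.

```python
import json

# json.dumps raises a descriptive TypeError on its own for
# non-serializable values, such as a Python set.
try:
    json.dumps({"bad": {1, 2, 3}})  # a set is not JSON serializable
except TypeError as e:
    print(e)  # message already identifies the problem, e.g. it mentions "set"

# A serializable dict goes through unchanged:
print(json.dumps({"level": "INFO"}))
```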


✏️ Learnings added
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#133
File: clp_ffi_py/utils.py:51-62
Timestamp: 2025-01-26T10:21:10.011Z
Learning: For the serialize_dict_to_json_str function in clp_ffi_py/utils.py, let json.dumps' TypeError propagate naturally as it already provides sufficient error context, rather than catching and re-throwing with a custom message.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
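The design this thread settles on — constraining user-defined metadata to a dict that is serialized to a JSON string — round-trips as sketched below with the stdlib alone (the actual clp_ffi_py classes are not used here):

```python
import json

user_metadata = {"source": "webapp", "schema_version": 2}

# Serialization side: type-check the input, then emit a JSON string;
# any non-serializable value lets json.dumps' TypeError propagate.
if not isinstance(user_metadata, dict):
    raise TypeError("The type of the input object must be a dictionary.")
metadata_json = json.dumps(user_metadata)

# Deserialization side: parse the JSON string back into a dict.
recovered = json.loads(metadata_json)
assert recovered == user_metadata
print(recovered)
```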

junhaoliao
junhaoliao previously approved these changes Jan 26, 2025
@junhaoliao junhaoliao left a comment

the pr title is fine to be used as the final commit message

Co-authored-by: Junhao Liao <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (1)

183-215: LGTM! Consider future performance optimization.

The implementation is robust with proper error handling and type verification. As discussed in previous reviews, consider using pybind11_json in a future release for more efficient JSON conversions.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d492645 and eb63020.

📒 Files selected for processing (3)
  • clp_ffi_py/utils.py (1 hunks)
  • src/clp (1 hunks)
  • src/clp_ffi_py/ir/native/PyDeserializer.cpp (5 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/clp
🚧 Files skipped from review as they are similar to previous changes (1)
  • clp_ffi_py/utils.py
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.hpp:246-248
Timestamp: 2024-11-23T05:54:37.767Z
Learning: In the `src/clp_ffi_py/ir/native` module, raw pointers with `gsl::owner` are intentionally used instead of `std::unique_ptr` to comply with CPython's memory management requirements.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.cpp:218-242
Timestamp: 2024-11-23T05:55:57.106Z
Learning: In `src/clp_ffi_py/ir/native/PyDeserializer.cpp`, the `deserialize_log_event` method in the `PyDeserializer` class ensures proper loop termination and does not risk infinite loops, even when handling incomplete or corrupted streams.
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#94
File: src/clp_ffi_py/ir/native/PyDeserializer.cpp:206-208
Timestamp: 2024-11-23T05:55:08.019Z
Learning: In code interfacing with CPython (e.g., `src/clp_ffi_py/ir/native/PyDeserializer.cpp`), using raw pointers for memory management is acceptable due to CPython's memory management requirements.
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: Build cp311-win_amd64
  • GitHub Check: Build cp310-win_amd64
  • GitHub Check: Build cp312-manylinux_aarch64
  • GitHub Check: Build cp311-musllinux_x86_64
  • GitHub Check: Build cp311-musllinux_i686
  • GitHub Check: Build cp311-musllinux_aarch64
  • GitHub Check: Build cp311-manylinux_x86_64
  • GitHub Check: Build cp311-manylinux_i686
  • GitHub Check: Build cp311-manylinux_aarch64
  • GitHub Check: Build cp310-musllinux_x86_64
  • GitHub Check: Build cp310-musllinux_i686
  • GitHub Check: Build cp310-musllinux_aarch64
  • GitHub Check: Build cp310-manylinux_x86_64
  • GitHub Check: Build cp310-manylinux_i686
  • GitHub Check: Build cp310-manylinux_aarch64
  • GitHub Check: Build cp311-macosx_universal2
  • GitHub Check: Build cp311-macosx_arm64
  • GitHub Check: Build cp310-macosx_x86_64
  • GitHub Check: Build cp310-macosx_universal2
  • GitHub Check: Build cp310-macosx_arm64
🔇 Additional comments (5)
src/clp_ffi_py/ir/native/PyDeserializer.cpp (5)

6-6: LGTM! Required headers added.

The new headers are necessary for string handling, protocol constants, and JSON processing.

Also applies to: 14-14, 19-19


75-89: LGTM! Clear and comprehensive documentation.

The documentation clearly specifies the return type and behaviour, aligning with the discussed approach of returning None for missing metadata.


90-90: LGTM! Method declaration follows conventions.

The declaration is consistent with the project's Python C API binding patterns.


104-107: LGTM! Method table entry is properly configured.

The entry correctly sets up the Python binding with no arguments and links to the documentation.


346-355: LGTM! Safe and efficient implementation.

The implementation safely checks for key existence before access and returns a pointer to avoid copying, aligning with CPython's memory management requirements.
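The check-before-access pattern described above, returning a missing-metadata sentinel instead of raising, can be sketched in Python; the key name and function below are hypothetical, not the project's actual API:

```python
import json

# Hypothetical key name for illustration only.
USER_DEFINED_METADATA_KEY = "user_defined_metadata"

def get_user_defined_metadata(stream_metadata_json: str):
    """Return the user-defined metadata object, or None if the key is absent."""
    metadata = json.loads(stream_metadata_json)
    # Check for key existence before access, mirroring the reviewed C++
    # code's approach of avoiding an exception for the missing-key case.
    if USER_DEFINED_METADATA_KEY not in metadata:
        return None
    return metadata[USER_DEFINED_METADATA_KEY]

print(get_user_defined_metadata('{"user_defined_metadata": {"app": "demo"}}'))
print(get_user_defined_metadata("{}"))  # absent key yields None
```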

@LinZhihao-723 LinZhihao-723 merged commit eb59339 into y-scope:main Jan 26, 2025
73 checks passed