Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for serializing/deserializing auto-generated key-value pairs in the new IR format. #127

Merged
merged 6 commits into from
Jan 22, 2025

Conversation

LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Jan 20, 2025

Description

This PR updates to the latest CLP core commit to adapt the feature enhancement introduced in this PR. From this PR, we have the following supports:

  • Add support for serializing auto-generated kv pairs using Serializer's API
  • Update Deserializer's implementation to adapt the latest core ffi API changes
  • Add support for retrieving both auto-generated and user-generated kv pairs from KeyValuePairLogEvent's API
  • Add support for creating a KeyValuePairLogEvent object from auto-gen and user-gen Python dictionaries

To test the new features, this PR:

  • Remain the old serder tests by serializing empty auto-generated kv pairs (testing for backward compatibility)
  • Improve the serder tests to serialize a nested Python dictionary as auto-generated kv pairs (testing for new features)

Validation performed

  • Ensure all workflows passed.
  • Ensure the doc string site can be successfully rendered with the expected content.
  • Run unit tests with a debug version of Python with address sanitizer, ensuring that there's no illegal memory access or memory leaks caused by any code inside the ffi library.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced log event handling with separate auto-generated and user-generated key-value pairs.
    • Introduced a new function for unpacking msgpack maps, improving msgpack processing capabilities.
  • Bug Fixes

    • Updated msgpack map processing to handle different types of key-value pairs more robustly.
  • Refactor

    • Restructured internal methods to support more granular log event management.
    • Improved error handling and type checking for serialization and deserialization.
  • Tests

    • Updated tests to validate the new structure of log event serialization and deserialization.

These updates provide more flexible and precise log event processing, allowing for better separation of automatically generated and user-generated metadata.

Copy link

coderabbitai bot commented Jan 20, 2025

Walkthrough

This pull request introduces comprehensive changes to the log event processing system, focusing on separating auto-generated and user-generated key-value pairs. The modifications span multiple files across the project, including native interface definitions, serialization and deserialization methods, and test cases. The primary enhancement is the ability to handle two distinct types of key-value pairs: auto-generated and user-generated, which requires updates to method signatures, return types, and internal logic in various components.

Changes

File Change Summary
clp_ffi_py/ir/native.pyi Updated KeyValuePairLogEvent and Serializer classes to support separate auto-generated and user-generated key-value pairs.
src/clp_ffi_py/ir/native/PyDeserializer.cpp/.hpp Modified lambda function and type alias to include a boolean parameter for auto-generated node identification.
src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.cpp/.hpp Refactored constructor and to_dict method to handle two separate dictionaries.
src/clp_ffi_py/ir/native/PySerializer.cpp/.hpp Updated serialize_log_event_from_msgpack_map to accept two msgpack maps.
src/clp_ffi_py/utils.cpp/.hpp Added unpack_msgpack_map utility function.
tests/test_ir/* Updated test cases to support new two-dictionary structure.

Sequence Diagram

sequenceDiagram
    participant Client
    participant KeyValuePairLogEvent
    participant Serializer
    participant Deserializer

    Client->>KeyValuePairLogEvent: Create with auto_gen and user_gen dicts
    KeyValuePairLogEvent-->>Client: Log Event Object
    Client->>Serializer: Serialize with separate msgpack maps
    Serializer-->>Client: Serialized Data
    Client->>Deserializer: Deserialize with separate msgpack maps
    Deserializer-->>Client: Deserialized Log Event
Loading

Possibly related PRs

Suggested Reviewers

  • haiqi96

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35c9fc4 and 36e4cdb.

📒 Files selected for processing (1)
  • src/wrapped_facade_headers/Python.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/wrapped_facade_headers/Python.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: Build cp310-win_amd64
  • GitHub Check: Build cp310-musllinux_x86_64
  • GitHub Check: Build cp310-musllinux_i686
  • GitHub Check: Build cp310-musllinux_aarch64
  • GitHub Check: Build cp310-manylinux_x86_64
  • GitHub Check: Build cp310-manylinux_i686
  • GitHub Check: Build cp310-manylinux_aarch64
  • GitHub Check: Build cp310-macosx_universal2
  • GitHub Check: Build cp310-macosx_arm64
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (1)
src/wrapped_facade_headers/Python.hpp (1)

37-37: LGTM! The header addition is well-placed and necessary.

The addition of <tupleobject.h> is properly placed within the IWYU pragma block, maintains alphabetical ordering, and supports the new tuple handling requirements for msgpack operations.

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@LinZhihao-723 LinZhihao-723 requested a review from haiqi96 January 20, 2025 06:54
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.cpp (2)

Line range hint 324-484: Refactor to eliminate code duplication in dictionary handling

The function convert_py_dict_to_key_value_pair_log_event contains similar blocks for serializing and unpacking both auto_gen_kv_pairs_dict and user_gen_kv_pairs_dict. Refactoring the common logic into helper functions would enhance maintainability and readability.


735-789: Consider refactoring to_dict method to reduce repetition

The to_dict method processes auto-generated and user-generated key-value pairs with similar logic. Abstracting the shared functionality into a helper function could reduce code duplication and improve maintainability.

src/clp_ffi_py/ir/native/PySerializer.hpp (1)

97-105: LGTM! Consider enhancing the documentation.

The separation of auto-generated and user-generated key-value pairs is well-implemented. The parameter names are clear and descriptive.

Consider adding more details to the documentation about:

  • The expected format of the msgpack maps
  • Whether either map can be empty
  • Any size limitations or constraints
tests/test_ir/test_serder.py (2)

43-58: Enhance the method documentation.

While the implementation is clean and correct, the docstring could be more detailed.

Consider adding:

  • Example of the expected input format
  • Explanation of what constitutes auto-generated vs user-generated data
  • Any constraints on the dictionary contents

129-140: Consider making tests more deterministic.

While the test coverage is good, the use of current timestamp makes the tests time-dependent.

Consider using a fixed timestamp for testing to ensure consistent and reproducible results.

tests/test_ir/test_serializer.py (1)

77-80: Enhance test coverage with different data.

Using the same msgpack sequence for both auto_gen and user_gen maps might miss edge cases.

Consider adding test cases where auto_gen and user_gen maps contain different data to ensure proper handling of distinct content in each map.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2595c70 and adac44f.

📒 Files selected for processing (13)
  • clp_ffi_py/ir/native.pyi (3 hunks)
  • src/clp (1 hunks)
  • src/clp_ffi_py/ir/native/PyDeserializer.cpp (1 hunks)
  • src/clp_ffi_py/ir/native/PyDeserializer.hpp (2 hunks)
  • src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.cpp (8 hunks)
  • src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.hpp (1 hunks)
  • src/clp_ffi_py/ir/native/PySerializer.cpp (4 hunks)
  • src/clp_ffi_py/ir/native/PySerializer.hpp (1 hunks)
  • src/clp_ffi_py/utils.cpp (2 hunks)
  • src/clp_ffi_py/utils.hpp (2 hunks)
  • tests/test_ir/test_key_value_pair_log_event.py (1 hunks)
  • tests/test_ir/test_serder.py (5 hunks)
  • tests/test_ir/test_serializer.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/clp
🧰 Additional context used
📓 Path-based instructions (8)
src/clp_ffi_py/utils.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PyDeserializer.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PyDeserializer.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PySerializer.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/utils.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PySerializer.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.hpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp-ffi-py#81
File: src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.hpp:42-44
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the C++ codebase, for consistency with other classes, methods like `get_kv_pair_log_event()` in `src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.hpp` should return pointers instead of references.
🪛 cppcheck (2.10-2)
src/clp_ffi_py/utils.cpp

[performance] 96-96: Using std

(returnStdMoveLocal)

⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: Build cp312-win_amd64
  • GitHub Check: Build cp313-musllinux_aarch64
  • GitHub Check: Build cp313-manylinux_x86_64
  • GitHub Check: Build cp313-manylinux_i686
  • GitHub Check: Build cp313-manylinux_aarch64
  • GitHub Check: Build cp312-musllinux_x86_64
  • GitHub Check: Build cp312-musllinux_aarch64
  • GitHub Check: Build cp312-manylinux_aarch64
  • GitHub Check: Build cp311-musllinux_aarch64
  • GitHub Check: Build cp311-manylinux_aarch64
  • GitHub Check: Build cp310-musllinux_aarch64
  • GitHub Check: Build cp310-manylinux_aarch64
  • GitHub Check: Build cp313-macosx_universal2
  • GitHub Check: Build cp313-macosx_arm64
  • GitHub Check: Build cp312-macosx_x86_64
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (22)
src/clp_ffi_py/ir/native/PyDeserializer.hpp (2)

124-127: LGTM! Well-structured type alias update.

The addition of the is_auto_generated parameter to SchemaTreeNodeInsertionHandle effectively supports the distinction between auto-generated and user-generated key-value pairs, which is a core requirement of this PR.


166-169: LGTM! Consistent implementation of the handler method.

The method correctly forwards both the is_auto_generated flag and the schema_tree_node_locator to the handler, maintaining consistency with the type alias update.

src/clp_ffi_py/ir/native/PyDeserializer.cpp (1)

195-196: LGTM! Consistent lambda implementation.

The trivial handler correctly implements the updated interface, properly marking unused parameters and maintaining the expected behaviour.

src/clp_ffi_py/ir/native/PySerializer.cpp (4)

55-70: Documentation updated to reflect new parameters

The docstring for serialize_log_event_from_msgpack_map accurately describes the new parameters auto_gen_msgpack_map and user_gen_msgpack_map, including their types and the exceptions that may be raised.


72-76: Method signature now accepts keyword arguments

The function PySerializer_serialize_log_event_from_msgpack_map has been correctly updated to accept args and keywords, allowing for keyword arguments in function calls.


297-325: Arguments parsed appropriately for new parameters

The implementation correctly parses the two new byte string parameters using PyArg_ParseTupleAndKeywords, ensuring that both auto_gen_msgpack_map and user_gen_msgpack_map are obtained properly.


465-490: Serialization logic handles separate msgpack maps correctly

The serialize_log_event_from_msgpack_map method appropriately unpacks and serializes the separate auto-generated and user-generated msgpack maps, ensuring that both are processed correctly.

src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.cpp (2)

Line range hint 69-75: Handler updated to accommodate additional parameter

The inclusion of is_auto_generated in handle_schema_tree_node_insertion correctly updates the interface to distinguish between auto-generated and user-generated schema nodes.


243-254: Constructor documentation reflects new parameters

The docstring for PyKeyValuePairLogEvent now accurately details the auto_gen_kv_pairs and user_gen_kv_pairs parameters, including their expected types and usage.

tests/test_ir/test_key_value_pair_log_event.py (1)

24-39: Test updated to validate new parameters

The test_basic method now effectively tests the KeyValuePairLogEvent initialization with both auto_gen_kv_pairs and user_gen_kv_pairs, and verifies that the to_dict method returns the expected dictionaries, enhancing test coverage for the new functionality.

src/clp_ffi_py/utils.cpp (2)

5-9: LGTM! Headers are appropriately included.

The addition of <optional> and <utility> headers is necessary to support the new unpack_msgpack_map function.


82-97: Well-implemented function with proper error handling!

The implementation includes:

  • Comprehensive error handling with appropriate Python exceptions
  • Proper validation of the msgpack type
  • Efficient return of the object handle using move semantics
🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 96-96: Using std

(returnStdMoveLocal)

src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.hpp (1)

88-95: Documentation and return type changes are well-aligned!

The changes properly reflect the new structure returning auto-generated and user-generated key-value pairs as separate dictionaries within a tuple. The return type change from PyDictObject* to PyObject* is appropriate for the new tuple return type.

clp_ffi_py/ir/native.pyi (2)

89-90: LGTM! Type hints are consistent with the C++ implementation.

The separation of auto-generated and user-generated key-value pairs in both the constructor and return type is properly reflected in the type hints.


101-103: LGTM! Method signature properly updated.

The separation of auto-generated and user-generated msgpack maps is correctly reflected in the method signature.

src/clp_ffi_py/utils.hpp (2)

9-9: LGTM! Required header properly included.

The addition of <optional> header is necessary for the new unpack_msgpack_map function's return type.


82-91: LGTM! Function declaration is well-documented and properly structured.

The declaration includes:

  • Comprehensive documentation of parameters and return values
  • Appropriate use of [[nodiscard]] attribute
  • Clear error handling documentation
tests/test_ir/test_serder.py (2)

2-2: LGTM! Type imports are properly updated.

The addition of Tuple to the imports supports the new tuple-based structure for key-value pairs.


88-96: LGTM! Well-structured deserialization logic.

The implementation correctly handles the unpacking of auto-generated and user-generated dictionaries, with proper type assertions.

tests/test_ir/test_serializer.py (3)

69-69: LGTM! Clear variable declaration.

The msgpack_byte_sequence variable is well-placed and properly typed.


94-97: Same concern about test data variety applies here.


122-125: Same concern about test data variety applies here.

@@ -86,8 +86,8 @@ class FourByteDeserializer:
) -> Optional[LogEvent]: ...

class KeyValuePairLogEvent:
def __init__(self, dictionary: Dict[Any, Any]): ...
def to_dict(self) -> Dict[Any, Any]: ...
def __init__(self, auto_gen_kv_pairs: Dict[Any, Any], user_gen_kv_pairs: Dict[Any, Any]): ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we swap the order of user_gen_kv_pairs & auto_gen_kv_pairs and make auto_gen_kv_pairs optional? That way we can achieve backward compatibility.

(That said, I can see any generic developers might not need to call this constructor.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor is mainly for testing purpose; we usually construct KeyValuePairLogEvent directly from the deserializer in CPython level.
But I think your comment might make sense in Serializer's API changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought a little more. I feel like we shouldn't allow optional input even in Serializer. As this is the low level API, both values should be explicitly given. Also, most users won't use this API directly anyways. They should use Python logging library to log their kv pairs, which will be handled as user-generated by the loglib's code.

def __init__(self, dictionary: Dict[Any, Any]): ...
def to_dict(self) -> Dict[Any, Any]: ...
def __init__(self, auto_gen_kv_pairs: Dict[Any, Any], user_gen_kv_pairs: Dict[Any, Any]): ...
def to_dict(self) -> Tuple[Dict[Any, Any], Dict[Any, Any]]: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That whether kv-pairs are auto-generated might not matter to all developers. Can we also provide a helper to return a single dict?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, my understanding is this should be the low level API, so we should return whatever given on the serialization end.
If you are suggesting merging two dicts into one, I think it would also be problematic because:

  • We can't simply merge them into one since there might be key conflicts in two dicts
  • If we add keys on the top level, like {"auto_gen": auto_gen_dict, "user_gen": user_gen_dict}, it would be a little confusing since the returned dictionaries aren't exactly what user inputs are and we need more documents to clarify things. It might be better just to return two dictionaries separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, i was proposing that we can simply merge the two dicts into one. That there can be potential key conflicts, though rare, is valid concern.

If we are to keep returning a tuple, having the user-kv-pairs as the first element might be more flexible to developers. e.g., they can write

kv_pairs, = log_event.to_dict()

to only get the user inserted pairs, if they don't care about the auto generated ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, the current ordering strictly follows the underlying ffi core APIs.
If really necessary, we can add APIs like get_user_gen_kv_pairs_as_dict and get_auto_gen_kv_pairs_as_dict in future PRs?

* @return nullptr on failure with the relevant Python exception and error set.
*/
[[nodiscard]] auto to_dict() -> PyDictObject*;
[[nodiscard]] auto to_dict() -> PyObject*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, any reason we don't return PyTupleObject* instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used Py_BuildValue to create the tuple, which returns PyObject*.
Reference: https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuple APIs would also return PyObject* directly, like this one (which is equivalent to Py_BuildValue): https://docs.python.org/3/c-api/tuple.html#c.PyTuple_Pack

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Right, I read from the https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue docs that the API is not guaranteed to return a tuple, but since we have the format string fixed, it should not return anything else i assume.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swith to PyTuple_Pack since it doesn't need argument parsing.

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the implementation and docs and didn't find anything too odd. I left some suggestions regarding the Python interfaces. Feel free to object.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/clp_ffi_py/utils.cpp (1)

82-82: Consider adding a docstring to document the function's behaviour.

Adding a docstring would improve maintainability by documenting:

  • Purpose of the function
  • Parameter description
  • Return value semantics
  • Error conditions

Here's a suggested docstring:

/**
 * Unpacks a msgpack byte sequence and verifies it contains a map.
 * @param msgpack_byte_sequence Span containing the msgpack data
 * @return The unpacked object handle if successful
 * @return std::nullopt if an error occurs (sets Python exception)
 */
src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.cpp (1)

Line range hint 431-484: Consider optimizing the conversion process.

While the implementation is correct and safe, the current approach of serializing to msgpack and then deserializing could be optimized. As noted in the TODO comment, a direct conversion utility would be more efficient.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between adac44f and 2785253.

📒 Files selected for processing (3)
  • src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.cpp (8 hunks)
  • src/clp_ffi_py/utils.cpp (2 hunks)
  • src/clp_ffi_py/utils.hpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_ffi_py/utils.hpp
🧰 Additional context used
📓 Path-based instructions (2)
src/clp_ffi_py/utils.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 cppcheck (2.10-2)
src/clp_ffi_py/utils.cpp

[performance] 96-96: Using std

(returnStdMoveLocal)

⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: Build cp311-win_amd64
  • GitHub Check: Build cp310-win_amd64
  • GitHub Check: Build cp311-musllinux_x86_64
  • GitHub Check: Build cp311-musllinux_i686
  • GitHub Check: Build cp311-musllinux_aarch64
  • GitHub Check: Build cp311-manylinux_x86_64
  • GitHub Check: Build cp311-manylinux_i686
  • GitHub Check: Build cp311-manylinux_aarch64
  • GitHub Check: Build cp310-musllinux_x86_64
  • GitHub Check: Build cp310-musllinux_i686
  • GitHub Check: Build cp310-musllinux_aarch64
  • GitHub Check: Build cp310-manylinux_x86_64
  • GitHub Check: Build cp310-manylinux_i686
  • GitHub Check: Build cp310-manylinux_aarch64
  • GitHub Check: Build cp311-macosx_arm64
  • GitHub Check: Build cp310-macosx_x86_64
  • GitHub Check: Build cp310-macosx_universal2
  • GitHub Check: Build cp310-macosx_arm64
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (6)
src/clp_ffi_py/utils.cpp (2)

5-9: LGTM! Headers are properly organized.

The new headers are correctly placed and necessary for the added functionality.


82-97: LGTM! Implementation is correct and handles errors appropriately.

The function properly handles error cases and sets appropriate Python exceptions. The use of std::move is correct here as msgpack::object_handle is not trivially copyable.

🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 96-96: Using std

(returnStdMoveLocal)

src/clp_ffi_py/ir/native/PyKeyValuePairLogEvent.cpp (4)

69-69: LGTM!

The addition of the is_auto_generated parameter is properly marked as unused and maintains the simple success-returning implementation.


243-254: LGTM! Documentation is clear and follows conventions.

The docstrings are well-written, properly documenting:

  • The new constructor parameters with their types
  • The updated return type of to_dict method

Also applies to: 268-272


375-414: LGTM! Constructor implementation is robust and safe.

The implementation:

  • Properly handles keyword arguments
  • Includes thorough type checking
  • Provides clear error messages
  • Ensures memory safety with proper initialization

735-789: LGTM! Dictionary serialization is well-implemented.

The implementation:

  • Cleanly separates auto-generated and user-generated key-value pairs
  • Includes comprehensive error handling with detailed messages
  • Uses RAII patterns for safe memory management
  • Has a clear and logical structure

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/clp_ffi_py/ir/native/PySerializer.cpp (3)

55-70: Consider enhancing the documentation with examples.

While the documentation clearly describes the parameters, it would be more helpful to include examples of the expected format for both auto_gen_msgpack_map and user_gen_msgpack_map to guide users in preparing the correct input.


473-479: Enhance error messages to distinguish between map types.

When unpacking fails, the error message doesn't indicate which map (auto-generated or user-generated) caused the failure. Consider customizing the error messages for better debugging.


484-491: Be more specific with NOLINT directives.

The NOLINT directives could be more specific about which exact rules are being suppressed. Consider using specific rule identifiers instead of the broad category.

Replace:

-// NOLINTBEGIN(cppcoreguidelines-pro-type-union-access)
+// NOLINTBEGIN(cppcoreguidelines-pro-type-union-access, cppcoreguidelines-pro-type-reinterpret-cast)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2785253 and 35c9fc4.

📒 Files selected for processing (1)
  • src/clp_ffi_py/ir/native/PySerializer.cpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_py/ir/native/PySerializer.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: Build cp311-win_amd64
  • GitHub Check: Build cp310-win_amd64
  • GitHub Check: Build cp311-musllinux_x86_64
  • GitHub Check: Build cp311-musllinux_i686
  • GitHub Check: Build cp311-musllinux_aarch64
  • GitHub Check: Build cp311-manylinux_x86_64
  • GitHub Check: Build cp311-manylinux_i686
  • GitHub Check: Build cp311-manylinux_aarch64
  • GitHub Check: Build cp310-musllinux_x86_64
  • GitHub Check: Build cp310-musllinux_i686
  • GitHub Check: Build cp310-musllinux_aarch64
  • GitHub Check: Build cp310-manylinux_x86_64
  • GitHub Check: Build cp310-manylinux_i686
  • GitHub Check: Build cp310-manylinux_aarch64
  • GitHub Check: Build cp311-macosx_arm64
  • GitHub Check: Build cp310-macosx_x86_64
  • GitHub Check: Build cp310-macosx_universal2
  • GitHub Check: Build cp310-macosx_arm64
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (3)
src/clp_ffi_py/ir/native/PySerializer.cpp (3)

72-76: LGTM! Good use of keyword arguments.

The change to support keyword arguments makes the API more Pythonic and improves usability.


161-161: LGTM! Correct method table update.

The addition of METH_KEYWORDS flag is necessary and correctly implemented to support keyword arguments.


297-325: Consider adding size validation for input maps.

While the parameter parsing is correct, consider adding validation for the size of input maps to prevent potential memory issues with extremely large inputs. You could use m_buffer_size_limit as a reference for maximum allowed size.

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python interfaces lgtm. Code logic matches their descriptions. Preferably someone with more C++ experience should also take a look. Otherwise all is good.

@@ -86,8 +86,8 @@ class FourByteDeserializer:
) -> Optional[LogEvent]: ...

class KeyValuePairLogEvent:
Copy link
Member

@junhaoliao junhaoliao Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure how we are going to implement the compatibility layer for CLP Text IR Streams. In the future, will log events also be accessed through this interface?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, timestamp will be stored inside the auto-generated dictionary and msg will be stored inside the user-generated dictionary.

@LinZhihao-723 LinZhihao-723 merged commit 034dfa2 into y-scope:main Jan 22, 2025
73 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants