Replace func trie with hashmap #179

Open
wants to merge 1 commit into
base: master

ChAoSUnItY
Collaborator

Previously, the trie implementation was inconsistent, mainly because it used an index to refer to the corresponding func_t in FUNCS. It also lacked dynamic allocation, which might cause segmentation faults and added technical debt when debugging either FUNCS or FUNCS_TRIE. This PR resolves the issue by introducing a dynamic hashmap.

The current implementation uses the FNV-1a hashing algorithm (the 32-bit variant, to be precise). Because shecc lacks unsigned integer support, the hash result ranges from 0 to 2,147,483,647.
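For readers unfamiliar with FNV-1a, here is a minimal sketch of the 32-bit variant in standard C. It is not the code from this PR: it uses `uint32_t`, which shecc does not support, and the function names `fnv1a_32` and `fnv1a_31` are hypothetical. The masking step in `fnv1a_31` is an assumption about one way to end up in the 0 to 2,147,483,647 range.

```c
#include <stdint.h>

/* 32-bit FNV-1a over a NUL-terminated string.
 * uint32_t arithmetic wraps modulo 2^32, so the multiply is well defined. */
uint32_t fnv1a_32(const char *key)
{
    uint32_t hash = 0x811c9dc5u; /* FNV offset basis */
    for (; *key; key++) {
        hash ^= (unsigned char) *key; /* mix in one byte */
        hash *= 0x01000193u;          /* multiply by the FNV prime */
    }
    return hash;
}

/* Assumption: clear the top bit so the result fits the non-negative
 * range of a 32-bit signed int (0 to 2,147,483,647). */
int32_t fnv1a_31(const char *key)
{
    return (int32_t) (fnv1a_32(key) & 0x7fffffffu);
}
```

A bucket index would then typically be derived from the non-negative result, for example by taking it modulo the bucket count.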

Note that the current implementation may suffer from slow lookups as the number of functions grows, since the hashmap does not yet rehash based on load factor (ideally 0.75, but shecc currently does not support floating-point numbers).
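A load-factor threshold of 0.75 does not strictly require floating-point support. The sketch below shows one possible integer-only check by cross-multiplying; the name `needs_rehash` and its parameters are hypothetical and not part of this PR, and it assumes the counts are small enough that the multiplications do not overflow.

```c
/* Returns 1 when size / capacity >= 3/4, using only integer arithmetic.
 * Cross-multiplying (size * 4 >= capacity * 3) avoids both floating
 * point and division. Assumes both values are small and non-negative. */
int needs_rehash(int size, int capacity)
{
    return size * 4 >= capacity * 3;
}
```

With 512 buckets, this check would first report true at 384 entries.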

This also enables us to refactor more structures later with hashmap implementation in shecc.

Benchmark for ./tests/hello.c compilation

Before

Command being timed: "./out/shecc tests/hello.c"
        User time (seconds): 0.00
        System time (seconds): 0.02
        Percent of CPU this job got: 76%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 52112
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 12220
        Voluntary context switches: 0
        Involuntary context switches: 0
        Swaps: 0
        File system inputs: 0
        File system outputs: 32
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

After

Command being timed: "./out/shecc tests/hello.c"
        User time (seconds): 0.00
        System time (seconds): 0.02
        Percent of CPU this job got: 71%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 49916
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 12224
        Voluntary context switches: 1
        Involuntary context switches: 0
        Swaps: 0
        File system inputs: 8
        File system outputs: 32
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

@jserv
Collaborator

jserv commented Jan 19, 2025

@visitorckw, can you comment on this?

@visitorckw
Contributor

Looks good as is.

However, as mentioned, a large number of functions may cause excessive collisions and slow down performance. For smaller function counts, the default 512 buckets might be overkill. Therefore, a radix tree with dynamic memory allocation could still be a method worth exploring in the future.

@ChAoSUnItY
Collaborator Author

I'm concerned that dynamic memory allocation is not reliable at the moment and is potentially flawed. I attempted to implement a rehashing algorithm before, but the stage 2 compilation failed, while the GCC and stage 1 builds were fine.

@jserv
Collaborator

jserv commented Jan 19, 2025

Refine the message "This also enables us to refactor more structures later with hashmap implementation in shecc." as follows:

This also allows for future refactoring of additional structures using a hashmap implementation.

You don't have to mention the project name unless it affects what you are referring to.

@ChAoSUnItY ChAoSUnItY force-pushed the refactor/hashmap branch 4 times, most recently from 1bacbd7 to cb82f7a Compare January 19, 2025 10:27
Previously, the trie implementation was inconsistent, mainly because it
used an index to refer to the corresponding func_t in FUNCS.
Additionally, a trie's advantage is that it enables prefix lookup, but
shecc has never used it this way. Furthermore, each trie node takes 512
bytes, while in this implementation each hashmap bucket node takes
24 + W bytes (W stands for the key length including the terminating
null character), which significantly reduces memory usage.

This also allows for future refactoring of additional structures using
a hashmap implementation.
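To make the memory comparison concrete, the following sketch plugs the two per-node figures from the commit message into a small calculation. The helper names are hypothetical, and the trie figure assumes a worst case of one node per character with no prefix shared between keys, which the commit message does not state.

```c
#include <string.h>

/* Per-node sizes quoted in the commit message. */
#define TRIE_NODE_BYTES 512
#define MAP_NODE_FIXED_BYTES 24

/* Hashmap: one bucket node per key, 24 + W bytes,
 * where W is the key length plus the terminating null character. */
size_t map_bytes_for_key(const char *key)
{
    return MAP_NODE_FIXED_BYTES + strlen(key) + 1;
}

/* Trie, worst case (assumption): one 512-byte node per character,
 * with no prefix shared with any other key. */
size_t trie_worst_case_bytes(const char *key)
{
    return TRIE_NODE_BYTES * strlen(key);
}
```

Under these assumptions, the key "main" costs 29 bytes in the hashmap versus up to 2048 bytes in the trie.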

for (; *key; key++) {
    hash ^= *key;
    hash *= 0x01000193;
Contributor

The multiplication here may cause overflow, leading to undefined behavior. Signed integer overflow is undefined, while unsigned integer overflow is not. Since shecc currently lacks support for unsigned integers, we might consider adding it to address this issue.


3 participants