Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

static inline is bad. Generates too much code that the linker can't remove. #1621

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jordo
Copy link

@jordo jordo commented Oct 10, 2024

Looking at the dissasembly of a gdextension library, there are way too many unused symbols and code included. It's bloaty. 1.5MB library for a single gdextension class.

Everyclass seems to include things like this:

image

Engine class registrations, etc. Even when no class in my extension used the base engine classes. The culprit here is inline statics which cause a lot of bloat, and are all ultimately unused. This seems also to be the case in godot main as well.

These cause symbols from every class in godot-cpp to be included in the final link, even if completely unused by the impl lib. Reworking changes a basic gdextension shared library from being ~1.5MB to now ~200kB.

@jordo jordo requested a review from a team as a code owner October 10, 2024 15:13
@AThousandShips AThousandShips requested a review from a team October 10, 2024 16:00
…unable to optimize and remove on final link. This causes these symbols from every class in godot-cpp to be included in the final link, even if completely unused by the lib. Removing changes a basic shared library from being ~1.5MB on almost all platforms to now ~200kB.
@jordo jordo force-pushed the winterpixel-master branch from fafe088 to ddd03ee Compare October 10, 2024 16:46
@dsnopek
Copy link
Collaborator

dsnopek commented Oct 11, 2024

Thanks!

@Faless added a "build profile" feature relatively recently that can help reduce binary size (and build times) - see PR #1167

Engine class registrations, etc. Even when no class in my extension used the base engine classes. The culprit here is inline statics which cause a lot of bloat, and are all ultimately unused.

Unfortunately, we do need these registrations for any type that may be returned by a Godot API, in order to solve an important bug with godot-cpp creating the incorrect wrapper class - see PR #1050

The "build profiles" help but they work at the class level. There will be methods you don't actually call which return types that end up getting registered. It'd be really cool to somehow only generate the methods that are actually called?

Anyway, we can't just remove the engine class registration if it means bringing back that bug. If anyone has any better ideas for how to solve this problem, they would be welcome :-)

inline static ::godot::internal::EngineClassRegistration<m_class> _gde_engine_class_registration_helper; \
static ::godot::internal::EngineClassRegistration<m_class> _gde_engine_class_registration_helper; \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline static here makes is so that we can declare a static variable in a header without needing to setup its storage in a compilation unit. I think removing the inline here means here means this variable won't actually be initialized (and hence the registration won't happen) because we don't ever put it in a .cpp file. We could certainly generate an addition to the corresponding .cpp file, but then the registration will always happen for all classes, and not just those that were #included.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is incorrect according to the disassembly and actually running the code... This is the offending line that causes bloat because the linker isn't able to optimize all these templated unused symbols out. The variable is initialized during static initialization when the library is loaded.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static class variables (without inline) need to have their storage declared in a compilation unit (.cpp file) somewhere, otherwise they will never be initialized.

This PR is removing the inline but not adding the storage bit to any .cpp file.

inline static RecreateInstance *recreate_instance = nullptr;
static RecreateInstance *recreate_instance;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching this one away from inline static is fine, but I have another PR (#1590 - still awaiting review) that removes it as part of fixing a different bug

@Calinou Calinou added enhancement This is an enhancement on the current functionality discussion labels Oct 11, 2024
@jordo
Copy link
Author

jordo commented Oct 11, 2024

Unfortunately, we do need these registrations for any type that may be returned by a Godot API, in order to solve an important bug with godot-cpp creating the incorrect wrapper class - see PR #1050

The "build profiles" help but they work at the class level. There will be methods you don't actually call which return types that end up getting registered. It'd be really cool to somehow only generate the methods that are actually called?

Anyway, we can't just remove the engine class registration if it means bringing back that bug. If anyone has any better ideas for how to solve this problem, they would be welcome :-)

This PR doesn't remove any of the registrations... And infact godot-cpp compiles to the same .a library. It doesn't 'remove' any of the code, it just fixes the linker not being able to optimize any of that out on the FINAL link with an actual implementation if my specific gdextension library doesn't use any of those classes... In my example, the linker removes everything but 4 symbols for all these unused classes. If my library just includes a single class that extends Node3D for example, the final build dynamic library is correctly optimized by the linker.

@jordo
Copy link
Author

jordo commented Oct 11, 2024

I understand the build_profiles, but this issue also relates to a lot of code in godot as well. I've been looking at optimizing the web builds for godot main for the past week and a lot of the bloat is coming from inline static which includes multiple definitions in every translation unit / include which appears that the linker isn't able to optimize out a lot of the time. The usage in godot-cpp (here) is just the biggest bang for the buck right now because it contributes to 95% of the file size of the dyn library.

@jordo
Copy link
Author

jordo commented Oct 11, 2024

Keeping these lingering symbols around for unused classes don't even make sense, because everything else from the unused classes are optimized out. Take the AStar example I attached. I can attach the whole dissembled wasm module for you to inspect, but the rest of the class doesn't even get included in the library (not even the class constructors).

@dsnopek
Copy link
Collaborator

dsnopek commented Oct 11, 2024

This PR doesn't remove any of the registrations.

I'm pretty confident this is removing the registrations. The CI tests are failing on this PR that I would expect to fail if the registrations aren't happening.

@jordo
Copy link
Author

jordo commented Oct 11, 2024

I'll post my sample code and dissasembly so I guess you can continue to evaluate... Here's the test class, nothing special.

class Example2 : public Node3D {
	GDCLASS(Example2, Node3D);
	
	protected:
	Vector2 position2;
	static void _bind_methods() {
		ClassDB::bind_method(D_METHOD("update_position"), &Example2::update_position);
	};

	void set_position2(const Vector2 &p_pos) {
		position2 = p_pos;
	};
	Vector2 get_position2() const {
		return position2;
	};

	public:
	void update_position()  {
		printf("update pos1");
	}

	Example2() { };
	virtual ~Example2() {};
};

With this PR, my FINAL gdextension dynamic library ends up being 187KB... Note that the final link step is linking example2.o (1KB) and libgodot-cpp.a (42MB!).
Screenshot 2024-10-11 at 1 32 58 PM

So what ends up in my extention library? Lets take a look for something like AStar

wasm-objdump libgame.web.template_release.wasm -d | grep AStar

Well there's no symbols related to AStar, because my example class doesn't have anything to do with the AStar class in godot-cpp, and the linker can optimize EVERYTHING out because it can guarantee that nothing in AStar is actually referenced in my lib.

What's included relating to my class? Everything you would expect, except notably set_position2 and get_position2 are actually optimized out because again the linker can guarantee those symbols are never addressed, so it can remove them.

wasm-objdump libgame.web.template_release.wasm -d | grep Example2
 000dd8: 10 9c 80 80 80 00          |   call 28 <void godot::ClassDB::_register_class<Example2, false>(bool, bool, bool)>
000ded func[28] <void godot::ClassDB::_register_class<Example2, false>(bool, bool, bool)>:
 001301: 10 b2 80 80 80 00          |   call 50 <Example2::_bind_methods()>
001ecb func[38] <void* godot::ClassDB::_recreate_instance_func<Example2>(void*, void*)>:
001ed0 func[39] <Example2::free(void*, void*)>:
001efc func[40] <void* godot::ClassDB::_create_instance_func<Example2>(void*)>:
00205a func[41] <Example2::to_string_bind(void*, unsigned char*, void*)>:
0020b1 func[42] <Example2::notification_bind(void*, int, unsigned char)>:
0020b5 func[43] <Example2::validate_property_bind(void*, GDExtensionPropertyInfo*)>:
00216b func[44] <Example2::property_get_revert_bind(void*, void const*, void*)>:
002170 func[45] <Example2::property_can_revert_bind(void*, void const*)>:
002175 func[46] <Example2::free_property_list_bind(void*, GDExtensionPropertyInfo const*, unsigned int)>:
0021cd func[47] <Example2::get_property_list_bind(void*, unsigned int*)>:
00224e func[48] <Example2::get_bind(void*, void const*, void*)>:
002253 func[49] <Example2::set_bind(void*, void const*, void const*)>:
002259 func[50] <Example2::_bind_methods()>:
 0022cc: 10 d1 80 80 80 00          | call 81 <godot::MethodBind* godot::create_method_bind<Example2>(void (Example2::*)())>
002ec1 func[59] <Example2::_gde_binding_create_callback(void*, void*)>:
002ec6 func[60] <Example2::_gde_binding_free_callback(void*, void*, void*)>:
002ec9 func[61] <Example2::_gde_binding_reference_callback(void*, void*, unsigned char)>:
0044dc func[73] <Example2::_is_extension_class() const>:
0044e1 func[74] <Example2::_notificationv(int, bool)>:
004590 func[76] <Example2::~Example2()>:
0046eb func[80] <Example2::update_position()>:
004705 func[81] <godot::MethodBind* godot::create_method_bind<Example2>(void (Example2::*)())>:

Everything works and 'registers' properly in gdextension, etc.

Now WITHOUT this PR lets have a look at the dissasembly with the same commands:

Jordans-M3-MacBook-Pro:bin jordanschidlowsky$ wasm-objdump libgame.web.template_release.wasm -d | grep AStar2D
06b2ac func[379] <godot::internal::EngineClassRegistration<godot::AStar2D>::callback()>:
06b364 func[381] <godot::AStar2D::_estimate_cost(long long, long long) const>:
06b36c func[382] <godot::AStar2D::_compute_cost(long long, long long) const>:
06b375 func[383] <godot::AStar2D::~AStar2D()>:
06b41e func[384] <godot::AStar2D::_gde_binding_create_callback(void*, void*)>:
06b459 func[385] <godot::AStar2D::_gde_binding_free_callback(void*, void*, void*)>:
06b47c func[386] <godot::AStar2D::_gde_binding_reference_callback(void*, void*, unsigned char)>:

Now the linker has removed 95% of the AStar class, but it can't optimize out everything and has left in a few things... Namely these 7 Astar2D symbols, and this is the same for every class, 800 hundred or so of them... Note, even the AStar2D constructor is optimized out, although the AStar2D destructor is left in due to virtual functions in the class.

And so now my gdextension library is ~1.5MB.

@dsnopek
Copy link
Collaborator

dsnopek commented Oct 11, 2024

Please take a look at the test failures. They indicate that the bug that the engine class registrations were added to fix has returned. If you can come up with a way to reduce the binary size while keeping these tests passing, I would be very happy to merge that solution. :-)

Here are the test failures:

== FAILURE: In function _ready() from 'res://main.gd' on line 55
    |-> Expected 'valid' but got 'invalid'
 == FAILURE: In function _ready() from 'res://main.gd' on line 56
    |-> Expected 'valid' but got 'invalid'
 == FAILURE: In function _ready() from 'res://main.gd' on line 181
    |-> Expected 'true' but got 'false'
 == FAILURE: In function _ready() from 'res://main.gd' on line 182
    |-> Expected 'true' but got 'false'
 == FAILURE: In function _ready() from 'res://main.gd' on line 189
    |-> Expected 'true' but got 'false'
 == FAILURE: In function _ready() from 'res://main.gd' on line 215
    |-> Expected '[ GDExtension::Example <--> Instance ID:2756[49](https://github.com/godotengine/godot-cpp/actions/runs/11278425430/job/31390777972?pr=1621#step:17:50)67291 ]' but got '<Object#null>'

@jordo
Copy link
Author

jordo commented Oct 11, 2024

Can you elaborate on the bug that the engine class registrations were added to fix or link to the issue here so I can take a look.

But think about it logically... I'm trying to 'extend' the engine with my dynamic library, gdextension right? It's a native library and what's nice about that is it knows what symbols need to be resolved when it links with godot-cpp.a. If my extension library extends a single engine class, say a Node2D, and that's it (my example class), do you think it makes sense for the resulting library to be 1.5MB? Obviously not, which is why i started poking around into the issue in the first place as this is REALLY important on platforms like the web.

@jordo
Copy link
Author

jordo commented Oct 12, 2024

A quick update as I'm starting to look at a few different architectures. Perhaps, the mach-o lld linker seems to do a better job than emscriptens, and perhaps the problem is just much more prevalent on the wasm modules. Comparing the mach-o disassembly between the library built before and after this PR yields these:

Before (It DOES optimize out most of the classes ( no AStar symbols for example, so that's good ):

Screenshot 2024-10-11 at 9 57 33 PM

After:
Screenshot 2024-10-11 at 9 58 04 PM

I think inline static might be an issue for some linkers, especially for emscripten/web it seems as the issue is pervasive in godot main builds as well, so I'm going to play around with an alternative... It could perhaps be initialized when the class is registered. But please still link the above bug/issue you mentioned so it can give me some context, but from what I can gather from your test cases in CI, is the API for the example gdextension class has a bound function that takes an Ref<Image> argument, and that means that class needs it's gde_binding_* initializers? Because the ones for my class are included as you can see. I'm trying to understand the test code that's failing in CI:

var image = Image.new()
assert_equal(example.image_ref_func(image), "valid")

From what I can gather, does this mean the engine Image reference from godot main -> godot-cpp Image needs to run through a gde_binding_reference_callback before it lands at the Example's image_ref_func?

In that case, i have some ideas to make this more optimal and perhaps better for wasm modules.

@dsnopek
Copy link
Collaborator

dsnopek commented Oct 15, 2024

Can you elaborate on the bug that the engine class registrations were added to fix or link to the issue here so I can take a look.

PR #1050 which I linked to in my first comment is a good place to start!

It links to these two comments which explain the original issue which led to the addition of the engine class registrations in much detail:

I feel like I've already said this several different ways above, but I'll attempt to explain again:

  1. The reason the linker can't remove the unused classes is due to the engine class registration, which causes the classes to be "used" even if your code isn't using them
  2. We're using the inline static EngineClassRegistration<T> in order to trigger the engine class registration only for classes whose header files have been #included. The inline static isn't itself inherently bad, it's just the mechanism we're using to trigger the registrations
  3. This PR changes inline static to static but doesn't put the storage for the variable into any compilation unit which means the EngineClassRegistration<T> object is never created, and so the engine registration doesn't happen. However, if you did add the storage to a compilation unit, it would register all classes, not just those that are #included
  4. This PR makes the binaries much smaller because none of the engine class registrations are happening, but it's also failing the tests that were added to ensure the bug that the engine class registrations were added to fix has returned

@jordo
Copy link
Author

jordo commented Oct 16, 2024

I've reviewed a lot of the history in this repo related this particular issue, registering these 'engine' classes statically, and the resulting binary size ramifications. Specifically these comments here #1160 and here #1266

2. We're using the `inline static EngineClassRegistration<T>` in order to trigger the engine class registration only for classes whose header files have been `#include`d. The `inline static` isn't itself inherently bad, it's just the mechanism we're using to trigger the registrations

I'll attempt to explain but an externally visible inline function must have the same definition across all translation units; otherwise, an ODR (One Definition Rule) violation occurs, similar to regular functions. c++ also ensures that any local static variable within such a function maintains the same address across all instances. The compiler might choose to inline the function or not, depending on the translation unit. If it doesn't inline, the function's definition is generated for that TU. As a result, the function could end up being included in every TU, but during linking, the linker will select one instance and discard the others. Since all versions are required to be identical, this approach works without issue. I suppose this also depends on the linker supporting a visibility model where multiple TUs can define the symbol as externally visible without causing a 'duplicate symbol' error.

But for 'static inline,' the symbol is no longer externally visible. So the compiler inlines the function for each TU. If it doesn't, the compiler generates a separate local definition of the function for that TU. At link time, the linker can't merge all these copies since they're local to each TU, so it merges them via identical code folding.

I tested a couple different object files, and the above is indeed the case... Multiple definitions if TUs include the same header, example both Example1.cpp and Example2.cpp include the godot-cpp header image.hpp. Without reviewing through bloaty I'm sure this is likely why the individual object files are so large.

What could eventually be a red herring is that this means that any static locals within the inlined static function are now unique to each TU, rather than being shared globally and during the linking process, the linker won't merge these copies since they are local to their respective TUs. So perhaps you have to be careful there.

But most importantly, #1266 just isn't the case with a wasm module unfortunately, as seen with what I've posted here. Building a gdextension with a single class and #includeing a single godot-cpp header file includes pieces of all the engine classes in the wasm module. Not their whole implementations, but their registrations, virtual functions, and destructors as I've posted. Whether this is specific to the emscripten linker and/or limitation of the wasm-tools optimizer / code folding, I have no idea. But it is likely a result of a lot of linker support that happens with static inline. I really can't elaborate further beyond posting the actual module and disassembly, so I don't think there's much more I can explain here unfortunately. This may be a limitation of the wasm memory model or execution object format I have no idea, but the resulting wasm module is the reality.

This is leading to a wasm module that is about 10X larger than it should be, and is a limitation for realistically using gdextension for us on the web, and also on web portals that have size limitations like poki for example.

I have been looking at a lot of the other language bindings and I'm not sure if there is awareness of this issue either (the binary bloat in godot-cpp as a result of header inclusion), but I've just started looking at a few other language bindings and the resulting wasm modules are quite large for simple extensions as well.

4. This PR makes the binaries much smaller because none of the engine class registrations are happening, but it's also failing the tests that were added to ensure the bug that the engine class registrations were added to fix has returned

Yes I've seen now why you have added in these 'engine class' registrations, to statically inline their defns into a vector, then initialize all of them when the extension library is loaded via register_engine_classes... Basically all related to marshalling from engine side types->godot-cpp types, I get it.

I'll post my results from bloaty for mach-o format in a followup message here, but this is an ongoing effort to try and reduce the binary footprint in godot main as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion enhancement This is an enhancement on the current functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants