support geodesic interpolation merging of chipalign paper ? #485

Open
dlmastery opened this issue Jan 10, 2025 · 1 comment

https://arxiv.org/abs/2412.19819

ChipAlign: Instruction Alignment in Large Language Models for Chip Design via Geodesic Interpolation
Chenhui Deng, Yunsheng Bai, Haoxing Ren
Recent advancements in large language models (LLMs) have expanded their application across various domains, including chip design, where domain-adapted chip models like ChipNeMo have emerged. However, these models often struggle with instruction alignment, a crucial capability for LLMs that involves following explicit human directives. This limitation impedes the practical application of chip LLMs, including serving as assistant chatbots for hardware design engineers. In this work, we introduce ChipAlign, a novel approach that utilizes a training-free model merging strategy, combining the strengths of a general instruction-aligned LLM with a chip-specific LLM. By considering the underlying manifold in the weight space, ChipAlign employs geodesic interpolation to effectively fuse the weights of input LLMs, producing a merged model that inherits strong instruction alignment and chip expertise from the respective instruction and chip LLMs. Our results demonstrate that ChipAlign significantly enhances instruction-following capabilities of existing chip LLMs, achieving up to a 26.6% improvement on the IFEval benchmark, while maintaining comparable expertise in the chip domain. This improvement in instruction alignment also translates to notable gains in instruction-involved QA tasks, delivering performance enhancements of 3.9% on the OpenROAD QA benchmark and 8.25% on production-level chip QA benchmarks, surpassing state-of-the-art baselines.

@dlmastery dlmastery changed the title support geodesic merging of chipalign paper ? support geodesic interpolation merging of chipalign paper ? Jan 10, 2025
@dlmastery

Looks like this would be a minor change to nuslerp.py:

    if self.geodesic:
        # ChipAlign-style geodesic interpolation
        if base_tensor is not None:
            raise ValueError("ChipAlign-style geodesic interpolation does not support a base model.")
        if self.lambda_val is None:
            raise ValueError("lambda must be specified when geodesic=True")

        instruction_tensor = tensors[0]
        chip_tensor = tensors[1]

        # Project each tensor onto the unit sphere (norm taken over all elements)
        instruction_tensor_norm = torch.norm(instruction_tensor)
        chip_tensor_norm = torch.norm(chip_tensor)
        instruction_tensor_unit = instruction_tensor / instruction_tensor_norm
        chip_tensor_unit = chip_tensor / chip_tensor_norm

        # Interpolate along the arc between the two unit tensors
        merged_tensor_unit = slerp(
            self.lambda_val, instruction_tensor_unit, chip_tensor_unit
        )

        # Restore magnitude as the lambda-weighted geometric mean of the two norms
        merged_tensor = (
            (instruction_tensor_norm ** (1 - self.lambda_val))
            * (chip_tensor_norm ** self.lambda_val)
            * merged_tensor_unit
        )
        return merged_tensor
    else:  # nuslerp code.
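
To make the arithmetic in the snippet above concrete, here is a dependency-free sketch of the same three steps (normalize, slerp, rescale) on flat lists of floats. The helper names `frobenius_norm`, `slerp_unit`, and `geodesic_merge` are hypothetical and are not part of mergekit; the snippet's `slerp` is assumed to behave like `slerp_unit` here, with `t=0` returning the first argument. Treat it as an illustration of the math, not as the proposed patch.

```python
import math


def frobenius_norm(w):
    """Euclidean norm over all elements of a flattened weight tensor."""
    return math.sqrt(sum(x * x for x in w))


def slerp_unit(t, u0, u1, eps=1e-8):
    """Spherical interpolation between unit vectors (t=0 -> u0, t=1 -> u1)."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u0, u1))))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel (or antiparallel) inputs: fall back to a linear blend.
        return [(1 - t) * a + t * b for a, b in zip(u0, u1)]
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(u0, u1)]


def geodesic_merge(w_instruct, w_chip, lam):
    """Merge two flattened tensors; assumes both have non-zero norm."""
    n_i = frobenius_norm(w_instruct)
    n_c = frobenius_norm(w_chip)
    u_i = [x / n_i for x in w_instruct]
    u_c = [x / n_c for x in w_chip]
    merged_unit = slerp_unit(lam, u_i, u_c)
    # Weighted geometric mean of the two norms, as in the snippet above.
    scale = (n_i ** (1 - lam)) * (n_c ** lam)
    return [scale * x for x in merged_unit]


merged = geodesic_merge([2.0, 0.0], [0.0, 8.0], 0.5)
print(merged, frobenius_norm(merged))
```

With `lam=0` the result is the instruction tensor and with `lam=1` it is the chip tensor. In between, the merged norm is the geometric mean of the two input norms rather than their arithmetic mean, so in the example (norms 2 and 8, `lam=0.5`) the merged norm is approximately 4.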
