Commit: Merge branch 'instructlab:main' into feature/add-try-catch-import-to-deepspeed
Showing 10 changed files with 423 additions and 82 deletions.
CHANGELOG.md (new file): 250 additions

# Changelog

## v0.5.5

### v0.5.5 Features

* e2e: replace old small job with new medium job

### v0.5.5 Fixes

* fix: incorrect label for AWS medium runner
* chore: add exit code & tox fix

### v0.5.5 Infrastructure

* ci: grant HF_TOKEN access to the medium-size E2E CI job

## v0.5.4

### v0.5.4 Features

* Add rocm extra to pyproject.toml

## v0.5.3

### v0.5.3 Fixes

* fix: Add explicit flash_attn requirement for ROCm

## v0.5.2 - Fix Pretraining Masking

### v0.5.2 Fixes

* fix: improve linting and automation
* Fix pretrain token list->int for masking

## v0.5.1

### v0.5.1 Fixes

* fix: updates sorting logic to correctly compare numbers

## v0.5.0 - FSDP and Full-State Checkpoint Resuming

### v0.5.0 Features

* feat: add e2e test for instructlab CI
* feat: add mergify
* Adding FSDP Support to Training Library by @aldopareja @Maxusmusti @RobotSail
* adds Accelerate full-state (opt, lr_sched, params)
* changes StreamablePopen to return a process and implement listening

### v0.5.0 Fixes

* Fix lint error to make CI happy
* Fix typos
* Ap/fix multipack for non granite models
* Fix generic chat template saved to tokenizer for generation
* Fix linting error and missing quote

### v0.5.0 Infrastructure

* Add license identifiers
* ci: update runner labels to uniquely identify instance sizes
* ci: minor cleanup of E2E job
* Fixing e2e to use relative path for working-directory
* switch -T to -a
* github: add stale bot to training repo
* fix: markdown lint error and mergify bug
* Bump actions/checkout from 4.1.7 to 4.2.0
* Bump step-security/harden-runner from 2.8.1 to 2.9.1
* Bump pypa/gh-action-pypi-publish from 1.9.0 to 1.10.2
* Bump actions/setup-python from 5.1.0 to 5.2.0
* Bump rhysd/actionlint from 1.7.1 to 1.7.2
* Bump hynek/build-and-inspect-python-package from 2.6.0 to 2.9.0
* Bump DavidAnson/markdownlint-cli2-action from 16.0.0 to 17.0.0
* ci: fix lint action
* ci: add AWS tags to show github ref and PR num for all jobs

## v0.5.0 Alpha 0 - The FSDP Release (Pre-release)

### v0.5.0 Alpha Description

The FSDP Release introduces FSDP support, via the Accelerate library, in addition to the existing DeepSpeed support.

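As a rough illustration of the mechanism this refers to, here is a minimal sketch against the `accelerate` API itself; it is not this training library's own configuration surface, and how the library wires this up internally is not shown.

```python
# Minimal sketch: backing Accelerate with an FSDP plugin. Names here are
# from the `accelerate` package, not from this training library.
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin()  # default FSDP sharding settings
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

# model, optimizer, and dataloaders would then be wrapped with:
# model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
```
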
### v0.5.0 Alpha Features

* feat: add e2e test for instructlab CI
* feat: add mergify
* Adding FSDP Support to Training Library by @aldopareja @Maxusmusti @RobotSail

### v0.5.0 Alpha Fixes

* Fix lint error to make CI happy
* Fix typos
* Ap/fix multipack for non granite models
* Fix linting error and missing quote

### v0.5.0 Alpha Infrastructure

* Add license identifiers
* ci: update runner labels to uniquely identify instance sizes
* ci: minor cleanup of E2E job
* Fixing e2e to use relative path for working-directory
* Bump step-security/harden-runner from 2.8.1 to 2.9.1
* Bump pypa/gh-action-pypi-publish from 1.9.0 to 1.10.2
* Bump actions/setup-python from 5.1.0 to 5.2.0
* Bump rhysd/actionlint from 1.7.1 to 1.7.2
* Bump hynek/build-and-inspect-python-package from 2.6.0 to 2.9.0
* Bump DavidAnson/markdownlint-cli2-action from 16.0.0 to 17.0.0
* ci: fix lint action
* ci: add AWS tags to show github ref and PR num for all jobs

## v0.4.2

### v0.4.2 Features

* Provide safeguards during training

## v0.4.1

### v0.4.1 Changes

* makes saving every save_samples an optional feature

## v0.4.0

### v0.4.0 Features

* Adds a flag to save checkpoints at the end of an epoch

### v0.4.0 Changes

* Change success message at end of training

## v0.3.2

### v0.3.2 Features

* Accept tuples for lora.target_modules

### v0.3.2 Documentation

* patch some hyper parameter arg descriptions in README

## v0.3.1

### v0.3.1 Dependencies

* Update requirements to have bitsandbytes min and dolomite min

## v0.3.0

### v0.3.0 Features

* Updating token masking to support pretraining w/ masked special tokens
* Adding weight merging for LoRA/QLoRA ckpts

### v0.3.0 Fixes

* remove dead code
* fix: changes the check to check against both the enum option and enum value

## v0.2.0

### v0.2.0 Features

* Fix ckpt save to include architecture for inference runtime consumption
* Logging updates

### v0.2.0 Performance

* Reducing deepspeed timeout to 10mins

## v0.1.0

### v0.1.0 Features

* Flash Attention Disable Toggle (Take 2)

### v0.1.0 Performance

* Reduce Unnecessary Multiprocessing

### v0.1.0 Fixes

* 🐛: fix optimizer selection logic so that FusedAdam is never loaded when CPU offloading is enabled
* Add wheel to requirements

## v0.0.5.1

### v0.0.5.1 Fixes

This release includes PR [#121](https://github.com/instructlab/training/pull/121) to resolve an issue where our lazy import of the run_training function was being flagged as an error by pylint.

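The pattern at issue looks roughly like the sketch below; the module path and wrapper shape are assumptions for illustration, not the library's exact code.

```python
# Hypothetical sketch of a lazy import: deferring the real import into the
# function body keeps heavy optional dependencies (e.g. DeepSpeed) from
# loading at package-import time, but static analyzers like pylint can
# misread the deferred symbol as an import error.
def run_training(torch_args, train_args):
    # import deferred until training is actually requested
    from instructlab.training.main_ds import run_training as _run_training  # assumed module path

    return _run_training(torch_args=torch_args, train_args=train_args)
```
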
## v0.0.5

Minor bugfixes and updates.

## v0.0.4

Minor bugfixes and updates.

## v0.0.3

Minor bugfixes and updates.

## v0.0.2

### Features

This release introduces the training library as a package in the instructlab package namespace.

To install it:

```bash
pip install instructlab-training
```

To install it with flash-attn and other CUDA-dependent packages, use:

```bash
pip install instructlab-training[cuda]
```

Here's how to use it:

```python
# run_training is exposed at the package top level; TorchrunArgs and
# TrainingArgs live in the config module.
from instructlab.training import run_training
from instructlab.training.config import TorchrunArgs, TrainingArgs

torchrun_args = TorchrunArgs(
    nproc_per_node=1,  # 1 GPU
    nnodes=1,  # only 1 overall machine in the system
    node_rank=0,  # rank of the current machine
    rdzv_id=123,  # what ID other nodes will join on
    rdzv_endpoint="0.0.0.0:12345",  # address where other nodes will join
)

training_args = TrainingArgs(
    # specify training args here
)

run_training(torch_args=torchrun_args, train_args=training_args)
```

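For orientation, a filled-in TrainingArgs might look something like the sketch below; every field name and value here is an assumption for illustration, so treat the library's config module as the authoritative schema.

```python
# Hypothetical sketch only: field names are assumptions, not a verified
# listing of this library's TrainingArgs schema.
training_args = TrainingArgs(
    model_path="ibm-granite/granite-7b-base",  # model to fine-tune
    data_path="train.jsonl",                   # messages-format training data
    ckpt_output_dir="checkpoints",             # where checkpoints are written
    data_output_dir="data_processed",          # where processed data is cached
    max_seq_len=4096,
    max_batch_len=60000,
    num_epochs=1,
    effective_batch_size=128,
    learning_rate=2e-5,
    warmup_steps=25,
    save_samples=250000,
)
```
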
## v0.0.1

### v0.0.1 Features

Initial release with same features as v0.0.2.

src/instructlab/training/chat_templates/ibm_generic_tmpl.py: 28 additions & 14 deletions

```diff
@@ -1,30 +1,44 @@
 # SPDX-License-Identifier: Apache-2.0

 # First Party
-from instructlab.training.tokenizer_utils import SpecialTokens, TokenInfo
+from instructlab.training.chat_templates.utils import SpecialTokens, TokenInfo

 SPECIAL_TOKENS = SpecialTokens(
-    system=TokenInfo("<|system|>", add_to_tokenizer=True),
-    user=TokenInfo("<|user|>", add_to_tokenizer=True),
-    assistant=TokenInfo("<|assistant|>", add_to_tokenizer=True),
-    eos=TokenInfo("<|endoftext|>", add_to_tokenizer=True),
-    pad=TokenInfo("<|pad|>", add_to_tokenizer=True),
-    bos=TokenInfo("<|begginingoftext|>", add_to_tokenizer=True),
+    start_role=TokenInfo("<|start_of_role|>", add_to_tokenizer=True),
+    end_role=TokenInfo("<|end_of_role|>", add_to_tokenizer=True),
+    tool=TokenInfo("<|tool_call|>", add_to_tokenizer=True),
+    eos=TokenInfo("<|end_of_text|>", add_to_tokenizer=True),
+    bos=TokenInfo("<|end_of_text|>", add_to_tokenizer=True),
+    pad=TokenInfo("<|end_of_text|>", add_to_tokenizer=True),
 )

 CHAT_TEMPLATE = (
+    "{%- if tools %}"
+    "{{ '<|start_of_role|>available_tools<|end_of_role|>\n' }}"
+    "{% for tool in tools %}"
+    "{{ tool | tojson(indent=4) }}"
+    "{% if not loop.last %}"
+    "{{- '\n\n' }}"
+    "{% endif %}"
+    "{% endfor %}"
+    "{{ '<|end_of_text|>\n' }}"
+    "{% endif %}"
     "{% for message in messages %}"
-    "{% if message['role'] == 'pretraining' %}"
-    "{{'<|pretrain|>' + message['content'] + '<|endoftext|>' + '<|/pretrain|>' }}"
-    "{% elif message['role'] == 'system' %}"
-    "{{'<|system|>'+ '\n' + message['content'] + '\n'}}"
+    "{% if message['role'] == 'system' %}"
+    "{{ '<|start_of_role|>system<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}"
+    "{% elif message['role'] == 'pretraining' %}"
+    "{{ '<|pretrain|>' + message['content'] + '<|end_of_text|>' + '<|/pretrain|>'}}"
     "{% elif message['role'] == 'user' %}"
-    "{{'<|user|>' + '\n' + message['content'] + '\n'}}"
+    "{{ '<|start_of_role|>user<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}"
     "{% elif message['role'] == 'assistant' %}"
-    "{{'<|assistant|>' + '\n' + message['content'] + '<|endoftext|>' + ('' if loop.last else '\n')}}"
+    "{{ '<|start_of_role|>assistant<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}"
+    "{% elif message['role'] == 'assistant_tool_call' %}"
+    "{{ '<|start_of_role|>assistant<|end_of_role|><|tool_call|>' + message['content'] + '<|end_of_text|>\n' }}"
+    "{% elif message['role'] == 'tool_response' %}"
+    "{{ '<|start_of_role|>tool_response<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}"
     "{% endif %}"
     "{% if loop.last and add_generation_prompt %}"
-    "{{ '<|assistant|>' + '\n' }}"
+    "{{ '<|start_of_role|>assistant<|end_of_role|>' }}"
     "{% endif %}"
     "{% endfor %}"
 )
```

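To see what the new template produces, here is a self-contained sketch that renders an abridged copy of it with jinja2 directly. The training library applies the template through the tokenizer; this standalone rendering is just for illustration, and the tool-call branches are omitted for brevity.

```python
# Render an abridged copy of the new granite-style template with jinja2.
from jinja2 import Template

CHAT_TEMPLATE_ABRIDGED = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}"
    "{{ '<|start_of_role|>system<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}"
    "{% elif message['role'] == 'user' %}"
    "{{ '<|start_of_role|>user<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ '<|start_of_role|>assistant<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}"
    "{% endif %}"
    "{% if loop.last and add_generation_prompt %}"
    "{{ '<|start_of_role|>assistant<|end_of_role|>' }}"
    "{% endif %}"
    "{% endfor %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does FSDP stand for?"},
]

print(Template(CHAT_TEMPLATE_ABRIDGED).render(messages=messages, add_generation_prompt=True))
# Expected output:
# <|start_of_role|>system<|end_of_role|>You are a helpful assistant.<|end_of_text|>
# <|start_of_role|>user<|end_of_role|>What does FSDP stand for?<|end_of_text|>
# <|start_of_role|>assistant<|end_of_role|>
```
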
src/instructlab/training/chat_templates/ibm_legacy_tmpl.py (new file): 30 additions

```python
# SPDX-License-Identifier: Apache-2.0

# First Party
from instructlab.training.chat_templates.utils import SpecialTokens, TokenInfo

SPECIAL_TOKENS = SpecialTokens(
    system=TokenInfo("<|system|>", add_to_tokenizer=True),
    user=TokenInfo("<|user|>", add_to_tokenizer=True),
    assistant=TokenInfo("<|assistant|>", add_to_tokenizer=True),
    eos=TokenInfo("<|endoftext|>", add_to_tokenizer=True),
    pad=TokenInfo("<|pad|>", add_to_tokenizer=True),
    bos=TokenInfo("<|begginingoftext|>", add_to_tokenizer=True),
)

CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'pretraining' %}"
    "{{'<|pretrain|>' + message['content'] + '<|endoftext|>' + '<|/pretrain|>' }}"
    "{% elif message['role'] == 'system' %}"
    "{{'<|system|>'+ '\n' + message['content'] + '\n'}}"
    "{% elif message['role'] == 'user' %}"
    "{{'<|user|>' + '\n' + message['content'] + '\n'}}"
    "{% elif message['role'] == 'assistant' %}"
    "{{'<|assistant|>' + '\n' + message['content'] + '<|endoftext|>' + ('' if loop.last else '\n')}}"
    "{% endif %}"
    "{% if loop.last and add_generation_prompt %}"
    "{{ '<|assistant|>' + '\n' }}"
    "{% endif %}"
    "{% endfor %}"
)
```

src/instructlab/training/chat_templates/utils.py (new file): 29 additions

```python
# Standard
from dataclasses import dataclass, field
from typing import List


@dataclass
class TokenInfo:
    token: str
    add_to_tokenizer: bool = False


@dataclass
class SpecialTokens:
    system: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    user: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    assistant: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    eos: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    pad: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    bos: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    start_role: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    end_role: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    tool: TokenInfo = field(default_factory=lambda: TokenInfo(""))

    def get_tokens_to_add(self) -> List[str]:
        return [
            token_info.token
            for token_info in self.__dict__.values()
            if token_info.add_to_tokenizer and token_info.token
        ]
```

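A quick usage sketch of these dataclasses; the tokenizer call in the final comment is the usual Hugging Face pattern, shown as an assumption about how callers consume the list rather than as this library's own code.

```python
# Collect only the tokens that ask to be registered with the tokenizer.
tokens = SpecialTokens(
    start_role=TokenInfo("<|start_of_role|>", add_to_tokenizer=True),
    end_role=TokenInfo("<|end_of_role|>", add_to_tokenizer=True),
    eos=TokenInfo("<|end_of_text|>", add_to_tokenizer=True),
)

# Fields left at their defaults have empty token strings and are filtered out;
# iteration follows field declaration order, so eos precedes the role tokens.
print(tokens.get_tokens_to_add())
# ['<|end_of_text|>', '<|start_of_role|>', '<|end_of_role|>']

# With a Hugging Face tokenizer, the list would typically be registered via:
# tokenizer.add_special_tokens({"additional_special_tokens": tokens.get_tokens_to_add()})
```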