Commit

feat: introducing the experimental package and refactoring test structure (#433)

Signed-off-by: Terry Kong <[email protected]>
Signed-off-by: NeMo-Aligner CI <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
terrykong and pre-commit-ci[bot] authored Jan 22, 2025
1 parent 5f4f6d6 commit 502ebde
Showing 20 changed files with 64 additions and 6 deletions.
4 changes: 2 additions & 2 deletions tests/conftest.py → conftest.py
@@ -22,8 +22,8 @@
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy
from nemo_aligner.models.nlp.gpt.megatron_gpt_ppo_actor import MegatronGPTActorModel
+from nemo_aligner.testing.utils import Utils
from nemo_aligner.utils.train_script_utils import init_distributed, resolve_and_create_trainer
-from tests.test_mcore_utilities import Utils

dir_path = os.path.dirname(os.path.abspath(__file__))
# TODO: This file exists because in cases where TRTLLM MPI communicators are involved,
@@ -67,7 +67,7 @@ def run_only_on_device_fixture(request, device):

@pytest.fixture
def init_model_parallel():
-    from tests.test_mcore_utilities import Utils
+    from nemo_aligner.testing.utils import Utils

def initialize(*args, **kwargs):
Utils.initialize_model_parallel(*args, **kwargs)
5 changes: 5 additions & 0 deletions docs/user-guide-experimental/README.md
@@ -0,0 +1,5 @@
# Experimental Docs

This directory contains documentation for features that are still experimental or under development and not yet ready for general use.

More context can be found in the [experimental/README.md](../../nemo_aligner/experimental/README.md) file.
Empty file.
File renamed without changes.
File renamed without changes.
50 changes: 50 additions & 0 deletions nemo_aligner/experimental/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
# Experimental Package

The `experimental` sub-package contains projects that are under active development and may not be fully stable.

## Experimental Project Directory Structure:

```
NeMo-Aligner/
├── docs/
│   ├── user-guide/
│   │   └── ppo.html
│   └── user-guide-experimental/ <----- experimental docs
│       └── new-thing.html
├── nemo_aligner/
│   ├── algorithms/
│   ├── data/
│   │   ├── datasets.py
│   │   └── tests/
│   │       └── datasets_test.py
│   └── experimental/ <----- experimental sub-package
│       ├── <proj-name>/
│           ├── dataset.py <----- experimental dataset
│           ├── new_algo.py <----- experimental algo
│           ├── model.py <----- experimental model
│           └── tests/
│               └── model_test.py <----- experimental model test
└── tests/
    └── functional/
        └── dpo.sh
        └── test_cases/
            └── dpo-llama3
    └── functional_experimental/ <----- experimental functional tests (mirrors functional/ structure)
        ├── new_algo.sh
        └── test_cases/
            └── new_algo-llama3
```

The directories below exist to organize experimental projects (source code), tests, and documentation.

- [nemo_aligner/experimental/](../../nemo_aligner/experimental/): Main experimental sub-package containing projects under development
- [tests/functional_experimental/](../../tests/functional_experimental/): Functional tests for experimental projects
- [docs/user-guide-experimental/](../../docs/user-guide-experimental/): Documentation directory for experimental features and algorithms

The `experimental` sub-package follows a modular structure where each project has its own directory (sub-package) containing implementation and tests.

## Guidelines for "experimental/" Projects

- **Scope**: Projects can include new model definitions, training loops, utilities, or unit tests.
- **Independence**: Projects should ideally be independent. Dependence on other projects signals it might benefit from being added to core with tests (and documentation if applicable).
- **Testing**: Each project must include at least one functional test ([example](../../tests/functional/test_cases/dpo-llama3)).
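
The layout described above can be sketched as a small scaffolding helper. This is only an illustration under stated assumptions: `scaffold_experimental_project` is a hypothetical function that the repository does not ship, and naming the functional test script after the project (`<name>.sh`) is an assumption based on the `new_algo.sh` entry in the tree.

```python
from pathlib import Path


def scaffold_experimental_project(repo_root: Path, name: str) -> list[Path]:
    """Create placeholder directories and files for one experimental project.

    Hypothetical helper (not part of NeMo-Aligner). The script name
    `<name>.sh` is an assumption, not a documented convention.
    """
    repo_root = Path(repo_root)
    project_dir = repo_root / "nemo_aligner" / "experimental" / name
    functional_dir = repo_root / "tests" / "functional_experimental"
    docs_dir = repo_root / "docs" / "user-guide-experimental"

    # Directories: the project sub-package with its own unit tests, plus the
    # experimental functional-test and documentation locations.
    for directory in (project_dir / "tests", functional_dir / "test_cases", docs_dir):
        directory.mkdir(parents=True, exist_ok=True)

    # Files: a package marker and an empty functional test script to fill in.
    files = [project_dir / "__init__.py", functional_dir / f"{name}.sh"]
    for path in files:
        path.touch(exist_ok=True)
    return files


if __name__ == "__main__":
    import tempfile

    # Scaffold into a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for created in scaffold_experimental_project(Path(tmp), "new_algo"):
            print(created.relative_to(tmp))
```

Because every path is created with `exist_ok=True`, running the helper twice for the same project leaves existing files untouched.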
Empty file.
File renamed without changes.
Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
3 changes: 3 additions & 0 deletions tests/functional_experimental/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Experimental Functional Tests

More context can be found in the [experimental/README.md](../../nemo_aligner/experimental/README.md) file.
Empty file.
Empty file.
4 changes: 2 additions & 2 deletions tests/run_mpi_unit.sh
@@ -24,9 +24,9 @@ if [[ $NUM_GPUS_AVAILABLE -lt 2 ]]; then
fi

export PYTHONPATH=$(realpath ..):${PYTHONPATH:-}
-CUDA_VISIBLE_DEVICES=0,1 mpirun -np 2 --allow-run-as-root pytest .. -rA -s -x -vv --mpi $@ || true
+CUDA_VISIBLE_DEVICES=0,1 mpirun -np 2 --allow-run-as-root pytest ../nemo_aligner -rA -s -x -vv --mpi $@ || true

-if [[ -f PYTEST_SUCCESS ]]; then
+if [[ -f ../PYTEST_SUCCESS ]]; then
echo SUCCESS
else
echo FAILURE
4 changes: 2 additions & 2 deletions tests/run_unit.sh
@@ -24,9 +24,9 @@ if [[ $NUM_GPUS_AVAILABLE -lt 2 ]]; then
fi

export PYTHONPATH=$(realpath ..):${PYTHONPATH:-}
-CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 -m pytest .. -rA -s -x -vv $@ || true
+CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 -m pytest ../nemo_aligner -rA -s -x -vv $@ || true

-if [[ -f PYTEST_SUCCESS ]]; then
+if [[ -f ../PYTEST_SUCCESS ]]; then
echo SUCCESS
else
echo FAILURE
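
Both scripts discard pytest's exit code (`|| true`) and decide the outcome from a `PYTEST_SUCCESS` marker file, which after this change is looked up one directory above `tests/`. The diff does not show how that file is produced. A minimal sketch, assuming it is written from a `pytest_sessionfinish` hook in the top-level `conftest.py`, might look like the following; `record_session_result` and `MARKER_NAME` are hypothetical names introduced here, and using `session.config.rootpath` as the marker location is an assumption.

```python
from pathlib import Path

# Assumed marker name, matching the file the shell scripts test for.
MARKER_NAME = "PYTEST_SUCCESS"


def record_session_result(exitstatus: int, root: Path) -> Path:
    """Create the marker on success; remove any stale marker on failure."""
    marker = Path(root) / MARKER_NAME
    if exitstatus == 0:
        marker.touch()
    else:
        marker.unlink(missing_ok=True)
    return marker


def pytest_sessionfinish(session, exitstatus):
    """pytest hook that runs once after the whole test session ends."""
    # Assumption: pytest's rootpath is the repository root, which is where
    # the scripts look for the marker (../PYTEST_SUCCESS from tests/).
    record_session_result(int(exitstatus), Path(session.config.rootpath))
```

Removing the marker on failure keeps a success file from an earlier run from being mistaken for the current result.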
