
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation

Paper link: https://arxiv.org/abs/2409.06957

Authors' code: PF-PPO

The implementation is based on OpenRLHF.

Credit: Wei Shen (@swtheing) and Chuheng Zhang (@zhangchuheng123)

Quick Start

PF-PPO

PF-PPO-Reweight Version

First, set the following parameters in `combine_train_ana.sh`, for example:

```bash
# Directory where training checkpoints are saved
save_path=./ckpt/7b_llama_ppo_eb4_multi/
# Rollout batch size used when sampling responses for PPO
rollout_batch_size=2048
# File that evaluation generations are written to
output_file=test_he.jsonl
# Evaluation prompt file (HumanEval instructions formatted for LLaMA)
test_file=HumanEval-10-instruction-llama.jsonl
```

Then, run the script:

```bash
sh combine_train_ana.sh
```
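
The Reweight variant softens policy filtration: rather than dropping rollouts, it weights them by their reward-model scores before the PPO update. The snippet below is a minimal, hypothetical sketch of that idea; the function name, softmax weighting, and temperature are illustrative assumptions, not the repository's exact scheme.

```python
# Illustrative sketch only, not the repository's implementation.
# Assumes each prompt has N sampled responses already scored by the reward model.
import torch

def reweight_by_reward(rewards: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Map per-response rewards (shape [N]) to non-negative weights summing to 1.

    Higher-reward responses contribute more to the subsequent PPO update;
    the softmax form is an assumption made for illustration.
    """
    return torch.softmax(rewards / temperature, dim=-1)

rewards = torch.tensor([0.9, 0.2, -0.5, 0.7])  # reward-model scores for 4 rollouts
weights = reweight_by_reward(rewards)
print(weights)  # largest weight goes to the 0.9-reward rollout
```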

PF-PPO-Filter Version

This version requires building the `openrlhf_filter` package. As with the Reweight version, set the parameters in `combine_train_ana.sh` first:

```bash
save_path=./ckpt/7b_llama_ppo_eb4_multi/
rollout_batch_size=2048
output_file=test_he.jsonl
test_file=HumanEval-10-instruction-llama.jsonl
```

Then swap in the filter implementation, rebuild OpenRLHF, and run the script:

```bash
mv openrlhf_filter openrlhf
sh build_openrlhf.sh
sh combine_train_ana.sh
```
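
The Filter variant keeps only a reward-selected subset of the rollouts, corresponding to the BoN/BR/BW strategies reported in the table below. The snippet is a toy sketch that reads BR as "keep best plus random" and BW as "keep best plus worst"; this reading, the function, and the counts are assumptions for illustration, not the code used by `openrlhf_filter`.

```python
# Toy sketch of reward-based rollout filtering, not the repository's code.
import random

def filter_rollouts(samples, rewards, k_best=1, k_other=1, mode="BW", seed=0):
    """Keep a reward-selected subset of `samples`.

    mode="BW": keep the k_best highest-reward and k_other lowest-reward rollouts.
    mode="BR": keep the k_best highest-reward rollouts plus k_other random ones.
    """
    order = sorted(range(len(samples)), key=lambda i: rewards[i], reverse=True)
    keep = order[:k_best]
    if mode == "BW":
        keep += order[-k_other:]
    else:  # "BR"
        rest = order[k_best:]
        keep += random.Random(seed).sample(rest, k_other)
    return [samples[i] for i in keep]

samples = ["resp_a", "resp_b", "resp_c", "resp_d"]
rewards = [0.9, 0.2, -0.5, 0.7]
print(filter_rollouts(samples, rewards, mode="BW"))  # ['resp_a', 'resp_c']
print(filter_rollouts(samples, rewards, mode="BR"))  # best rollout plus one random
```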

Performance

| Family | Method | HumanEval | MBPP | LeetCode |
| --- | --- | --- | --- | --- |
| Supervised Fine-Tuning | SFT | 74.2 | 70.8 | 15.2 |
| | RAFT (Dong et al., 2023) | 76.9 | 71.3 | 17.8 |
| | BOND (Sessa et al., 2024) | 80.8 | 75.2 | 30.0 |
| Direct Policy Optimization | DPO (Rafailov et al., 2024) | 78.4 | 73.7 | 23.0 |
| | IPO (Azar et al., 2024) | 78.2 | 72.9 | 23.2 |
| | KTO (Ethayarajh et al., 2024) | 77.9 | 72.5 | 22.4 |
| | Iterative-DPO (Pang et al., 2024) | 78.1 | 74.8 | 23.8 |
| Reinforcement Learning | PPO-S (Hu et al., 2024) | 78.1 | 73.8 | 25.2 |
| | PPO-M (cf. Shao et al., 2024) | 80.2 | 75.0 | 29.8 |
| | PF-PPO (BoN) | 75.8 | 71.7 | 16.8 |
| | PF-PPO (BR) | 82.9 | 75.9 | 33.0 |
| | PF-PPO (BW) | 82.4 | 76.2 | 30.4 |
| SOTA (7B models) | Magicoder (Wei et al., 2023) | 76.8 | 75.7 | – |
