doc: add arXiv link (#14)
Co1lin authored Jan 15, 2025
1 parent 3e90eca commit 84e7cc7
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions README.md
@@ -2,6 +2,8 @@

Is the LLM-generated code 🔧 functional ***and*** ⛑️ secure? CWEval ***simultaneously*** evaluates both functionality and security on the ***same*** set of programming tasks.

To appear in [LLM4Code 2025](https://llm4code.github.io). arXiv is [here](https://arxiv.org/abs/2501.08200).

## 🚀 Quick Start

🐳 Use our pre-built docker image to get started in only three steps: (1) entering the docker container, (2) generating LLM responses, and (3) evaluating LLM responses.
@@ -105,6 +107,20 @@ python cweval/evaluate.py report_pass_at_k --eval_path evals/eval_4omini_t8

Detailed evaluation results are stored in `<eval_path>/res_all.json` (e.g. `evals/eval_4omini_t8/res_all.json`).
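
As a quick way to peek at that file, here is a minimal sketch that loads `res_all.json` and reports only its top-level shape. The schema of the file is not described here, so the helper name `summarize_results` and the fields it returns are illustrative assumptions, not part of CWEval.

```python
import json
from pathlib import Path


def summarize_results(eval_path):
    """Load <eval_path>/res_all.json and describe its top-level shape.

    Hypothetical helper: it makes no assumption about the file's schema
    and only reports the container type, its size, and a few sample keys.
    """
    res_file = Path(eval_path) / "res_all.json"
    with res_file.open(encoding="utf-8") as f:
        data = json.load(f)

    if isinstance(data, dict):
        keys = sorted(str(k) for k in data)
        return {"type": "dict", "size": len(data), "sample_keys": keys[:5]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "sample_keys": []}
    # Scalars (string, number, bool, null) have no meaningful size.
    return {"type": type(data).__name__, "size": 1, "sample_keys": []}
```

For example, `summarize_results("evals/eval_4omini_t8")` would inspect the file from the example path above, provided the evaluation has already been run.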

## 📜 Citation

```bibtex
@misc{peng2025cwevaloutcomedrivenevaluationfunctionality,
title={CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation},
author={Jinjun Peng and Leyi Cui and Kele Huang and Junfeng Yang and Baishakhi Ray},
year={2025},
eprint={2501.08200},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2501.08200},
}
```

## 💻 Development

### Python (required)
