diff --git a/.github/workflows/contributor.yml b/.github/workflows/contributor.yml new file mode 100644 index 00000000..34f4820e --- /dev/null +++ b/.github/workflows/contributor.yml @@ -0,0 +1,18 @@ +on: + push: + branches: + - main + pull_request: + branches: + - main + +jobs: + contrib-readme-job: + runs-on: ubuntu-latest + steps: + - name: Add contributor list + uses: akhilmhdh/contributors-readme-action@master + with: + readme_path: "README.md" + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/README.md b/README.md index d148cd72..8087bd46 100644 --- a/README.md +++ b/README.md @@ -153,6 +153,12 @@ The main goal of logparser is used for research and benchmark purpose. Researche + [**ICSE'19**] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. [Tools and Benchmarks for Automated Log Parsing](https://arxiv.org/pdf/1811.03509.pdf). *International Conference on Software Engineering (ICSE)*, 2019. + [**DSN'16**] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. [An Evaluation Study on Log Parsing and Its Use in Log Mining](https://jiemingzhu.github.io/pub/pjhe_dsn2016.pdf). *IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)*, 2016. + +### Contributors + + + + ### Discussion Welcome to join our WeChat group for any question and discussion. Alternatively, you can [open an issue here](https://github.com/logpai/logparser/issues/new). diff --git a/logparser/Brain/README.md b/logparser/Brain/README.md index 7c0c79a5..d49b584e 100644 --- a/logparser/Brain/README.md +++ b/logparser/Brain/README.md @@ -1,13 +1,8 @@ # Brain -### Abstract Automated log analysis can facilitate failure diagnosis for developers and operators using a large volume of logs. Log parsing is a prerequisite step for automated log analysis, which parses semi-structured logs into structured logs. 
However, existing parsers are difficult to apply to software-intensive systems, due to their unstable parsing accuracy on various software. Although neural network-based approaches are stable, their inefficiency makes it challenging to keep up with the speed of log production.We found that a logging statement always generate the same template words, thus, the word with the most frequency in each log is more likely to be constant. However, the identical constant and variable generated from different logging statements may break this rule Inspired by this key insight, we propose a new stable log parsing approach, called Brain, which creates initial groups according to the longest common pattern. Then a bidirectional tree is used to hierarchically complement the constant words to the longest common pattern to form the complete log template efficiently. Experimental results on 16 benchmark datasets show that our approach outperforms the state-of-the-art parsers on two widely-used parsing accuracy metrics, and it only takes around 46 seconds to process one million lines of logs. -Read more information about Brain from the following papers: - -+ Siyu Yu, Pinjia He, Ningjiang Chen, and Yifan Wu. [Brain: Log Parsing with Bidirectional Parallel Tree](https://ieeexplore.ieee.org/abstract/document/10109145), *IEEE Transactions on Service Computing*, 2023. - ### Running @@ -59,9 +54,9 @@ Running the benchmark script on Loghub_2k datasets, you could obtain the followi | Mac | 0.995821 | 0.942 | -### Citation +### 🔥 Citation -:telescope: If you use our logparser tools or benchmarking results in your publication, please kindly cite the following papers. +If you use the code or benchmarking results in your publication, please kindly cite the following papers. ++ [**TSC'23**] Siyu Yu, Pinjia He, Ningjiang Chen, and Yifan Wu.
[Brain: Log Parsing with Bidirectional Parallel Tree](https://ieeexplore.ieee.org/abstract/document/10109145), *IEEE Transactions on Services Computing*, 2023. + [**ICSE'19**] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. [Tools and Benchmarks for Automated Log Parsing](https://arxiv.org/pdf/1811.03509.pdf). *International Conference on Software Engineering (ICSE)*, 2019. -+ [**DSN'16**] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. [An Evaluation Study on Log Parsing and Its Use in Log Mining](https://jiemingzhu.github.io/pub/pjhe_dsn2016.pdf). *IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)*, 2016. diff --git a/logparser/DivLog/README.md b/logparser/DivLog/README.md index 970f13a4..a5f6cc94 100755 --- a/logparser/DivLog/README.md +++ b/logparser/DivLog/README.md @@ -1,14 +1,7 @@ # DivLog -### Abstract - DivLog is an online LLM-based log parsing framework via in-context learning. It supports various LLMs as engines through API for high-quality parsing results. -Read more information about DivLog from the following papers: - -+ Junjielong Xu, Ruichun Yang, Yintong Huo, Chengyu Zhang, and Pinjia He. [DivLog: Log Parsing with Prompt Enhanced In-Context Learning](https://doi.org/10.1145/3597503.3639155). *In 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE’24)* - - ### Running Install the required enviornment: @@ -80,9 +73,9 @@ Running the benchmark script on Loghub_2k datasets, you could obtain the followi | Hadoop | 0.9960 | 0.982609 | 0.991228 | 0.9940 | -### Citation +### 🔥 Citation -:telescope: If you use our logparser tools or benchmarking results in your publication, please kindly cite the following papers. +If you use the code or benchmarking results in your publication, please kindly cite the following papers. ++ [**ICSE'24**] Junjielong Xu, Ruichun Yang, Yintong Huo, Chengyu Zhang, and Pinjia He.
[DivLog: Log Parsing with Prompt Enhanced In-Context Learning](https://doi.org/10.1145/3597503.3639155). *IEEE/ACM 46th International Conference on Software Engineering (ICSE)*, 2024. + [**ICSE'19**] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. [Tools and Benchmarks for Automated Log Parsing](https://arxiv.org/pdf/1811.03509.pdf). *International Conference on Software Engineering (ICSE)*, 2019. -+ [**DSN'16**] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. [An Evaluation Study on Log Parsing and Its Use in Log Mining](https://jiemingzhu.github.io/pub/pjhe_dsn2016.pdf). *IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)*, 2016. diff --git a/logparser/Drain/README.md b/logparser/Drain/README.md index be3f1b90..8c4e0b60 100644 --- a/logparser/Drain/README.md +++ b/logparser/Drain/README.md @@ -2,11 +2,6 @@ Drain is an online log parser that can parse logs into structured events in a streaming and timely manner. It employs a parse tree with fixed depth to guide the log group search process, which effectively avoids constructing a very deep and unbalanced tree. -Read more information about Drain from the following paper: - -+ Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. [Drain: An Online Log Parsing Approach with Fixed Depth Tree](http://jiemingzhu.github.io/pub/pjhe_icws2017.pdf), *Proceedings of the 24th International Conference on Web Services (ICWS)*, 2017. - - ### Running The code has been tested in the following enviornment: @@ -51,15 +46,13 @@ Running the benchmark script on Loghub_2k datasets, you could obtain the followi | OpenStack | 0.992536 | 0.7325 | | Mac | 0.975451 | 0.7865 | +### Industrial Adoption -### Citation - -:telescope: If you use our logparser tools or benchmarking results in your publication, please kindly cite the following papers. - -+ [**ICSE'19**] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. 
[Tools and Benchmarks for Automated Log Parsing](https://arxiv.org/pdf/1811.03509.pdf). *International Conference on Software Engineering (ICSE)*, 2019. -+ [**DSN'16**] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. [An Evaluation Study on Log Parsing and Its Use in Log Mining](https://jiemingzhu.github.io/pub/pjhe_dsn2016.pdf). *IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)*, 2016. +Researchers from IBM ([@davidohana](https://github.com/davidohana)) made an upgraded version of Drain with additional features for production use: [https://github.com/logpai/Drain3](https://github.com/logpai/Drain3). +### 🔥 Citation -### Industrial Adoption +If you use the code or benchmarking results in your publication, please kindly cite the following papers. -Researchers from IBM ([@davidohana](https://github.com/davidohana)) made an upgrade version of Drain with additional features for production use: [https://github.com/logpai/Drain3](https://github.com/logpai/Drain3). ++ [**ICWS'17**] Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. [Drain: An Online Log Parsing Approach with Fixed Depth Tree](http://jiemingzhu.github.io/pub/pjhe_icws2017.pdf), *Proceedings of the 24th International Conference on Web Services (ICWS)*, 2017. ++ [**ICSE'19**] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. [Tools and Benchmarks for Automated Log Parsing](https://arxiv.org/pdf/1811.03509.pdf). *International Conference on Software Engineering (ICSE)*, 2019. diff --git a/logparser/NuLog/README.md b/logparser/NuLog/README.md index b64207b8..f4d0f30d 100644 --- a/logparser/NuLog/README.md +++ b/logparser/NuLog/README.md @@ -2,10 +2,6 @@ Parsing semi-structured records with free-form text log messages into structured templates is the first and crucial step that enables further analysis.
NuLog presents a novel parsing technique that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling (MLM). In the process of parsing, the model extracts summarizations from the logs in the form of a vector embedding. This allows the coupling of the MLM as pre-training with a downstream anomaly detection task. -Read more information about Brain from the following papers: - -+ Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao. [Self-Supervised Log Parsing](https://arxiv.org/abs/2003.07905), *Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD)*, 2020. - ### Running @@ -46,9 +42,9 @@ Running the benchmark script on Loghub_2k datasets, you could obtain the followi | Mac | 0.748933 | 0.8165 | | Spark | 0.999996 | 0.998 | -### Citation +### 🔥 Citation -:telescope: If you use our logparser tools or benchmarking results in your publication, please kindly cite the following papers. +If you use the code or benchmarking results in your publication, please kindly cite the following papers. ++ [**PKDD'20**] Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao. [Self-Supervised Log Parsing](https://arxiv.org/abs/2003.07905), *Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD)*, 2020. + [**ICSE'19**] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. [Tools and Benchmarks for Automated Log Parsing](https://arxiv.org/pdf/1811.03509.pdf). *International Conference on Software Engineering (ICSE)*, 2019. -+ [**DSN'16**] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. [An Evaluation Study on Log Parsing and Its Use in Log Mining](https://jiemingzhu.github.io/pub/pjhe_dsn2016.pdf). *IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)*, 2016.
diff --git a/logparser/NuLog/requirements.txt b/logparser/NuLog/requirements.txt index 496f9d5e..72230082 100644 --- a/logparser/NuLog/requirements.txt +++ b/logparser/NuLog/requirements.txt @@ -1,4 +1,4 @@ -pillow==10.0.1 +pillow==6.1.0 pandas regex==2022.3.2 numpy diff --git a/logparser/Spell/README.md b/logparser/Spell/README.md index 8ab39886..e7367e72 100644 --- a/logparser/Spell/README.md +++ b/logparser/Spell/README.md @@ -57,4 +57,3 @@ Running the benchmark script on Loghub_2k datasets, you could obtain the followi + [**ICSE'19**] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. [Tools and Benchmarks for Automated Log Parsing](https://arxiv.org/pdf/1811.03509.pdf). *International Conference on Software Engineering (ICSE)*, 2019. + [**DSN'16**] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. [An Evaluation Study on Log Parsing and Its Use in Log Mining](https://jiemingzhu.github.io/pub/pjhe_dsn2016.pdf). *IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)*, 2016. - diff --git a/logparser/ULP/README.md b/logparser/ULP/README.md index 260527ce..032963d5 100644 --- a/logparser/ULP/README.md +++ b/logparser/ULP/README.md @@ -2,10 +2,6 @@ ULP (Universal Log Parsing) is a highly accurate log parsing tool, the ability to extract templates from unstructured log data. ULP learns from sample log data to recognize future log events. It combines pattern matching and frequency analysis techniques. First, log events are organized into groups using a text processing method. Frequency analysis is then applied locally to instances of the same group to identify static and dynamic content of log events. When applied to 10 log datasets of the Loghub benchmark, ULP achieves an average accuracy of 89.2%, which outperforms the accuracy of four leading log parsing tools, namely Drain, Logram, Spell and AEL. Additionally, ULP can parse up to four million log events in less than 3 minutes. 
ULP can be readily used by practitioners and researchers to parse effectively and efficiently large log files so as to support log analysis tasks. -Read more information about Drain from the following paper: - -+ Issam Sedki, Abdelwahab Hamou-Lhadj, Otmane Ait-Mohamed, Mohammed A. Shehab. [An Effective Approach for Parsing Large Log Files](https://users.encs.concordia.ca/~abdelw/papers/ICSME2022_ULP.pdf), *Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME)*, 2022. - ### Running The code has been tested in the following enviornment: @@ -51,9 +47,9 @@ Running the benchmark script on Loghub_2k datasets, you could obtain the followi | Mac | 0.981294 | 0.814 | -### Citation +### 🔥 Citation -:telescope: If you use our logparser tools or benchmarking results in your publication, please kindly cite the following papers. +If you use the code or benchmarking results in your publication, please kindly cite the following papers. ++ [**ICSME'22**] Issam Sedki, Abdelwahab Hamou-Lhadj, Otmane Ait-Mohamed, Mohammed A. Shehab. [An Effective Approach for Parsing Large Log Files](https://users.encs.concordia.ca/~abdelw/papers/ICSME2022_ULP.pdf), *Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME)*, 2022. + [**ICSE'19**] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. [Tools and Benchmarks for Automated Log Parsing](https://arxiv.org/pdf/1811.03509.pdf). *International Conference on Software Engineering (ICSE)*, 2019. -+ [**DSN'16**] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. [An Evaluation Study on Log Parsing and Its Use in Log Mining](https://jiemingzhu.github.io/pub/pjhe_dsn2016.pdf). *IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)*, 2016.
diff --git a/logparser/logmatch/README.md b/logparser/logmatch/README.md index 43ff4f02..5d0ec1cb 100644 --- a/logparser/logmatch/README.md +++ b/logparser/logmatch/README.md @@ -17,9 +17,9 @@ Run the following scripts to start the demo: python demo.py ``` -### Citation +### 🔥 Citation -:telescope: If you use our logparser tools or benchmarking results in your publication, please kindly cite the following papers. +If you use the code or benchmarking results in your publication, please kindly cite the following papers. + [**ICSE'19**] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. [Tools and Benchmarks for Automated Log Parsing](https://arxiv.org/pdf/1811.03509.pdf). *International Conference on Software Engineering (ICSE)*, 2019. + [**DSN'16**] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. [An Evaluation Study on Log Parsing and Its Use in Log Mining](https://jiemingzhu.github.io/pub/pjhe_dsn2016.pdf). *IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)*, 2016.