GitHub - OpenBGBenchmark/OpenBG500: Dataset of business knowledge graph in OpenBG

OpenBG500

Information

OpenBG500 is an open Chinese e-commerce and business knowledge graph dataset containing 500 relations. It is refined from OpenBG, a million-scale multi-modal dataset covering products and consumption demands in a unified schema. OpenBG500 was developed for evaluating knowledge graph embedding methods.

The dataset is split into three parts: a training set, a validation set, and a test set. Basic statistics are shown in the table below.

#Relation	#Entity	#Train (opened)	#Valid (opened)	#Test
500	249,743	1,242,550	5,000	5,000

Data

OpenBG500 is available at Google Drive and Baidu Netdisk (password: 78fw). The main directory of the dataset is organized as follows.

OpenBG500
├── OpenBG500_train.tsv 			# Training set
├── OpenBG500_dev.tsv 				# Validation set
├── OpenBG500_test.tsv 			    # Test set
├── OpenBG500_entity2text.tsv 		# Description of entities in Chinese
├── OpenBG500_relation2text.tsv 	# Description of relations in Chinese
└── OpenBG500_example_pred.tsv 	    # Submission example

Usage

Format

Triples

# OpenBG500_train.tsv/OpenBG500_dev.tsv
Head<\t>Relation<\t>Tail<\n>

Description of entities/relations in Chinese

# OpenBG500_entity2text.tsv/OpenBG500_relation2text.tsv
Entity(Relation)<\t>Description of entity(relation)<\n>

Test and submit

# For OpenBG500_test.tsv, participants are required to predict 10 tails for each instance. OpenBG500_example_pred.tsv is a submission example.
Head<\t>Relation<\n>

# OpenBG500_example_pred.tsv
Head<\t>Relation<\t>Tail 1<\t>Tail 2<\t>...<\t>Tail 10<\n>
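The submission layout above can be produced with a few lines of Python. The following is a minimal sketch rather than official tooling: `format_prediction` and `write_predictions` are hypothetical helper names, and the ten tail IDs are placeholders, not real model output.

```python
import io


def format_prediction(head, relation, tails, k=10):
    """Build one line: Head<\t>Relation<\t>Tail 1<\t>...<\t>Tail k<\n>."""
    if len(tails) != k:
        raise ValueError(f"expected {k} tails, got {len(tails)}")
    return "\t".join([head, relation, *tails]) + "\n"


def write_predictions(predictions, fp, k=10):
    """Write an iterable of (head, relation, tails) to a text file object."""
    for head, relation, tails in predictions:
        fp.write(format_prediction(head, relation, tails, k))


# Placeholder tails (assumption): a real run would use the model's ranked tails.
dummy_tails = [f"ent_{i:06d}" for i in range(10)]

buffer = io.StringIO()
write_predictions([("ent_135492", "rel_0352", dummy_tails)], buffer)
print(buffer.getvalue(), end="")
```

To write an actual file, pass a handle opened with `open(path, 'w', encoding='utf-8')` in place of the `StringIO` buffer.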

Check the data

$ head -n 3 OpenBG500_train.tsv
ent_135492      rel_0352        ent_015651
ent_020765      rel_0448        ent_214183
ent_106905      rel_0418        ent_121073
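As a further sanity check, the counts in the statistics table can be compared with what is actually in a file. The following is a minimal sketch: `summarize_triples` is a hypothetical helper, and the sample consists of the three triples printed by `head` above. Entities that occur only in the validation or test files would not be counted when summarizing the training file alone, so the entity count may differ from the table.

```python
def summarize_triples(triples):
    """Count triples, distinct entities, and distinct relations."""
    entities = set()
    relations = set()
    for head, relation, tail in triples:
        entities.update((head, tail))
        relations.add(relation)
    return {
        "triples": len(triples),
        "entities": len(entities),
        "relations": len(relations),
    }


# The three triples shown by `head -n 3 OpenBG500_train.tsv`.
sample = [
    ["ent_135492", "rel_0352", "ent_015651"],
    ["ent_020765", "rel_0448", "ent_214183"],
    ["ent_106905", "rel_0418", "ent_121073"],
]

stats = summarize_triples(sample)
print(stats)
# {'triples': 3, 'entities': 6, 'relations': 3}
```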

Read the datasets

Read the original data:

with open('OpenBG500_train.tsv', 'r', encoding='utf-8') as fp:
    # Each line is Head<\t>Relation<\t>Tail
    train = [line.rstrip('\n').split('\t') for line in fp]

for line in train[:2]:
    print(line)
# ['ent_135492', 'rel_0352', 'ent_015651']
# ['ent_020765', 'rel_0448', 'ent_214183']

Build the Entity(Relation)-to-Description maps, ent2text and rel2text:

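with open('OpenBG500_entity2text.tsv', 'r', encoding='utf-8') as fp:
    # Each line is Entity<\t>Description; the descriptions are Chinese text,
    # so the encoding is set explicitly instead of relying on the platform default
    lines = [line.rstrip('\n').split('\t') for line in fp]

for line in lines[:2]:
    print(line)
# ['ent_101705', '短袖T恤']
# ['ent_116070', '套装']

ent2text = {line[0]: line[1] for line in lines}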

with open('OpenBG500_relation2text.tsv', 'r', encoding='utf-8') as fp:
    # Each line is Relation<\t>Description
    lines = [line.rstrip('\n').split('\t') for line in fp]

for line in lines[:2]:
    print(line)
# ['rel_0418', '细分市场']
# ['rel_0290', '关联场景']

rel2text = {line[0]: line[1] for line in lines}

Convert the triples from IDs to their descriptions:

train = [[ent2text[line[0]], rel2text[line[1]], ent2text[line[2]]] for line in train]
for line in train[:2]:
    print(line)
# ['苦荞茶', '外部材质', '苦荞麦']
# ['精品三姐妹硬糕', '口味', '原味硬糕850克【10包40块糕】']
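Direct dictionary indexing raises a `KeyError` if an ID has no entry in the description files. If that is a concern, a lookup that falls back to the raw ID can be used instead. This is a minimal sketch: `verbalize` is a hypothetical helper, the toy maps reuse descriptions shown above, and the sample triple (including `ent_999999`) is made up for illustration.

```python
def verbalize(triples, ent2text, rel2text):
    """Replace IDs with descriptions, keeping the raw ID when none exists."""
    return [
        [ent2text.get(h, h), rel2text.get(r, r), ent2text.get(t, t)]
        for h, r, t in triples
    ]


# Toy maps built from entries printed earlier in this section.
toy_ent2text = {"ent_101705": "短袖T恤", "ent_116070": "套装"}
toy_rel2text = {"rel_0418": "细分市场"}

# Made-up triple: ent_999999 is deliberately absent from toy_ent2text.
toy_triples = [["ent_101705", "rel_0418", "ent_999999"]]

readable = verbalize(toy_triples, toy_ent2text, toy_rel2text)
print(readable[0])
# ['短袖T恤', '细分市场', 'ent_999999']
```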

Submit in Alibaba TIANCHI

"OpenBG Benchmark: Large Scale Open Business Knowledge Graph Benchmark" on Alibaba TIANCHI is a long-running open benchmark. You are welcome to submit your OpenBG500 results there.

Baseline result

We ran several baseline methods on this dataset. The TransE, DistMult, and ComplEx results are based on the OpenKE toolkit; the KG-BERT and GenKGC results are based on our own code.

Method	Hits@1	Hits@3	Hits@10
TransE	0.207	0.340	0.531
DistMult	0.049	0.088	0.216
ComplEx	0.053	0.120	0.266
KG-BERT	0.023	0.049	0.241
GenKGC	0.203	0.280	0.351
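Hits@k in the table is the fraction of test instances whose correct tail appears among the top k predictions. A minimal sketch of the standard definition follows, for local evaluation on the validation set. `hits_at_k` is a hypothetical helper, the toy IDs are made up, and the exact protocol used by the TIANCHI scorer (for example, whether known triples are filtered out) is not stated here and may differ.

```python
def hits_at_k(ranked_predictions, gold_tails, k):
    """Fraction of instances whose gold tail is within the first k predictions."""
    if len(ranked_predictions) != len(gold_tails):
        raise ValueError("predictions and gold tails must have the same length")
    if not gold_tails:
        return 0.0
    hits = sum(
        gold in preds[:k]
        for preds, gold in zip(ranked_predictions, gold_tails)
    )
    return hits / len(gold_tails)


# Toy data (made up): four instances with three ranked candidates each.
toy_preds = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["x", "y", "z"]]
toy_gold = ["a", "e", "i", "q"]

for k in (1, 2, 3):
    print(f"Hits@{k} = {hits_at_k(toy_preds, toy_gold, k):.2f}")
# Hits@1 = 0.25
# Hits@2 = 0.50
# Hits@3 = 0.75
```

Because each prediction line carries 10 tails, Hits@10 is the largest k that can be measured from a file in this format.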
