forked from dhentchel/dbt2
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME-POSTGRESQL
209 lines (152 loc) · 6.23 KB
/
README-POSTGRESQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
Your user environment needs to be set like the 'examples/dbt2_profile'
in order for the scripts to work properly. For multi-node testing, it
needs to go into the PostgreSQL's user on the database system's
'~/.ssh/environment' file:
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
DBT2PGDATA=/var/lib/data
DBT2DBNAME=dbt2
-----
The C stored functions must be built and installed separately from the
rest of the kit.
cd storedproc/pgsql/c
make
make install
The 'make install' function will need to be run by the owner of the
database installation.
-----
The database parameter listen_addresses must be set to "*" (i.e. -c
listen_addresses=*) otherwise the database will not accept remote tpc/ip
connections.
-----
For PostgreSQL 7.4 and earlier the kit is configured to expect
PostgreSQL to have trusted connections defined. So in the pg_hba.conf
file, if you want to trust everyone, define this:
local all trust
host all 127.0.0.1 255.0.0.0 trust
host all 0.0.0.0 0.0.0.0 reject
-----
For PostgreSQL 7.4 edit the line in postgresql.conf that says
"tcpip_socket = false", uncomment, set to true, and restart the daemon.
-----
Contrary to the documentation, run 'make' instead of 'make standalone' and
run the kit in a multi-tier environment for PostgreSQL.
-----
make install-all-headers
Prior to PostgreSQL 8.0 this is required to be run from the postgresql
src if you're installing postgresql from source to make sure the server
include files get installed.
-----
For PostgreSQL 8.0 and later, run configure with the
--enable-thread-safety to avoid SIGPIPE handling for the multi-thread
dbt2 client program. This is a significant performance benefit.
-----
The kit now requires pg_autovacuum to be install from the PostgreSQL
source. It's in contrib/pg_autovacuum. Not sure which beta it was
released in, but it's in 8.0 now, and current releases of 7.4 if you
insist on using that release. As of 8.1, pg_autovacuum is in the
backend and is controlled through the GUC database settings.
-----
A really quick howto.
Edit bin/pgsql/pgsql_profile.in and follow the notes for the DBT2PGDATA
and DBDATA directory. DBT2PGDATA is where the database directory will
be created and DBDATA is where the database table data will be
generated.
Run configure after editing bin/pgsql/pgsql_profile.in.
Create a 1 warehouse database by running bin/pgsql/dbt2-pgsql-build-db
and put the data files in '/tmp/data':
dbt2-pgsql-build-db -g -w 1 -d /tmp/data
Flag description:
-d <dir> - Location for data files to load.
-g - Create the data files to load.
-w # - Set the warehouse scale factor.
Additional flags:
-h - Help.
-r - Drop the database before loading, for rebuilding the
database.
-t - Use tablespaces, read tablespace notes below.
Run a 5 minute (300 second) test by running bin/dbt2-run-workload:
dbt2-run-workload -a pgsql -d 300 -w 1 -o /tmp/result -c 10
Flag description:
-a <db> - Select database software to test.
-c # - Set the number of database connections.
-d # - Duration of steady state portion of test in seconds.
-o <dir> - Directory to put results.
-w # - Set the warehouse scale factor.
Additional flags:
-h - Help.
-p - Collect oprofile data.
-----
Tablespace Notes
The scripts assumes a specific tablespace layout.
The ${DBT2TSDIR} variable in pgsql_profile.in defines the directory where all
tablespace devices will be mounted. Directories or symlinks can be substituted
for what is assumed to be a mount point from this point forward.
In create_tables.sh is where the tablespaces are created.
The mount points that need to be created, and must be owned by the user running
the scripts are:
${DBT2TSDIR}/warehouse
${DBT2TSDIR}/district
${DBT2TSDIR}/customer
${DBT2TSDIR}/history
${DBT2TSDIR}/new_order
${DBT2TSDIR}/orders
${DBT2TSDIR}/order_line
${DBT2TSDIR}/item
${DBT2TSDIR}/stock
${DBT2TSDIR}/index1
${DBT2TSDIR}/index2
${DBT2TSDIR}/pk_customer
${DBT2TSDIR}/pk_district
${DBT2TSDIR}/pk_item
${DBT2TSDIR}/pk_new_order
${DBT2TSDIR}/pk_order_line
${DBT2TSDIR}/pk_orders
${DBT2TSDIR}/pk_stock
${DBT2TSDIR}/pk_warehouse
-----
A (slightly less) quick howto run the test (Thanks Min!):
small db 2 warehouse;
big db 20 warehouse,
tiny db 500 warehouse (scaling other factors)
generated from: (tpcc only allow scaling warehouse)
mkdir DB.small
./src/datagen --pgsql -w 2 -d DB.small -c 300 -i 10000 -o 300 -n 90
mkdir DB.big
./src/datagen --pgsql -w 20 -d DB.big -c 300 -i 10000 -o 300 -n 90
# scaling the other factor
mkdir DB.tiny
./src/datagen --pgsql -w 500 -d DB.tiny -c 3 -i 10 -o 3 -n 9
You can get # of warehouse from psql: select count(*) from warehouse;
Then load database data using load_db.sh, make sure DBDATA variable
is pointing to the correct directory you just generated above.
Then run "client" program manually,
you can monitor the query queue length by type "status" command.
cd <dbt2_home>/bin/pgsql
../../src/client -d localhost -c 2 -l 5432 -o ../output/0
Then run "driver" program manually,
you can control tpw (terminal per warehouse) and think time etc.
../../src/driver -d localhost -l 360 -wmin 1 -wmax 20 -w 20 \
-c 3 -i 10 -o 3 -n 9 \
-ktd 0 -ktn 0 -kto 0 -ktp 0 -kts 0 \
-ttd 0 -ttn 0 -tto 0 -ttp 0 -tts 0 -tpw 80 -outdir ../output/0
Finally, look in ../output/0 directory for the possible error output
tuning:
1. If you have fsync on then the daemon will be waiting for disk I/O
for writing log (WAL)
2. For 1 warehouse, increase currency hurt per cpu utilization.
1 daemon 1 terminal: 12% in 8 way system
2 daemon 2 terminal: 9.2%
3 daemon 3 terminal: 6.7%
4 daemon 4 terminal: 5.5%
Note the tpc-c model: company scales with # of warehouses;
each warehouse supports 10 districts;
each district serves 3000 customers;
each warehouse maintains stock level of 100,000 items;
require ~10% orders are fulfilled from other warehouses due to not have
all items in the company's catalog;
-----
Generate an HTML report:
dbt2-generate-report --dbms pgsql --dbname dbt2 --i /tmp/results \
--o /tmp/results/report
Then open /tmp/results/report/index.html. Use the '-h' flag for more
options.