dpsoft: first submission #572

dpsoft · 2024-01-24T02:04:29Z

Check List:

Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
All formatting changes by the build are committed
Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
Output matches that of calculate_average_baseline.sh
For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (237)

Execution time: 11s
Execution time of reference implementation: 4min 30s
System specs: 8 Core i7 @ 2.80GHz, 16GB
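
For readers unfamiliar with the hash-collision item in the checklist above: the commit list for this PR mentions a "map with linear probing". The following is a minimal sketch of how such a table can resolve collisions by comparing the stored key bytes before reusing a slot. `ProbingTable`, `add`, and `count` are hypothetical names chosen for illustration; this is not the code from the submission.

```java
import java.util.Arrays;

/** Hypothetical open-addressing table with linear probing; not the code from this PR. */
final class ProbingTable {
    private final byte[][] keys;
    private final int[] counts;
    private final int mask;

    /** Assumption: capacity is a power of two and larger than the number of distinct keys. */
    ProbingTable(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        keys = new byte[capacity][];
        counts = new int[capacity];
        mask = capacity - 1;
    }

    /** Counts one occurrence of key and returns the slot it ended up in. */
    int add(byte[] key, int hash) {
        int slot = hash & mask;
        while (keys[slot] != null) {
            if (Arrays.equals(keys[slot], key)) {
                counts[slot]++;          // same key: update in place
                return slot;
            }
            slot = (slot + 1) & mask;    // different key, same slot: probe the next one
        }
        keys[slot] = key.clone();        // empty slot found: claim it
        counts[slot] = 1;
        return slot;
    }

    /** Returns how often key was added, or 0 if it was never seen. */
    int count(byte[] key, int hash) {
        int slot = hash & mask;
        while (keys[slot] != null) {
            if (Arrays.equals(keys[slot], key)) {
                return counts[slot];
            }
            slot = (slot + 1) & mask;
        }
        return 0;
    }
}
```

The key comparison is the part the checklist asks about: without it, two stations whose hashes map to the same slot would be merged into one entry.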

gunnarmorling · 2024-01-25T22:20:27Z

Please run mvn verify and append the changes made by the formatter to the PR. Also, please make sure that ./test.sh dpsoft passes.

dpsoft · 2024-01-25T22:30:27Z

@gunnarmorling done!

gunnarmorling · 2024-01-25T22:39:47Z

There's a test failure, please see CI for details.

dpsoft · 2024-01-25T22:48:22Z

@gunnarmorling sorry, done!

gunnarmorling · 2024-01-27T14:02:16Z

Getting differences for the 1B rows file now:

Validating calculate_average_dpsoft.sh -- measurements_1B.txt
54c54
< Bloemfontein;-31.8;15.6;61.6
---
> Bloemfontein;-33.2;15.6;62.7
282c282
< Ouarzazate;-26.6;18.9;65.0
---
> Ouarzazate;-28.5;18.9;69.8
306c306
< Prague;-41.9;8.4;56.6
---
> Prague;-41.9;8.4;60.5
357c357
< Tamale;-19.3;27.9;74.7
---
> Tamale;-20.9;27.9;76.8
377c377
< Tromsø;-48.5;2.9;49.7
---
> Tromsø;-48.5;2.9;54.3
410c410
< Zanzibar City;-23.0;26.0;79.8
---
> Zanzibar City;-25.2;26.0;79.8

FAILURE: ./test.sh dpsoft measurements_1B.txt failed

gunnarmorling · 2024-01-28T09:21:38Z

Looking good for the regular 1B file now, but it fails with the 10K key set (see create_measurements3.sh):

java.nio.BufferUnderflowException
	at java.base/java.nio.Buffer.nextGetIndex(Buffer.java:721)
	at java.base/java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:753)
	at dev.morling.onebrc.CalculateAverage_dpsoft$MeasurementExtractor.hashAndRewind(CalculateAverage_dpsoft.java:157)
	at dev.morling.onebrc.CalculateAverage_dpsoft$MeasurementExtractor.run(CalculateAverage_dpsoft.java:137)
	at java.base/java.lang.Thread.run(Thread.java:1583)
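
The trace above shows a relative `ByteBuffer.getInt()` failing, which happens when fewer than four bytes remain before the buffer's limit. As a hedged illustration (the actual `hashAndRewind` method is not shown in this thread), one way to guard against this is to read whole ints only while at least `Integer.BYTES` remain and to consume the tail byte by byte. `SafeHash`, `hashAndRestore`, and the multiply-by-31 hash are assumptions made for this sketch.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper; the hash function itself is an arbitrary placeholder. */
final class SafeHash {
    /**
     * Hashes up to length bytes starting at the current position, never reading
     * past the buffer's limit, and restores the position afterwards.
     */
    static int hashAndRestore(ByteBuffer buffer, int length) {
        int start = buffer.position();
        int end = Math.min(start + length, buffer.limit());
        int hash = 1;
        // Read four bytes at a time only while four bytes are actually available.
        while (end - buffer.position() >= Integer.BYTES) {
            hash = 31 * hash + buffer.getInt();
        }
        // Consume the remaining zero to three bytes one at a time.
        while (buffer.position() < end) {
            hash = 31 * hash + buffer.get();
        }
        buffer.position(start);
        return hash;
    }
}
```

With a six-byte buffer, this reads one int and two single bytes instead of attempting a second `getInt()` that would underflow.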

gunnarmorling · 2024-01-28T15:24:12Z

Running out of heap space now with 10K keys:

Validating calculate_average_dpsoft.sh -- measurements_10K_1B.txt
1c1,10000
< Terminating due to java.lang.OutOfMemoryError: Java heap space

dpsoft · 2024-01-28T18:54:11Z

@gunnarmorling It seems the issue lies in the size of the segments: when I run the code on my machine, it throws an exception about the size exceeding Integer.MAX_VALUE, while on the test machine it fails with an "underflow" error.

I've fixed the issue (I think) and tested it as follows:

1) ./create_measurements3.sh 1000000000

2) ./calculate_average_baseline.sh > baseline-3.out // after changing the path in the source code to the 10K key set

3) ./calculate_average_dpsoft.sh > dpsoft-3.out // after changing the path in the source code to the 10K key set

and finally:

4) diff baseline-3.out dpsoft-3.out -> nothing!
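
For context on the Integer.MAX_VALUE error mentioned above: `FileChannel.map` rejects regions larger than Integer.MAX_VALUE bytes, so a large file has to be split into smaller segments. Below is a minimal sketch of one way to do that with an explicit chunk count, so that a run with a specific number of chunks can be reproduced regardless of the local CPU count. `Segments`, `split`, `endOfLine`, and the 1 GiB cap are assumptions for illustration, not the submission's code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical segment splitter; not the code from this PR. */
final class Segments {
    /** Assumed cap per segment, leaving headroom below Integer.MAX_VALUE for the line extension. */
    static final long MAX_TARGET = 1L << 30;

    record Segment(long start, long end) {}

    /** Splits the file into line-aligned segments of roughly fileSize / chunks bytes. */
    static List<Segment> split(FileChannel channel, int chunks) throws IOException {
        long fileSize = channel.size();
        long target = (fileSize + chunks - 1) / chunks;
        target = Math.max(1, Math.min(target, MAX_TARGET));
        List<Segment> segments = new ArrayList<>();
        long start = 0;
        while (start < fileSize) { // the loop is bounded by the file size, not by a segment size
            long end = endOfLine(channel, Math.min(fileSize, start + target), fileSize);
            segments.add(new Segment(start, end));
            start = end;
        }
        return segments;
    }

    /** Returns the offset just past the first '\n' at or after from, or fileSize if there is none. */
    private static long endOfLine(FileChannel channel, long from, long fileSize) throws IOException {
        ByteBuffer one = ByteBuffer.allocate(1);
        for (long pos = from; pos < fileSize; pos++) {
            one.clear();
            if (channel.read(one, pos) == 1 && one.get(0) == '\n') {
                return pos + 1;
            }
        }
        return fileSize;
    }
}
```

Calling `Segments.split(channel, 32)` on a local machine would produce the same kind of partitioning as a 32-thread run, which may surface boundary bugs that an 8-chunk split hides.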

gunnarmorling · 2024-01-28T22:34:34Z

Similar error as before. Note that the test runs on a machine with 32 cores; you should be able to reproduce the issue by using that many chunks rather than relying on the CPU count of your machine.

Using java version 21.0.2-graal in this shell.
Validating calculate_average_dpsoft.sh -- measurements_10K_1B.txt
Exception in thread "Thread-20" Exception in thread "Thread-27" Exception in thread "Thread-30" java.nio.BufferUnderflowException
	at java.base/java.nio.Buffer.nextGetIndex(Buffer.java:721)
	at java.base/java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:753)
	at dev.morling.onebrc.CalculateAverage_dpsoft$MeasurementExtractor.hashAndRewind(CalculateAverage_dpsoft.java:164)
	at dev.morling.onebrc.CalculateAverage_dpsoft$MeasurementExtractor.run(CalculateAverage_dpsoft.java:144)
	at java.base/java.lang.Thread.run(Thread.java:1583)
java.nio.BufferUnderflowException
	at java.base/java.nio.Buffer.nextGetIndex(Buffer.java:721)
	at java.base/java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:753)
	at dev.morling.onebrc.CalculateAverage_dpsoft$MeasurementExtractor.hashAndRewind(CalculateAverage_dpsoft.java:164)
	at dev.morling.onebrc.CalculateAverage_dpsoft$MeasurementExtractor.run(CalculateAverage_dpsoft.java:144)
	at java.base/java.lang.Thread.run(Thread.java:1583)
java.nio.BufferUnderflowException
	at java.base/java.nio.Buffer.nextGetIndex(Buffer.java:721)
	at java.base/java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:753)
	at dev.morling.onebrc.CalculateAverage_dpsoft$MeasurementExtractor.hashAndRewind(CalculateAverage_dpsoft.java:164)
	at dev.morling.onebrc.CalculateAverage_dpsoft$MeasurementExtractor.run(CalculateAverage_dpsoft.java:144)
	at java.base/java.lang.Thread.run(Thread.java:1583)

dpsoft · 2024-01-29T11:48:07Z

@gunnarmorling fixed!

gunnarmorling · 2024-01-29T20:06:33Z

Hum, hum. So all tests pass now, and it also passes the 10K key set, but it's way too fast, finishing in 100ms (the current fastest one is 2.7 sec). I feel it may somehow skip portions of the file?

dpsoft · 2024-01-29T20:51:25Z

@gunnarmorling I'll take a look

dpsoft · 2024-01-29T21:43:53Z

@gunnarmorling Indeed, it was truncating the file because it was looking at the segment size instead of the file size.
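
One inexpensive way to catch this kind of truncation, sketched here under the assumption that the aggregation keeps a per-station row count, is to sum those counts and compare the total with the number of rows the file was generated with (1000000000 for the runs in this thread). `RowCountCheck` and its methods are hypothetical names, not part of the submission.

```java
import java.util.Map;

/** Hypothetical sanity check; assumes a per-station row count is available. */
final class RowCountCheck {
    /** Sums the per-station counts. */
    static long totalRows(Map<String, Long> countsByStation) {
        long total = 0;
        for (long count : countsByStation.values()) {
            total += count;
        }
        return total;
    }

    /** Throws if the number of processed rows differs from the number of generated rows. */
    static void requireAllRowsSeen(Map<String, Long> countsByStation, long expectedRows) {
        long seen = totalRows(countsByStation);
        if (seen != expectedRows) {
            throw new IllegalStateException(
                    "processed " + seen + " rows, expected " + expectedRows);
        }
    }
}
```

A run that skips part of the file would report fewer rows than expected, even when the min, mean, and max values it does print look plausible.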

gunnarmorling · 2024-01-31T19:22:22Z

Same thing as before, it seems to skip parts, 100ms for the 10K key set case:

Benchmark 1: timeout -v 300 ./calculate_average_dpsoft.sh 2>&1
  Time (mean ± σ):      98.1 ms ±   3.1 ms    [User: 144.0 ms, System: 31.2 ms]
  Range (min … max):    95.5 ms … 103.0 ms    5 runs

Summary
  dpsoft: trimmed mean 0.09729906015333334, raw times 0.09915400582,0.09552674882000001,0.09679519882,0.09594797582,0.10304930282000001

Leaderboard

| # | Result (m:s.ms) | Implementation     | JDK | Submitter     | Notes     |
|---|-----------------|--------------------|-----|---------------|-----------|
|   | 00:00.097 | [link](https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_dpsoft.java)| 21.0.2-graal | [Diego Parra](https://github.com/dpsoft) |  |
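
The trimmed mean in the summary above is consistent with discarding the fastest and the slowest of the five runs and averaging the remaining three: (0.09915400582 + 0.09679519882 + 0.09594797582) / 3 ≈ 0.0973. A minimal sketch of that calculation follows; `TrimmedMean` is a hypothetical name and this is not the benchmark's own evaluation script.

```java
import java.util.Arrays;

/** Hypothetical helper reproducing the arithmetic behind the summary line. */
final class TrimmedMean {
    /** Drops the single smallest and single largest value and averages the rest. */
    static double trimmedMean(double[] times) {
        if (times.length < 3) {
            throw new IllegalArgumentException("need at least three runs");
        }
        double[] sorted = times.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }
}
```

Passing the five raw times from the summary to `trimmedMean` yields a value of about 0.0973 seconds, matching the 00:00.097 leaderboard entry.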

dpsoft · 2024-01-31T20:00:09Z

@gunnarmorling Is the file created with ./create_measurements3.sh 1000000000, and are eight cores used for the evaluation?

gunnarmorling · 2024-02-01T11:06:25Z

Looking good now: 00:06.392. Passing for 10K keyset too.

dpsoft · 2024-02-01T13:05:31Z

@gunnarmorling Thank you so much for the initiative. I would have liked to have had more free cpu cycles to improve my solution. Perhaps next time! :)

gunnarmorling · 2024-02-01T13:43:38Z

Hehe, yeah, understood. Unfortunately, I had to make a cut-off date to keep the effort for running it in check. Next time :)

dpsoft added 10 commits January 18, 2024 20:06

dpsoft: first submission

999a1ba

minor clean up

4068bdf

map with linear probing

4dccbc2

clean up

b6b35bd

Merge branch 'gunnarmorling:main' into main

f74703f

Merge branch 'gunnarmorling:main' into main

582c90c

update prepare

0ff6f54

clean up

a8d0ff6

remove string format

1970637

add credits

4e6a3d6

fix format

1c08be4

use prepare.sh

406b3df

graal 21.0.2

b650e29

gunnarmorling added the output differs label Jan 27, 2024

dpsoft added 2 commits January 27, 2024 18:25

fix differences

5f064e3

clean up

8f10e5f

gunnarmorling added Fails 10K and removed output differs labels Jan 28, 2024

underflow protection

a7f35f9

dpsoft added 2 commits January 28, 2024 15:40

improve segments generation logic

d15df6e

clean up

24ede30

remove unnecessary alignment in findsegment

600ea10

new try

faee937

fix number of segments

15d8420

gunnarmorling merged commit bec0cef into gunnarmorling:main Feb 1, 2024
1 check passed
