Benchmark comparison: daru vs pandas & NumPy
DataFrame size = (10 ** n, 2), that is, 2 columns and 10 ** n rows, where n is 2, 3, 4, 5, 6, 7, or 8.
NOTE:
- Benchmark-Daru::DataFrame-Initialize-method: rows = 2, columns = 10**n
- Benchmark-Daru::DataFrame-Initialize-method: rows = 10**n, columns = 2
Comparing with Pandas and NumPy
Method on DataFrame Vector (Vector access and apply method): MEAN
Number of rows | daru real time (s) | pandas avg time (s) | daru/pandas | NumPy avg time (s) | daru/numpy |
---|---|---|---|---|---|
10 ** 2 | 0.00005025299833505414 | 0.00002460720880008011 | 2.042206360881147 | 0.00002989581900001212 | 1.6809373355864163 |
10 ** 3 | 0.00003981099871452898 | 0.00002646757190013886 | 1.5041424602428344 | 0.00003186123070008762 | 1.249512270548278 |
10 ** 4 | 0.00002224299896624871 | 0.00004172052699868800 | 0.5331428092207031 | 0.00004564955699970596 | 0.4872555272856642 |
10 ** 5 | 0.00002023599881795235 | 0.00015707365499838488 | 0.1288312722980848 | 0.00015398673500021686 | 0.13141390924305169 |
10 ** 6 | 0.00002279799809912220 | 0.00156584847998601610 | 0.014559517341885977 | 0.00131619396001042338 | 0.017321153866214106 |
10 ** 7 | 0.00002318800034117885 | 0.01197717989998636767 | 0.001936015033155279 | 0.01254011160017398652 | 0.0018491063780371084 |
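The ratio columns appear to be the daru real time divided by the corresponding pandas or NumPy average time, so a value below 1 means daru measured faster for that size. A minimal Python sketch that checks this reading against the 10 ** 2 row of the MEAN table above (the variable names are illustrative, and the ratio definition is an assumption inferred from the numbers):

```python
# Values copied from the 10 ** 2 row of the MEAN table.
daru_time = 0.00005025299833505414    # daru real time (seconds)
pandas_time = 0.00002460720880008011  # pandas average time (seconds)
numpy_time = 0.00002989581900001212   # NumPy average time (seconds)

# Assumed definition of the ratio columns: daru time / other library's time.
daru_over_pandas = daru_time / pandas_time
daru_over_numpy = daru_time / numpy_time

print(f"daru/pandas = {daru_over_pandas:.6f}")
print(f"daru/numpy  = {daru_over_numpy:.6f}")
```

Both results agree with the daru/pandas and daru/numpy entries of that row to the precision printed.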
Method on DataFrame Vector (Vector access and apply method): median
Number of rows | daru real time (s) | pandas avg time (s) | daru/pandas | NumPy avg time (s) | daru/numpy |
---|---|---|---|---|---|
10 ** 2 | 0.00007862599886721000 | 0.00002452081701998395 | 3.206499962995991 | 0.00002926640069999848 | 2.6865619613830436 |
10 ** 3 | 0.00002766100078588352 | 0.00002761571300001378 | 1.0016399281767492 | 0.00003277218365001318 | 0.8440389899338426 |
10 ** 4 | 0.00003393599763512611 | 0.00013392778345998522 | 0.2533902731636372 | 0.00011260383452001407 | 0.30137515103088575 |
10 ** 5 | 0.00003449999712756835 | 0.00093059150948996826 | 0.03707319137961709 | 0.00101361991900001162 | 0.03403642379246491 |
10 ** 6 | 0.00004165599966654554 | 0.01047319806000086839 | 0.003977390614394825 | 0.00969792961999701303 | 0.004295349760081922 |
10 ** 7 | 0.00004083300154889002 | 0.11693990839001344728 | 0.00034917935297764535 | 0.12528731213002175515 | 0.000325914897962804 |
Method on DataFrame Vector (Vector access and apply method): sum
Number of rows | daru real time (s) | pandas avg time (s) | daru/pandas | NumPy avg time (s) | daru/numpy |
---|---|---|---|---|---|
10 ** 2 | 0.00002047899761237204 | 0.00005380277499971271 | 0.3806308803306408 | 0.00006421745570005442 | 0.3189007940150249 |
10 ** 3 | 0.00000598000042373314 | 0.00006364958799968007 | 0.09395191094973306 | 0.00006488183500005107 | 0.09216756005326349 |
10 ** 4 | 0.00000708199877408333 | 0.00009241857799861464 | 0.07662960118461777 | 0.00012525086099776672 | 0.056542515697433855 |
10 ** 5 | 0.00003047000063816085 | 0.00041669552999883309 | 0.07312293615975717 | 0.00063505067000005507 | 0.04798042436229255 |
10 ** 6 | 0.00000825300230644643 | 0.00496692907003307496 | 0.001661590530100216 | 0.00576831682999909365 | 0.0014307470532002185 |
10 ** 7 | 0.00000825900133349933 | 0.06522532849994604198 | 0.00012662261001117323 | 0.07603158420024555553 | 0.00010862592724291357 |
Method on DataFrame Vector (Vector access and apply method): product
Number of rows | daru real time (s) | pandas avg time (s) | daru/pandas | NumPy avg time (s) | daru/numpy |
---|---|---|---|---|---|
10 ** 2 | 0.00000603099761065096 | 0.00004321602870004426 | 0.1395546465528144 | 0.00005185432299986132 | 0.11630655385602255 |
10 ** 3 | 0.00003470199953881092 | 0.00004952814159987611 | 0.70065216294927 | 0.00005675383920024615 | 0.6114476135503518 |
10 ** 4 | 0.00000656200063531287 | 0.00007920588500201119 | 0.08284738735191517 | 0.00008350405899909674 | 0.07858301397521109 |
10 ** 5 | 0.00000658700082567520 | 0.00037602675800008003 | 0.017517372595260358 | 0.00039026842300154388 | 0.016878129096417165 |
10 ** 6 | 0.00000757399902795441 | 0.00343111220001446771 | 0.0022074472026657925 | 0.00372708013001101781 | 0.00203215352601824 |
10 ** 7 | 0.00000760300099500455 | 0.03398789710008713605 | 0.0002236973053265204 | 0.03601436539975111373 | 0.00021111023089295062 |
Method on DataFrame Vector (Vector access and apply method): create df real time
Number of rows | daru real time (s) | pandas avg time (s) | daru/pandas | NumPy avg time (s) | daru/numpy |
---|---|---|---|---|---|
10 ** 2 | 0.00030666100064991042 | 0.00021143017630020040 | 1.4504126422072126 | 1 | 0.0003066610006499104 |
10 ** 3 | 0.00265719199887826107 | 0.00020791667079975013 | 12.780081504082329 | 1 | 0.002657191998878261 |
10 ** 4 | 0.03196071800266508944 | 0.00021926641200116137 | 145.7620330946803 | 1 | 0.03196071800266509 |
10 ** 5 | 0.23605635099738719873 | 0.00021049363799829733 | 1121.4417368723355 | 1 | 0.2360563509973872 |
10 ** 6 | 2.90790127900254447013 | 0.00028836761001002739 | 10084.007974756347 | 1 | 2.9079012790025445 |
10 ** 7 | 43.19904435400167130865 | 0.00028369350002321880 | 152273.64867529942 | 1 | 43.19904435400167 |
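To see how daru's DataFrame creation time grows with the row count, one can divide each measurement by the one before it. A factor near 10 for each tenfold increase in rows would indicate roughly linear scaling. The following Python sketch does this with the daru times from the table above; the list and variable names are illustrative only.

```python
# daru DataFrame creation times (seconds) for 10**2 .. 10**7 rows,
# copied from the "create df real time" table.
rows = [10 ** n for n in range(2, 8)]
daru_create = [
    0.00030666100064991042,
    0.00265719199887826107,
    0.03196071800266508944,
    0.23605635099738719873,
    2.90790127900254447013,
    43.19904435400167130865,
]

# Growth factor between consecutive sizes (each step has 10x more rows).
growth = [later / earlier
          for earlier, later in zip(daru_create, daru_create[1:])]

for n_rows, factor in zip(rows[1:], growth):
    print(f"{n_rows:>10} rows: {factor:.1f}x the time of the previous size")
```

On these numbers each tenfold increase in rows costs between roughly 7x and 15x more time, while the pandas creation times in the same table stay nearly flat.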
Method on DataFrame Vector (Vector access and apply method): return Unique elements
Number of rows | daru real time (s) | pandas avg time (s) | daru/pandas | NumPy avg time (s) | daru/numpy |
---|---|---|---|---|---|
10 ** 2 | 0.00003605099846026860 | 0.00003281123619999562 | 1.0987394147701577 | 0.00001474316399981035 | 2.445268767323781 |
10 ** 3 | 0.00001542400059406646 | 0.00004384756529980222 | 0.35176412848938804 | 0.00004764209610002581 | 0.3237473129159425 |
10 ** 4 | 0.00001767800131347030 | 0.00014993138900172199 | 0.11790727366146969 | 0.00059559319099935235 | 0.029681335483047057 |
10 ** 5 | 0.00003860300057567656 | 0.00111055827999734908 | 0.03475999528432556 | 0.00686007325100217689 | 0.005627199471964401 |
10 ** 6 | 0.00002105800012941472 | 0.02723045833001378965 | 0.0007733252181879105 | 0.07511945111000387088 | 0.00028032686365849077 |
10 ** 7 | 0.00002082100036204793 | 0.35414526560016384993 | 5.8792259517469304e-05 | 0.90684124439976587784 | 2.2959917726094664e-05 |
daru real times for vector sizes [10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6, 10 ** 7]
Method on DataFrame Vector (Vector access and apply method): MEAN
Number of rows | daru real time (s) |
---|---|
10 ** 2 | 0.00005347399928723462 |
10 ** 3 | 0.00000930299938772805 |
10 ** 4 | 0.00002248199962195940 |
10 ** 5 | 0.00002107299951603636 |
10 ** 6 | 0.00002180799856432714 |
10 ** 7 | 0.00002622999818413518 |
Method on DataFrame Vector (Vector access and apply method): mode
Number of rows | daru real time (s) |
---|---|
10 ** 2 | 0.00017478399968240410 |
10 ** 3 | 0.00008010799865587614 |
10 ** 4 | 0.00011613899914664216 |
10 ** 5 | 0.00010050600030808710 |
10 ** 6 | 0.00016179999875021167 |
10 ** 7 | 0.00012378300016280264 |
Method on DataFrame Vector (Vector access and apply method): median
Number of rows | daru real time (s) |
---|---|
10 ** 2 | 0.00005261199839878827 |
10 ** 3 | 0.00002682400008779950 |
10 ** 4 | 0.00003368200123077258 |
10 ** 5 | 0.00005748499825131148 |
10 ** 6 | 0.00003802499850280583 |
10 ** 7 | 0.00008999200144899078 |
Method on DataFrame Vector (Vector access and apply method): sum
Number of rows | daru real time (s) |
---|---|
10 ** 2 | 0.00000746200021239929 |
10 ** 3 | 0.00000597000325797126 |
10 ** 4 | 0.00000720399839337915 |
10 ** 5 | 0.00000745499710319564 |
10 ** 6 | 0.00005896600123378448 |
10 ** 7 | 0.00000900500162970275 |
Method on DataFrame Vector (Vector access and apply method): product
Number of rows | daru real time (s) |
---|---|
10 ** 2 | 0.00002019299790845253 |
10 ** 3 | 0.00000498999725095928 |
10 ** 4 | 0.00000655099938740022 |
10 ** 5 | 0.00000666200139676221 |
10 ** 6 | 0.00000736299989512190 |
10 ** 7 | 0.00000797199754742905 |
Method on DataFrame Vector (Vector access and apply method): median_absolute_deviation
Number of rows | daru real time (s) |
---|---|
10 ** 2 | 0.00005985799725749530 |
10 ** 3 | 0.00005071499981568195 |
10 ** 4 | 0.00007257600009324960 |
10 ** 5 | 0.00005898799645365216 |
10 ** 6 | 0.00011886399806826375 |
10 ** 7 | 0.00007509900024160743 |
Method on DataFrame Vector (Vector access and apply method): sum_of_squared_deviation
Number of rows | daru real time (s) |
---|---|
10 ** 2 | 0.00002594100078567863 |
10 ** 3 | 0.00000969399843597785 |
10 ** 4 | 0.00001122899993788451 |
10 ** 5 | 0.00001170699761132710 |
10 ** 6 | 0.00006296100036706775 |
10 ** 7 | 0.00001322000025538728 |
Method on DataFrame Vector (Vector access and apply method): average_deviation_population
Number of rows | daru real time (s) |
---|---|
10 ** 2 | 0.00004264399831299670 |
10 ** 3 | 0.00003976099833380431 |
10 ** 4 | 0.00002845899871317670 |
10 ** 5 | 0.00005149999924469739 |
10 ** 6 | 0.00003221800216124393 |
10 ** 7 | 0.00008974700176622719 |
Method on DataFrame Vector (Vector access and apply method): create df real time
Number of rows | daru real time (s) |
---|---|
10 ** 2 | 0.00029346900191740133 |
10 ** 3 | 0.00270435700076632202 |
10 ** 4 | 0.03130596700066234916 |
10 ** 5 | 0.24355140899933758192 |
10 ** 6 | 2.85959107999951811507 |
10 ** 7 | 52.26341757000045618042 |
Means => ["0.000018035998", "0.000009484000", "0.000018049001", "0.000020885000", "0.000021848999", "0.000023175000"]
mode => ["0.000150011001", "0.000117476000", "0.000117858001", "0.000110462001", "0.000139565002", "0.000111646001"]
median => ["0.000051891000", "0.000053140000", "0.000036403999", "0.000055286000", "0.000061164001", "0.000044535000"]
sum => ["0.000007409000", "0.000008056000", "0.000007533999", "0.000007462000", "0.000007916000", "0.000008029001"]
product => ["0.000024531000", "0.000007930001", "0.000006863002", "0.000006942999", "0.000007171000", "0.000007293002"]
median_absolute_deviation => ["0.000100432999", "0.000113962000", "0.000076615001", "0.000062638999", "0.000064992000", "0.000065991000"]
sum_of_squared_deviation => ["0.000011425000", "0.000016450000", "0.000012232000", "0.000011624001", "0.000012728999", "0.000012634000"]
average_deviation_population => ["0.000030231000", "0.000066736000", "0.000031079000", "0.000053027999", "0.000055622000", "0.000032394999"]
create df real time => ["0.000306631999", "0.002730177999", "0.033998219000", "0.250292323000", "2.832325777999", "47.578107261001"]
Number of rows | pandas_mean avg time | numpy_mean avg time |
---|---|---|
10 ** 2 | 0.00002522777009980928 | 0.00002991263389994856 |
10 ** 3 | 0.00002691261290019611 | 0.00003136286949993519 |
10 ** 4 | 0.00004254218300047796 | 0.00004586969699812471 |
10 ** 5 | 0.00015704863499922794 | 0.00015137835600035032 |
10 ** 6 | 0.00155946647999371645 | 0.00133805154000583566 |
10 ** 7 | 0.01195333830000890919 | 0.01233988390013109927 |
10 ** 8 | 0.12151296670017472379 | 0.12252252519974718425 |
Benchmarking function: pandas_mean
Testing with a dataframe of size: 100
Result (seconds): 0.00002590420289998292
Testing with a dataframe of size: 1000
Result (seconds): 0.00002713621290004085
Testing with a dataframe of size: 10000
Result (seconds): 0.00004180985899984080
Testing with a dataframe of size: 100000
Result (seconds): 0.00015845163699941623
Testing with a dataframe of size: 1000000
Result (seconds): 0.00154265971000313576
Testing with a dataframe of size: 10000000
Result (seconds): 0.01233993340001688852
Testing with a dataframe of size: 100000000
Result (seconds): 0.12452101310000215917
Benchmarking function: numpy_mean
Testing with a dataframe of size: 100
Result (seconds): 0.00003039180350006063
Testing with a dataframe of size: 1000
Result (seconds): 0.00003180017919994498
Testing with a dataframe of size: 10000
Result (seconds): 0.00004642649300058111
Testing with a dataframe of size: 100000
Result (seconds): 0.00015341703800004326
Testing with a dataframe of size: 1000000
Result (seconds): 0.00132965861999764464
Testing with a dataframe of size: 10000000
Result (seconds): 0.01232307760001276689
Testing with a dataframe of size: 100000000
Number of rows | pandas_median avg time | numpy_median avg time |
---|---|---|
10 ** 2 | 0.00002452081701998395 | 0.00002926640069999848 |
10 ** 3 | 0.00002761571300001378 | 0.00003277218365001318 |
10 ** 4 | 0.00013392778345998522 | 0.00011260383452001407 |
10 ** 5 | 0.00093059150948996826 | 0.00101361991900001162 |
10 ** 6 | 0.01047319806000086839 | 0.00969792961999701303 |
10 ** 7 | 0.11693990839001344728 | 0.12528731213002175515 |
10 ** 8 | 1.27000799899906269275 | 1.23991956400277558714 |
Benchmarking function: pandas_median
Testing with a dataframe of size: 100
Result (seconds): 0.00002582970006000323
Testing with a dataframe of size: 1000
Result (seconds): 0.00002824090917999456
Testing with a dataframe of size: 10000
Result (seconds): 0.00010940286248000120
Testing with a dataframe of size: 100000
Result (seconds): 0.00088729414723999977
Testing with a dataframe of size: 1000000
Result (seconds): 0.01075348846100041521
Testing with a dataframe of size: 10000000
Result (seconds): 0.12858627516000523117
Testing with a dataframe of size: 100000000
Result (seconds): 1.24747154800024873111
Benchmarking function: numpy_median
Testing with a dataframe of size: 100
Result (seconds): 0.00003089331586999833
Testing with a dataframe of size: 1000
Result (seconds): 0.00003515163099999882
Testing with a dataframe of size: 10000
Result (seconds): 0.00012442707055000028
Testing with a dataframe of size: 100000
Result (seconds): 0.00099820312949000251
Testing with a dataframe of size: 1000000
Result (seconds): 0.01108264675299960866
Testing with a dataframe of size: 10000000
Result (seconds): 0.13133820641999591206
Testing with a dataframe of size: 100000000
Result (seconds): 1.21390682299988839077
Number of rows | pandas_prod avg time | numpy_prod avg time |
---|---|---|
10 ** 2 | 0.00004321602870004426 | 0.00005185432299986132 |
10 ** 3 | 0.00004952814159987611 | 0.00005675383920024615 |
10 ** 4 | 0.00007920588500201119 | 0.00008350405899909674 |
10 ** 5 | 0.00037602675800008003 | 0.00039026842300154388 |
10 ** 6 | 0.00343111220001446771 | 0.00372708013001101781 |
10 ** 7 | 0.03398789710008713605 | 0.03601436539975111373 |
10 ** 8 | 0.35793153200211236253 | 0.35189221399923553690 |
Benchmarking function: pandas_prod
Testing with a dataframe of size: 100
Result (seconds): 0.00004491761369999949
Testing with a dataframe of size: 1000
Result (seconds): 0.00004912122980003915
Testing with a dataframe of size: 10000
Result (seconds): 0.00008044129299923952
Testing with a dataframe of size: 100000
Result (seconds): 0.00036101352399964525
Testing with a dataframe of size: 1000000
Result (seconds): 0.00347382971000115498
Testing with a dataframe of size: 10000000
Result (seconds): 0.03563231970001652649
Testing with a dataframe of size: 100000000
Result (seconds): 0.37846179100051813293
Benchmarking function: numpy_prod
Testing with a dataframe of size: 100
Result (seconds): 0.00005671319829998538
Testing with a dataframe of size: 1000
Result (seconds): 0.00005507186369995907
Testing with a dataframe of size: 10000
Result (seconds): 0.00008253360099934071
Testing with a dataframe of size: 100000
Result (seconds): 0.00040807778600083112
Testing with a dataframe of size: 1000000
Result (seconds): 0.00371396845999697692
Testing with a dataframe of size: 10000000
Result (seconds): 0.03488254110006892145
Testing with a dataframe of size: 100000000
Result (seconds): 0.35792863000006036600
Number of rows | pandas_sum avg time | numpy_sum avg time |
---|---|---|
10 ** 2 | 0.00005380277499971271 | 0.00006421745570005442 |
10 ** 3 | 0.00006364958799968007 | 0.00006488183500005107 |
10 ** 4 | 0.00009241857799861464 | 0.00012525086099776672 |
10 ** 5 | 0.00041669552999883309 | 0.00063505067000005507 |
10 ** 6 | 0.00496692907003307496 | 0.00576831682999909365 |
10 ** 7 | 0.06522532849994604198 | 0.07603158420024555553 |
10 ** 8 | 0.68193949160013289656 | 0.68379156730006795950 |
Benchmarking function: pandas_sum
Testing with a dataframe of size: 100
Result (seconds): 0.00005561769920004735
Testing with a dataframe of size: 1000
Result (seconds): 0.00005946561629998541
Testing with a dataframe of size: 10000
Result (seconds): 0.00009354037799948855
Testing with a dataframe of size: 100000
Result (seconds): 0.00040350849599963114
Testing with a dataframe of size: 1000000
Result (seconds): 0.00499632787000336975
Testing with a dataframe of size: 10000000
Result (seconds): 0.05969556259997262082
Testing with a dataframe of size: 100000000
Result (seconds): 0.71097542200004681945
Benchmarking function: numpy_sum
Testing with a dataframe of size: 100
Result (seconds): 0.00006173165039999731
Testing with a dataframe of size: 1000
Result (seconds): 0.00006634956489997421
Testing with a dataframe of size: 10000
Result (seconds): 0.00010499926699958451
Testing with a dataframe of size: 100000
Result (seconds): 0.00038439752700014651
Testing with a dataframe of size: 1000000
Result (seconds): 0.00517789042000004005
Testing with a dataframe of size: 10000000
Result (seconds): 0.07420776259996272883
Testing with a dataframe of size: 100000000
Result (seconds): 0.64270900600004099434
Number of rows | pandas_unique avg time | numpy_unique avg time |
---|---|---|
10 ** 2 | 0.00003281123619999562 | 0.00001474316399981035 |
10 ** 3 | 0.00004384756529980222 | 0.00004764209610002581 |
10 ** 4 | 0.00014993138900172199 | 0.00059559319099935235 |
10 ** 5 | 0.00111055827999734908 | 0.00686007325100217689 |
10 ** 6 | 0.02723045833001378965 | 0.07511945111000387088 |
10 ** 7 | 0.35414526560016384993 | 0.90684124439976587784 |
10 ** 8 | 6.69910524100123438984 | 10.35867670299921883270 |
Benchmarking function: pandas_unique
Testing with a dataframe of size: 100
Result (seconds): 0.00003485789000005752
Testing with a dataframe of size: 1000
Result (seconds): 0.00004646338229995308
Testing with a dataframe of size: 10000
Result (seconds): 0.00013272955199954595
Testing with a dataframe of size: 100000
Result (seconds): 0.00180719400100042547
Testing with a dataframe of size: 1000000
Result (seconds): 0.02744935295999311950
Testing with a dataframe of size: 10000000
Result (seconds): 0.36727430050004838957
Testing with a dataframe of size: 100000000
Result (seconds): 7.76852580100057821255
Benchmarking function: numpy_unique
Testing with a dataframe of size: 100
Result (seconds): 0.00001490040839998983
Testing with a dataframe of size: 1000
Result (seconds): 0.00004483109719994900
Testing with a dataframe of size: 10000
Result (seconds): 0.00057470103799914794
Testing with a dataframe of size: 100000
Result (seconds): 0.00743517332999999760
Testing with a dataframe of size: 1000000
Result (seconds): 0.07924894640999809170
Testing with a dataframe of size: 10000000
Result (seconds): 0.90400296050001993642
Testing with a dataframe of size: 100000000
Result (seconds): 10.34286806900036026491
Benchmarking function: numpy_sort
Testing with a dataframe of size: 100
Result (seconds): 0.00000912697550002122
Testing with a dataframe of size: 1000
Result (seconds): 0.00003445678490006685
Testing with a dataframe of size: 10000
Result (seconds): 0.00050352683799974329
Testing with a dataframe of size: 100000
Result (seconds): 0.00610998137699971227
Testing with a dataframe of size: 1000000
Result (seconds): 0.07806133034000595217
Testing with a dataframe of size: 10000000
Result (seconds): 0.89660990390002548445
Testing with a dataframe of size: 100000000
Result (seconds): 10.22523978899971552892
Benchmarking function: pandas_sort
Testing with a dataframe of size: 100
Result (seconds): 0.00022274660969997059
Testing with a dataframe of size: 1000
Result (seconds): 0.00028801353039998505
Testing with a dataframe of size: 10000
Result (seconds): 0.00093503032399985385
Testing with a dataframe of size: 100000
Result (seconds): 0.00960024423100003417
Testing with a dataframe of size: 1000000
Result (seconds): 0.14475218630000200037
Testing with a dataframe of size: 10000000
Result (seconds): 2.70540260359994144679
Testing with a dataframe of size: 100000000
Result (seconds): 43.03127763799966487568
Creating dataframe of size: 100
Result (seconds): 0.00021427773819996220
Creating dataframe of size: 1000
Result (seconds): 0.00020970309380008984
Creating dataframe of size: 10000
Result (seconds): 0.00021675117500126361
Creating dataframe of size: 100000
Result (seconds): 0.00021533839099902252
Creating dataframe of size: 1000000
Result (seconds): 0.00026724909999757072
Creating dataframe of size: 10000000
Result (seconds): 0.00028099869996367488
Creating dataframe of size: 100000000
Result (seconds): 0.00445340500118618365
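The pandas and NumPy figures above are reported as average times per call. The actual harness lives in the Fast-Pandas branch linked below; what follows is only a minimal sketch of how such an average could be obtained with Python's standard `timeit` module. The helper name `average_time`, the repeat counts, and the use of `statistics.fmean` on a plain list (standing in for a pandas or NumPy call) are assumptions made for illustration, not the code that produced these numbers.

```python
import statistics
import timeit


def average_time(func, data, number=20, repeat=3):
    """Return the average seconds per call of func(data).

    Each trial runs `number` calls; the mean over `repeat` trials
    is divided by `number` to get a per-call figure.
    """
    trials = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
    return statistics.fmean(trials) / number


# Sizes are kept small so the sketch finishes quickly; the runs above
# go up to 10 ** 8.
for n in range(2, 5):
    size = 10 ** n
    data = list(range(size))
    # statistics.fmean is a stand-in for the pandas/NumPy call under test.
    seconds = average_time(statistics.fmean, data)
    print(f"Testing with a list of size: {size}")
    print(f"Result (seconds): {seconds:.20f}")
```

Averaging over many calls smooths out timer noise, which matters for the smallest sizes where a single call takes only tens of microseconds.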
- The shekhar-dev branch is used for the Python NumPy and pandas benchmarks: https://github.com/Shekharrajak/Fast-Pandas/tree/shekhar_dev
- daru benchmarking: https://github.com/SciRuby/daru/pull/484, together with the gists benchmark.rb and benchmarker.rb.