NickLeippe edited this page Feb 2, 2014 · 10 revisions

spreadsheet of scrypt-jane coin's N-Factors

cudaminer is not yet suitable for low-Nfactor coins, as the Keccak part is done on the CPU with unoptimized code.

For such coins, cgminer for scrypt-jane will beat cudaminer.

This is because scrypt-jane also needs a Keccak hash to be calculated. This is currently done on the CPU in a single thread. I hope in the future there will be an option to offload it to the GPU, like the SHA256 hash for scrypt.

A bit of research on Fermi performance with the new X-perimental kernel (Dave Andersen's work ported over to Fermi):

GTX 560Ti 1280MB: 0.83 kHash/s with -X 8x1 <--- the low VRAM is really hurting
GT 630 4GB VRAM: 0.72 kHash/s <--- low cost, low performance. The RAM doesn't help ;-)
GTX 660 OEM 4GB VRAM: 1.25 kHash/s <--- that is one strange OEM part, I must say.

The new code is about 50% faster than the existing Fermi kernel for scrypt-jane. But I do get occasional validation errors on Fermi + Kepler when I enable -C 1 or -C 2. Strange. Hence avoid Fermi parts like the plague if you intend to do scrypt-jane.

In comparison.

A GT 640 (GK107) 4GB at stock clocks will do something in the range of 1.65 kHash/s. This is somewhat less than I expected because my GT750M laptop part (same chip) delivers 2.1 kHash/s.

A GT 640 (GK208) 1GB GDDR5 manages to do 1.25 kHash/s with mild overclock. Again the low VRAM is hurting.

I will be getting more Kepler parts for comparison: GTX 650, GTX 650Ti (not the Boost version) with 2GB each.

Quote from: bathrobehero on January 15, 2014, 01:00:00 PM
Quote from: cbuchner1 on January 14, 2014, 11:54:28 PM
A GT 640 (GK107) 4GB at stock clocks will do something in the range of 1.65 kHash/s.

I wonder how this one performs with an N factor of 15.

The best case is that it achieves exactly half the hash rate it gets with Nfactor 14. Why? Because Nfactor 15 is exactly twice the amount of work. This will affect CPUs likewise.

The usual case for most GPU models will be that the performance degrades by more than half, because the occupancy of the CUDA cores goes down (with the available memory, fewer hashes can be computed simultaneously, so too many cores sit idle). Cards with 1GB and 2GB will be hit the hardest...

Cards with 4GB should barely see an impact for Nfactor 14 --> 15.
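To make the memory argument concrete, here is a back-of-the-envelope sketch. It assumes N = 2^(Nfactor+1), which is consistent with the Yacoin figures on this page (Nfactor 14, N = 32768), and a scratchpad of 128 bytes per entry (scrypt's 128*r with r = 1). The r = 1 value and the helper names are assumptions for illustration, and a real card loses some VRAM to other buffers, so treat the results as upper bounds.

```python
BYTES_PER_ENTRY = 128  # assumption: scrypt block size 128*r bytes with r = 1


def scratchpad_bytes(nfactor):
    """Scratchpad size for ONE hash at the given Nfactor."""
    n = 1 << (nfactor + 1)  # N = 2^(Nfactor+1)
    return n * BYTES_PER_ENTRY


def max_concurrent_hashes(vram_bytes, nfactor):
    """Upper bound on the hashes that fit in VRAM at the same time."""
    return vram_bytes // scratchpad_bytes(nfactor)


GIB = 1024 ** 3
for nfactor in (14, 15):
    for vram_gib in (1, 2, 4):
        fit = max_concurrent_hashes(vram_gib * GIB, nfactor)
        print(f"Nfactor {nfactor}, {vram_gib} GiB: at most {fit} hashes in flight")
```

Each step in Nfactor doubles the scratchpad per hash, so the number of hashes a card can keep in flight halves. On small cards that number can drop below what is needed to keep all cores busy.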

For cards with lots of spare GPU cores (say, a GTX 780 or better) we can cut memory requirements in half and increase compute requirements instead (LOOKUP_GAP). It's on my TODO list.

Noob question, but what is the use of the -b option and what does it mean?

With my GTX 770 4GB, is it necessary to use the -b option, and if so, how can I know what the best value is?

The best (in terms of fastest) is to use the same value as N. N is currently 32768 for Yacoin. Let me explain what is going on:

scrypt-jane is running a for loop like this, which will take a long time to complete, on the order of a quarter to half a second. The GPU is fully unresponsive during that time.

for (i=0; i < 32768; ++i) { do a lot of work and memory access } // run once

-b 1024 instead runs 32 shorter for loops like this, with small pauses in between when interactive mode is enabled. Each loop is the same workload as regular scrypt hashing. This is why I made this the default.

for (i=0; i < 1024; ++i) { do a lot of work and memory access } // run 32 times

-b 4096 runs 8 for loops like this, which is an OK intermediate between the two extremes. This might be a good compromise for display smoothness (if you're not planning on watching movies that is).

for (i=0; i < 4096; ++i) { do a lot of work and memory access } // run 8 times

In the future, interactive mode may auto-determine the batch size to hit a desired target frame rate exactly.
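The relation between -b and the number of loops can be sketched as plain arithmetic. `split_into_batches` is a hypothetical helper written for this illustration, not cudaminer code, and the handling of a batch size that does not divide N evenly is an assumption.

```python
def split_into_batches(n, batch_size):
    """Return the (start, stop) index ranges covered by each shorter loop."""
    if batch_size <= 0:
        raise ValueError("batch size must be positive")
    return [(start, min(start + batch_size, n))
            for start in range(0, n, batch_size)]


N = 32768  # current N for Yacoin, as stated above
for b in (N, 4096, 1024):
    print(f"-b {b}: {len(split_into_batches(N, b))} loop(s)")
```

The total number of iterations is the same in every case. Only the length of each uninterrupted stretch of GPU work changes, which is what affects display responsiveness.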

Christian

google doc spreadsheet hardware comparison

spreadsheet submission form

places to trade:

bter.com
https://coinedup.com/OrderBook?market=VTC&base=BTC

A GT 640 with 4GB does 1.80 kHash/s using K4x12, interactive: 0, tex-cache: 2D, single-alloc: 1

Try the lookup-gap now on Compute 3.0 devices (Kepler kernel). The Titan kernel will follow soon... Always autotune for different gap numbers, as configurations will differ wildly.

NOTE: a gap value of 1 actually means no gap. ;-) A gap value of 2 specifies that only every 2nd value is stored in the scratchpad (with the intermediate values being recomputed on the fly), cutting memory use in half. Values of up to 4 may make sense IMHO. Start with 2 and work your way up...

The more SMX your card has and the less memory there is, the more benefit you may see. Power consumption may also rise... Users of 1GB and 2GB cards may finally see some better hash rates now.

Use one of:

-L 2
-L 3
-L 4
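The lookup-gap idea can be shown with a toy model. This is a minimal sketch, not the CUDA kernel: `mix` is a stand-in (SHA-256 from the Python standard library) for the real ChaCha-based block mix, and all function names are made up for the illustration.

```python
import hashlib


def mix(block):
    # Stand-in for the real block mix; any deterministic function
    # is enough to show the store/recompute trade-off.
    return hashlib.sha256(block).digest()


def fill_scratchpad(x0, n, gap):
    """Walk the chain x0, mix(x0), ... for n steps, storing every gap-th value."""
    stored = []
    x = x0
    for i in range(n):
        if i % gap == 0:
            stored.append(x)
        x = mix(x)
    return stored


def read_entry(stored, j, gap):
    """Return entry j, recomputing up to gap-1 mix steps if it was not stored."""
    x = stored[j // gap]
    for _ in range(j % gap):
        x = mix(x)
    return x


full = fill_scratchpad(b"seed", 64, 1)      # gap 1: every entry is stored
halved = fill_scratchpad(b"seed", 64, 2)    # gap 2: half the memory
print(len(full), len(halved))
print(read_entry(halved, 37, 2) == full[37])
```

With a gap of g, memory drops to roughly 1/g, while each read costs up to g-1 extra mix steps. That is why the trade only pays off on cards with compute to spare.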

A few versions of CudaMiner (along with instructions) for Mac OS X users: mac builds

Two new experimental kernels added to github - currently for Linux only. The Visual C++ project has not yet been updated. You will want to run ./autogen.sh and configure after doing a git pull.

"Z" code submission by nVidia for Compute 3.5 devices (GTX 780 etc...). Good for scrypt.
"Y" code submission by nVidia, modified to run on Compute 3.0 devices also. Good for scrypt.

I find that scrypt-jane still runs faster with the "X" (Fermi) and "K/T" (Kepler/Titan) kernels from the current github code.

Test away... Especially the Z kernel is expected to rule. I haven't tested it yet in detail. Best config for "Z" is No. of SMX x 24, according to the engineer who wrote it. Best config for "Y" is (guessing) No. of SMX x 32 - or just autotune.

The Z kernel is best run with -C 0 (it supports -C 1 and -C 2, but that is mostly pointless).

When you make kHash/s benchmarks, compare with the best scrypt values achieved with the 2013-12-18 release.

I got 86 kHash/s on GT 750M with the -C 2 flag and -l Y4x32 in some quick tests, which might be slightly faster than what the 2013-12-18 release delivered.

Christian


VertCoin: scrypt:2048 <--- Salsa20/8, SHA-2 (SHA512)
Here N is specified directly, as there is no coin-specific logic yet to compute N as a function of time.

scrypt-jane:2048 does not make any sense. You're supposed to give an N-factor, not the N value.

MicroCoin: scrypt-jane:MRC <--- ChaCha, SHA-3 (Keccak), currently Nfactor=8, N=512
Yacoin: scrypt-jane:YAC <--- ChaCha, SHA-3, currently Nfactor=14, N=32768

alternatively scrypt-jane:14 works for Yacoin, scrypt-jane:8 works for MicroCoin.... but only at the moment.
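The Nfactor/N pairs quoted on this page (8 and 512, 14 and 32768, 7 and 256) all fit N = 2^(Nfactor+1). A small sketch of that conversion, with a hypothetical helper name:

```python
def n_from_nfactor(nfactor):
    """N = 2^(Nfactor+1), matching the Nfactor/N pairs quoted on this page."""
    return 1 << (nfactor + 1)


for coin, nfactor in (("MicroCoin", 8), ("Yacoin", 14)):
    print(f"{coin}: Nfactor={nfactor}, N={n_from_nfactor(nfactor)}")
```

This also shows why scrypt-jane:2048 makes no sense: read as an Nfactor, 2048 would imply an N of 2^2049.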


Yacoin on GT640: K kernel or T kernel (T for compute 3.5 only). Lookup gap 1 or 2. Don't go higher, the card doesn't have enough compute reserves. My GT 640 with 1GB RAM requires a lookup gap of 2, getting around 1.5-1.6 kHash/s. Your 2GB card might not need it. Best to try.

X is for Fermi.


Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
February 01, 2014, 12:59:59 PM, #3446

Quote from: djm34 on February 01, 2014, 11:38:54 AM
Anybody interested in UltraCoin? A new scrypt-jane coin which should launch today. Do I need to make any modification to mine it, or will cudaminer detect the Nfactor itself (should be one though...)?

If you know the start time and the min and max Nfactor, use

--algo=scrypt-jane:starttime,Nfmin,Nfmax

This works for all new scrypt-jane coins if the coin creators only modified the above constants in the wallet's source code.
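As a rough illustration of what those three constants feed into, the sketch below is modelled on the time-based Nfactor schedule as recalled from the Yacoin wallet source. Everything not on this page is an assumption and should be checked against the wallet code of the coin in question: the function name, the constants 170, 25 and 2320, and the Yacoin start time and Nfactor bounds used in the example.

```python
def nfactor_at(timestamp, start_time, nf_min, nf_max):
    """Nfactor for a Unix timestamp, clamped to [nf_min, nf_max]."""
    if timestamp <= start_time:
        return nf_min
    s = timestamp - start_time       # seconds since chain start
    l = 0
    while (s >> 1) > 3:              # reduce s to a 3-bit value, counting shifts
        l += 1
        s >>= 1
    s &= 3
    # Constants recalled from the Yacoin source (assumption, verify before use).
    n = (l * 170 + s * 25 - 2320) // 100
    return min(max(n, nf_min), nf_max)


# Assumed Yacoin parameters: start time 1367991200, Nfmin 4, Nfmax 30.
YAC_START, YAC_MIN, YAC_MAX = 1367991200, 4, 30
feb_1_2014 = 1391212800              # 2014-02-01 00:00:00 UTC
print(nfactor_at(feb_1_2014, YAC_START, YAC_MIN, YAC_MAX))
```

Under these assumptions the example yields 14, which agrees with the Nfactor given for Yacoin earlier on this page. Because the result depends only on elapsed time, a miner that knows the three constants can follow the schedule without coin-specific code.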

Note that cudaminer currently has problems with Nfactor changes during mining. So you will have to restart cudaminer when that happens.

NOTE: at extremely low Nfactors, CUDA is a bit inefficient. I noticed this with MRC at N=256 (Nfactor 7). There are too many memory transfers over PCI Express, and Keccak hashing on the GPU is still not quite as fast as I would like it to be.
