pickle.dump gives SystemError #8

Open
xdvx opened this issue Feb 27, 2017 · 19 comments

Comments

@xdvx

xdvx commented Feb 27, 2017

pickle.dump gives me a SystemError when I try a larger batch of images. I thought it was a memory issue, so I tried the same batch on a machine with 32 GB of RAM and got the same error.

@bakwc
Owner

bakwc commented Feb 27, 2017

Thanks for the report! Could you please provide more details? What are you trying to run, pcr.py or nnpcr.py? Are you trying to train a new model? What exact SystemError do you get (please attach the full exception trace)? How large is your training set?

@xdvx
Author

xdvx commented Feb 27, 2017

I got this when running nnpcr.py.

This is the error message:
pickle.dump(obj, open(fileName + '.tmp', 'wb'), -1)
SystemError: error return without exception set

I had 10k positive and 10k negative images. While googling, the only solution I found was to run the same script on Python 3. It took me a while, but the error message is gone.

I have one more question. I see that this is set to a constant number:
def train(self, numIterations=1500):

Should this represent the size of the training set?

@bakwc
Owner

bakwc commented Feb 27, 2017

It seems this dataset is too large for the in-memory cache. You can comment out line 114 (saveCache((trainX, trainY, testX, testY), 'nncache.bin')); this cache is only used to improve speed when running training multiple times.
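For reference, `SystemError: error return without exception set` from `pickle.dump` on Python 2 has often been reported with very large objects, which would be consistent with the error disappearing on Python 3; that reading is an assumption here, not something confirmed in this thread. Below is a minimal sketch of a cache writer for Python 3.4+, where pickle protocol 4 supports very large objects. `save_cache` and `load_cache` are hypothetical names, not the functions in nnpcr.py.

```python
import os
import pickle


def save_cache(obj, file_name):
    """Pickle obj to file_name via a temporary file, then rename it into place."""
    tmp_name = file_name + '.tmp'
    # Protocol 4 (Python 3.4+) adds support for very large objects.
    with open(tmp_name, 'wb') as f:
        pickle.dump(obj, f, protocol=4)
    # os.replace overwrites the destination, so an interrupted write leaves
    # only the .tmp file behind, never a truncated cache file.
    os.replace(tmp_name, file_name)


def load_cache(file_name):
    """Return the cached object, or None if no cache file exists."""
    if not os.path.exists(file_name):
        return None
    with open(file_name, 'rb') as f:
        return pickle.load(f)
```

Using `with` also closes the file handle deterministically, which the one-line `pickle.dump(obj, open(...), -1)` form does not guarantee.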

I have one more question. I see that this is set to a constant number:
def train(self, numIterations=1500):

Should this represent the size of the training set?

No (at least not directly). But you can try different values here. If numIterations is too small or too large, the accuracy will be poor. 1500 was optimal for my training set (3K images total).
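One way to relate numIterations to the dataset size is to convert it into epochs (full passes over the data). The sketch below assumes training proceeds in fixed-size mini-batches; the batch size of 50 is a hypothetical value, not taken from nnpcr.py, and both function names are made up for illustration.

```python
import math


def iterations_for_epochs(num_samples, batch_size, epochs):
    """Number of training steps needed to pass over the data `epochs` times."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


def epochs_for_iterations(num_samples, batch_size, num_iterations):
    """How many passes over the data a fixed number of steps amounts to."""
    return num_iterations * batch_size / num_samples


# Hypothetical batch size of 50; substitute the value used by the script.
print(epochs_for_iterations(3000, 50, 1500))    # 3K images, 1500 steps -> 25.0
print(epochs_for_iterations(20000, 50, 1500))   # 20K images, 1500 steps -> 3.75
```

Under that assumption, the same 1500 steps cover a larger dataset far fewer times, so scaling the iteration count with the dataset size is a reasonable first thing to try.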

@bakwc
Owner

bakwc commented Feb 27, 2017

BTW, what accuracy do you get with your dataset? Could you share your model (or describe how you gathered the data)?

@xdvx
Author

xdvx commented Feb 27, 2017

I haven't got over 80% yet. I'm looking for a better dataset. Basically, I aim to train it to recognize photos which are not suitable for advertising networks, so even minor nudity is not acceptable here.

Should I train it only with photos of people, or should I provide all kinds of different samples as negative ones? Do photo dimensions matter, or can I collect photos at a smaller resolution? In the final stage I hope to use this script to go through 20 million photos on my website and mark the ones that are not suited for showing advertisements.

I am collecting data samples by crawling reddit, so I could gather even huge datasets, like hundreds of thousands of images.

@bakwc
Owner

bakwc commented Feb 27, 2017

Should I train it only with photos of people, or should I provide all kinds of different samples as negative ones?

Not only people, just arbitrary images. The more, the better.

Do photo dimensions matter, or can I collect photos at a smaller resolution?

Currently all images are converted to 128x128. You could try smaller ones; they will be upscaled.

I am collecting data samples by crawling reddit, so I could gather even huge datasets, like hundreds of thousands of images.

Crawling reddit is a good idea, maybe I'll try it later too.

@xdvx
Author

xdvx commented Feb 27, 2017

Would it be smart to double image size? Would it be enough to change this constant?
IMG_SIZE = 128
to
IMG_SIZE = 256

Must positive and negative data samples be at same size?

@bakwc
Owner

bakwc commented Feb 27, 2017

Would it be smart to double image size? Would it be enough to change this constant?
IMG_SIZE = 128
to
IMG_SIZE = 256

It's not that simple: you also need to change the architecture of the neural network, e.g. add additional conv and max_pool layers.

Must positive and negative data samples be at same size?

Yep.
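To illustrate why changing IMG_SIZE alone is not enough: the first fully connected layer has a fixed input length that depends on the image side after all pooling stages. The sketch below assumes each convolution keeps the spatial size and each 2x2 max-pool with stride 2 halves it; the stage count and channel number are hypothetical, and `side_after_pools` and `flat_size` are illustrative names.

```python
def side_after_pools(img_size, num_pools):
    """Side length after num_pools 2x2 max-pools with stride 2 (SAME padding)."""
    side = img_size
    for _ in range(num_pools):
        side = (side + 1) // 2  # ceiling division, as SAME padding rounds up
    return side


def flat_size(img_size, num_pools, channels):
    """Length of the flattened feature map fed into the first dense layer."""
    side = side_after_pools(img_size, num_pools)
    return side * side * channels


# Hypothetical network: four conv + pool stages, 24 channels in the last one.
print(flat_size(128, 4, 24))  # 8 * 8 * 24
print(flat_size(256, 4, 24))  # 16 * 16 * 24: the dense weights no longer fit
print(flat_size(256, 5, 24))  # 8 * 8 * 24 again after one more pool
```

So with a doubled input, either the dense layer's weight shape has to change or one more conv and max_pool stage is needed to bring the feature map back to the original size.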

@xdvx
Author

xdvx commented Feb 28, 2017

def train(self, numIterations=1500):

Is this enough for a batch of 60K training samples?

@bakwc
Owner

bakwc commented Feb 28, 2017

Not sure. Try to increase to 3K and check if quality is better or worse than with 1.5k.

@xdvx
Author

xdvx commented Feb 28, 2017

It really takes a lot of RAM. I could only run a 40k sample on 32 GB of RAM. I will try 80k on 64 GB tomorrow. My model size always ends up being 13 MB, is this right? I still can't get accuracy over 80%.

@bakwc
Owner

bakwc commented Feb 28, 2017

It really takes a lot of RAM. I could only run a 40k sample on 32 GB of RAM.

It seems it is currently not optimized for large datasets. It loads the whole dataset into memory; the dataset handling needs to be fixed.
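A common way around loading everything into memory is to keep only the list of file paths and labels in RAM and load images one mini-batch at a time. This is a minimal sketch of that idea, not the fix that ended up in the repository; `iter_batches` is a hypothetical name and `load_fn` stands for whatever function reads and resizes a single image.

```python
import random


def iter_batches(samples, batch_size, load_fn, shuffle=True, seed=None):
    """Yield (inputs, labels) batches, loading images only when needed.

    samples: list of (path, label) pairs; only this list stays in memory.
    load_fn: callable that turns a path into an image array.
    """
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        chunk = [samples[i] for i in order[start:start + batch_size]]
        inputs = [load_fn(path) for path, _ in chunk]
        labels = [label for _, label in chunk]
        yield inputs, labels
```

With this shape, memory use is bounded by one batch of decoded images rather than by the dataset size, at the cost of re-reading files on every pass.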

My model size always ends up being 13 MB, is this right? I still can't get accuracy over 80%.

The current network architecture is not very complex. Maybe you should try more complicated architectures, e.g. Inception. You can also try the following:

  • tune iterations number
  • tune size of pre-outer layer (now 1024 - you can try to increase or decrease it)
  • tune size of channels for convolutional layers (6, 12, 24)
  • add additional layers
  • try a 5x5 kernel instead of a 3x3 one (shape=[3, 3, ...] => shape=[5, 5, ...])
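To get a feel for what these knobs cost, it can help to count trainable parameters. The sketch below assumes a layout pieced together from the numbers mentioned in this thread (RGB input, conv channels 6, 12, 24, a dense layer from 8 * 8 * 24 features to 1024 units, 2 output classes); the exact layer order in nnpcr.py may differ, and all function names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of one kernel x kernel convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(inputs, outputs):
    """Weights plus biases of one fully connected layer."""
    return inputs * outputs + outputs


def total_params(kernel, dense_units=1024):
    """Parameter count of the assumed layout described above."""
    channels = [3, 6, 12, 24]  # assumed: RGB input, then 6 -> 12 -> 24
    convs = sum(conv_params(kernel, cin, cout)
                for cin, cout in zip(channels, channels[1:]))
    hidden = dense_params(8 * 8 * 24, dense_units)
    output = dense_params(dense_units, 2)
    return convs + hidden + output


print(total_params(3))        # 3x3 kernels
print(total_params(5))        # 5x5 kernels
print(total_params(3, 2048))  # wider dense layer
```

Under these assumptions the dense layer dominates the count, so the kernel size barely changes it. The count depends only on the architecture, not on how many training images are used, which is why a saved model stays the same size across dataset sizes.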

@xdvx
Author

xdvx commented Mar 1, 2017

By the way, in the new TensorFlow library it is no longer enough to pass

init_ops.zeros_initializer

You have to call it like this, otherwise you will get an error:
init_ops.zeros_initializer()

I'll test with all those different parameters.

Also I get these warnings:

Use tf.losses.softmax_cross_entropy instead.
[2017-03-01 13:01:03,853 deprecation.py:116 WARNING] From nnpcr.py:216: softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.softmax_cross_entropy instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:394: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.compute_weighted_loss instead.
[2017-03-01 13:01:03,865 deprecation.py:116 WARNING] From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:394: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:151: add_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.add_loss instead.
[2017-03-01 13:01:03,879 deprecation.py:116 WARNING] From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:151: add_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.

@xdvx
Author

xdvx commented Mar 1, 2017

Which line is for pre-outer layer?

@bakwc
Owner

bakwc commented Mar 1, 2017

W_fc1 = tf.get_variable("W_fc1", shape=[8 * 8 * 24, 1024], initializer=xavier())

@bakwc
Owner

bakwc commented Mar 1, 2017

By the way, in the new TensorFlow library you have to call init_ops.zeros_initializer

I haven't ported it to the new version yet. Will do it soon.

@xdvx
Author

xdvx commented Mar 15, 2017

    x_image = tf.reshape(x, [-1, IMG_SIZE, IMG_SIZE, 3])     # 128

    W_conv1 = tf.get_variable("W_conv1", shape=[3, 3, 3, 64], initializer=xavier())
    b_conv1 = tf.get_variable('b_conv1', [1, 1, 1, 64])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)                             # 64

    W_conv2 = tf.get_variable("W_conv2", shape=[3, 3, 64, 128], initializer=xavier())
    b_conv2 = tf.get_variable('b_conv2', [1, 1, 1, 128])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)                             # 32

    W_conv3 = tf.get_variable("W_conv3", shape=[3, 3, 128, 256], initializer=xavier())
    b_conv3 = tf.get_variable('b_conv3', [1, 1, 1, 256])
    h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
    h_pool3 = max_pool_2x2(h_conv3)                             # 16

    W_conv4 = tf.get_variable("W_conv4", shape=[3, 3, 256, 512], initializer=xavier())
    b_conv4 = tf.get_variable('b_conv4', [1, 1, 1, 512])
    h_conv4 = tf.nn.relu(conv2d(h_pool3, W_conv4) + b_conv4)
    h_pool4 = max_pool_2x2(h_conv4)                             # 8

    W_conv5 = tf.get_variable("W_conv5", shape=[3, 3, 512, 512], initializer=xavier())
    b_conv5 = tf.get_variable('b_conv5', [1, 1, 1, 512])
    h_conv5 = tf.nn.relu(conv2d(h_pool4, W_conv5) + b_conv5)
    h_pool5 = max_pool_2x2(h_conv5)                             # 4

    h_pool5_flat = tf.reshape(h_pool5, [-1, 4 * 4 * 512])

    W_fc1 = tf.get_variable("W_fc1", shape=[4 * 4 * 512, 4096], initializer=xavier())
    b_fc1 = tf.get_variable('b_fc1', [4096], initializer=init_ops.zeros_initializer())
    h_fc1 = tf.nn.relu(tf.matmul(h_pool5_flat, W_fc1) + b_fc1)

    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    W_fcO = tf.get_variable("W_fcO", shape=[4096, 2], initializer=xavier())
    b_fcO = tf.get_variable('b_fcO', [2], initializer=init_ops.zeros_initializer())

    logits = tf.matmul(h_fc1_drop, W_fcO) + b_fcO
    y_conv = tf.nn.softmax(logits)


    cross_entropy = loss_ops.softmax_cross_entropy(logits, y_)

    train_step = tf.train.AdamOptimizer(0.0005).minimize(cross_entropy)

    self.results = predictions = tf.argmax(y_conv, 1)

    self.probabilities = y_conv

    correct_prediction = tf.equal(predictions, tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

I got the best results with this network, but accuracy didn't go over 93%. I ran 80 epochs.

It gives me pretty good results in real life, but I got curious about its predictions, so I checked what softmax returns. The result wasn't good: it is usually 1, very close to 1, or 0. It shouldn't be like this. What do you think my mistake was? Is the neural network too big for 128x128 pixels, or is a 64k data batch too small for such a big network?
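Outputs pinned near 0 or 1 are not necessarily a sign of a mistake. With two classes, softmax depends only on the gap between the two logits, and a moderate gap is already enough to saturate it. The sketch below shows this in plain Python; the `temperature` parameter is an illustrative addition for softening the output when inspecting it, not something used in nnpcr.py.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; temperature > 1 softens the output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Two-class softmax depends only on the gap between the two logits.
for gap in (1.0, 3.0, 6.0, 10.0):
    p = softmax([gap, 0.0])[0]
    print('gap %4.1f -> p = %.4f' % (gap, p))
```

A logit gap of about 10 already gives a probability above 0.9999, so a network trained for many epochs with cross-entropy can easily produce such values. Whether those probabilities are well calibrated is a separate question that accuracy alone does not answer.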

@bakwc
Owner

bakwc commented Mar 15, 2017

Sorry, I can't understand what the problem is. 93% is rather good accuracy. The whole dataset is split into a train set (80%) and a test set (20%), and accuracy is calculated over the test set. So if you have 93% accuracy, you should see the same accuracy in real life too.
Or maybe you want to improve quality even more?
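The 80/20 split and test-set accuracy described above can be sketched as follows. This is an illustration of the idea under simple assumptions (a single random shuffle, exact-match accuracy), not the code from nnpcr.py; `split_train_test` and `accuracy` are hypothetical names.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle samples and split them into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    num_test = int(round(len(shuffled) * test_fraction))
    return shuffled[num_test:], shuffled[:num_test]


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)
```

Accuracy measured this way carries over to real use only to the extent that the held-out images resemble the images the model will see in production.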

@xdvx
Author

xdvx commented Mar 15, 2017

No, I think the problem is how confident the results returned by softmax are. My guess is the results are too dense. But I'm new to this field, as you can see; I've just learnt a lot in a couple of weeks. If I print what softmax returns, I get numbers around 1.00, 0.00 or 0.9999, 0.0001, something along those lines. I'm just curious why my neural network is so confident in its results. Should the probability ever go to 100%, and in most cases be between 99% and 100%?

One more question: would it be practical to add a second fully connected layer on top of the first one, add another dropout, and only then produce the 2 final classes?

I also played with different kernel sizes; it had no effect and just slowed down my training.

AdamOptimizer gave me quite an improvement on results.

Just happy to share what I've learnt.

@bakwc bakwc mentioned this issue Sep 30, 2017