Gradient Descent is an algorithm for finding the minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of ...
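In its simplest form that loop looks like the following; a minimal sketch in plain Python, using a one-dimensional quadratic chosen purely for illustration:

    # Minimize f(x) = (x - 3)^2 by repeatedly stepping against its derivative.
    def f_grad(x):
        return 2.0 * (x - 3.0)  # d/dx of (x - 3)^2

    x = 0.0        # arbitrary starting point
    alpha = 0.1    # learning rate (step size)
    for _ in range(100):
        x -= alpha * f_grad(x)  # step proportional to the negative gradient
    print(x)  # approaches the minimizer x = 3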

11
votes
1 answer
1k views

How to interpret caffe log with debug_info?

When facing difficulties during training (NaNs, loss that does not converge, etc.) it is sometimes useful to look at a more verbose training log by setting debug_info: true in the 'solver.prototxt' file....
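If editing the file by hand is inconvenient, the flag can also be flipped programmatically; a sketch assuming pycaffe and its compiled protobuf bindings are installed:

    from caffe.proto import caffe_pb2
    from google.protobuf import text_format

    solver = caffe_pb2.SolverParameter()
    with open('solver.prototxt') as f:
        text_format.Merge(f.read(), solver)
    solver.debug_info = True  # log per-layer data/diff statistics during training
    with open('solver_debug.prototxt', 'w') as f:
        f.write(text_format.MessageToString(solver))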
41
votes
4 answers
11k views

Common causes of nans during training

I've noticed that a frequent occurrence during training is NaNs being introduced. Oftentimes it seems to be caused by weights in inner-product/fully-connected or convolution layers blowing up....
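One common mitigation is to check for non-finite values and clip gradients before each update; a framework-agnostic NumPy sketch (np.clip here stands in for whatever clipping your framework provides):

    import numpy as np

    def check_finite(name, arr):
        # fail fast the moment a NaN or inf appears in weights or gradients
        if not np.all(np.isfinite(arr)):
            raise FloatingPointError("non-finite values in " + name)

    def clipped_step(w, grad, alpha, clip=1.0):
        check_finite("grad", grad)
        grad = np.clip(grad, -clip, clip)  # crude guard against exploding gradients
        return w - alpha * grad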
38
votes
3 answers
75k views

gradient descent using python and numpy

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)
        # temp[j] = theta[j] - (alpha/m) * (np.sum( ...
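The excerpt is cut off mid-update; a completed, vectorized reconstruction of what the loop appears to be doing (batch gradient descent for linear regression; the inner update is an assumption, not the asker's original line):

    import numpy as np

    def gradient(X_norm, y, theta, alpha, m, n, num_it):
        # n (feature count) is kept only to match the question's signature
        for _ in range(num_it):
            h = np.dot(X_norm, theta)  # predictions, shape (m,)
            theta = theta - (alpha / m) * np.dot(X_norm.T, h - y)  # simultaneous update
        return theta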
22
votes
2 answers
25k views

What is `weight_decay` meta parameter in Caffe?

Looking at an example 'solver.prototxt' posted on the BVLC/caffe git repository, there is a training meta parameter weight_decay: 0.04. What does this meta parameter mean? And what value should I assign to it?
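Conceptually, weight_decay is the coefficient of an L2 penalty on the weights, which enters each update as a term pulling every weight toward zero; a NumPy illustration of the idea, not Caffe's actual solver code:

    import numpy as np

    def sgd_step(w, grad, lr=0.01, weight_decay=0.0005):
        # effective gradient = data gradient + weight_decay * w  (L2 penalty)
        return w - lr * (grad + weight_decay * w)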
4
votes
1 answer
2k views

Spark mllib predicting weird number or NaN

I am new to Apache Spark and trying to use the machine learning library to predict some data. My dataset right now is only about 350 points. Here are 7 of those points: "365","4",41401.387,5330569 ...
62
votes
3 answers
23k views

Why should weights of Neural Networks be initialized to random numbers?

I am trying to build a neural network from scratch. Across all AI literature there is a consensus that weights should be initialized to random numbers in order for the network to converge faster. But ...
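The usual answer is symmetry breaking: hidden units that start with identical weights receive identical gradients and can never differentiate. A small NumPy demonstration with a toy tanh layer (shapes are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 3))   # 5 samples, 3 features
    W = np.zeros((3, 2))          # two hidden units with identical (zero) weights

    h = np.tanh(x @ W)            # forward pass
    dh = np.ones_like(h)          # pretend upstream gradient
    dW = x.T @ (dh * (1 - h**2))  # backprop through tanh

    print(np.allclose(dW[:, 0], dW[:, 1]))  # True: both units get the same gradient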
32
votes
2 answers
25k views

What is `lr_policy` in Caffe?

I am just trying to find out how I can use Caffe. To do so, I took a look at the different .prototxt files in the examples folder. There is one option I don't understand: # The learning rate policy ...
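The lr_policy values are documented in comments in caffe.proto; paraphrased in Python, the common policies compute the current rate roughly as follows:

    def learning_rate(policy, base_lr, it, gamma=0.1, power=0.75, stepsize=10000):
        # paraphrase of the lr_policy formulas documented in caffe.proto
        if policy == 'fixed':
            return base_lr
        if policy == 'step':
            return base_lr * gamma ** (it // stepsize)
        if policy == 'exp':
            return base_lr * gamma ** it
        if policy == 'inv':
            return base_lr * (1 + gamma * it) ** (-power)
        raise ValueError('unknown policy: ' + policy)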
7
votes
1 answer
3k views

Machine learning - Linear regression using batch gradient descent

I am trying to implement batch gradient descent on a data set with a single feature and multiple training examples (m). When I try using the normal equation, I get the right answer but the wrong one ...
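When gradient descent and the closed form disagree, computing the closed-form solution directly is a useful sanity check; a generic NumPy sketch using the pseudo-inverse for numerical stability:

    import numpy as np

    def normal_equation(X, y):
        # theta = (X^T X)^{-1} X^T y, via the pseudo-inverse
        return np.linalg.pinv(X.T @ X) @ (X.T @ y)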
11
votes
2 answers
3k views

Cost function in logistic regression gives NaN as a result

I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified, 1 and 0. While training on the data, I am ...
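The usual culprit is log(0) when the sigmoid saturates to exactly 0 or 1; clipping the predictions before taking the log is the standard fix, sketched here in NumPy:

    import numpy as np

    def cost(h, y, eps=1e-12):
        h = np.clip(h, eps, 1 - eps)  # keep log() away from exactly 0 and 1
        return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))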
11
votes
2 answers
2k views

Caffe: What can I do if only a small batch fits into memory?

I am trying to train a very large model. Therefore, I can only fit a very small batch size into GPU memory. Working with small batch sizes results in very noisy gradient estimates. What can I do ...
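In Caffe this is what the solver's iter_size parameter addresses: gradients are accumulated over several small forward/backward passes before a single update. The idea in framework-agnostic NumPy terms (a sketch, not Caffe code; grad_fn and the micro-batch list are placeholders):

    import numpy as np

    def accumulated_step(w, micro_batches, grad_fn, lr, iter_size):
        # average gradients over iter_size micro-batches, then update once,
        # mimicking one step taken with an iter_size-times larger batch
        acc = np.zeros_like(w)
        for xb, yb in micro_batches[:iter_size]:
            acc += grad_fn(w, xb, yb)
        return w - lr * acc / iter_size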
4
votes
2 answers
3k views

Write Custom Python-Based Gradient Function for an Operation? (without C++ Implementation)

I'm trying to write a custom gradient function for 'my_op', which for the sake of the example contains just a call to tf.identity() (ideally, it could be any graph).

    import tensorflow as tf
    from ...
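In current TensorFlow the pure-Python route is tf.custom_gradient (the question predates it and used tf.RegisterGradient); the doubling gradient below is arbitrary, purely to show the mechanism:

    import tensorflow as tf

    @tf.custom_gradient
    def my_op(x):
        y = tf.identity(x)
        def grad(dy):
            return dy * 2.0  # arbitrary custom gradient, for illustration
        return y, grad

    x = tf.constant(3.0)
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = my_op(x)
    print(tape.gradient(y, x))  # 2.0 rather than identity's 1.0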
24
votes
7 answers
20k views

gradient descent seems to fail

I implemented a gradient descent algorithm to minimize a cost function in order to obtain a hypothesis for determining whether an image has good quality. I did that in Octave. The idea is somehow ...
15
votes
2 answers
5k views

Neural network always predicts the same class

I'm trying to implement a neural network that classifies images into one of two discrete categories. The problem, however, is that it currently always predicts 0 for any input and I'm not really ...
8
votes
1 answer
8k views

What's the triplet loss back propagation gradient formula?

I am trying to use caffe to implement the triplet loss described in Schroff, Kalenichenko and Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", 2015. I am new to this, so how to ...
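For reference, with squared Euclidean distances the FaceNet loss L = max(0, ||a - p||^2 - ||a - n||^2 + alpha) has simple closed-form gradients whenever the margin is violated; a NumPy sketch, where a, p, n are the anchor, positive, and negative embeddings:

    import numpy as np

    def triplet_loss_grads(a, p, n, alpha=0.2):
        loss = np.sum((a - p)**2) - np.sum((a - n)**2) + alpha
        if loss <= 0:
            z = np.zeros_like(a)
            return 0.0, z, z, z  # hinge inactive: zero gradients
        # dL/da = 2(n - p), dL/dp = 2(p - a), dL/dn = 2(a - n)
        return loss, 2 * (n - p), 2 * (p - a), 2 * (a - n)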
8
votes
4 answers
7k views

Fast gradient-descent implementation in a C++ library?

I'm looking to run a gradient descent optimization to minimize the cost of an instantiation of variables. My program is very computationally expensive, so I'm looking for a popular library with a fast ...
