Gradient Descent is an algorithm for finding a minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of ...
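A minimal, self-contained sketch of that idea (a hypothetical 1-D example, not drawn from any of the questions below):

```python
# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
# Each step moves opposite the gradient, scaled by the learning rate alpha.

def gradient_descent(grad, x0, alpha=0.1, num_iters=100):
    x = x0
    for _ in range(num_iters):
        x = x - alpha * grad(x)  # step proportional to the gradient
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
# x_min converges toward 3, the minimizer of f
```

With a too-large alpha the iterates can diverge instead of converging, which is the root cause behind several of the NaN questions listed below.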

**11** votes · **1** answer · 1k views

### How to interpret caffe log with debug_info?

When facing difficulties during training (NaNs, loss does not converge, etc.) it is sometimes useful to look at a more verbose training log by setting debug_info: true in the 'solver.prototxt' file....

**41** votes · **4** answers · 11k views

### Common causes of nans during training

I've noticed that a frequent occurrence during training is NaNs being introduced. Often it seems to be caused by weights in inner-product/fully-connected or convolution layers blowing up....

**38** votes · **3** answers · 75k views

### gradient descent using python and numpy

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)
        # temp[j] = theta[j] - (alpha/m) * ( np.sum( ...
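The excerpt above is truncated; a complete vectorized batch update of the same general shape (a sketch only, reusing the question's variable names as an assumption, with the unused `n` kept for signature compatibility) might look like:

```python
import numpy as np

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    # Batch gradient descent for linear regression:
    # theta <- theta - (alpha / m) * X^T (X theta - y)
    for _ in range(num_it):
        h = np.dot(X_norm, theta)                       # predictions for all m examples
        theta = theta - (alpha / m) * np.dot(X_norm.T, h - y)
    return theta
```

The vectorized form replaces the commented per-coordinate loop over j with a single matrix product, which is both faster and less error-prone in numpy.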

**22** votes · **2** answers · 25k views

### What is `weight_decay` meta parameter in Caffe?

Looking at an example 'solver.prototxt', posted on BVLC/caffe git, there is a training meta parameter weight_decay: 0.04. What does this meta parameter mean? And what value should I assign to it?

**4** votes · **1** answer · 2k views

### Spark mllib predicting weird number or NaN

I am new to Apache Spark and trying to use the machine learning library to predict some data. My dataset right now is only about 350 points. Here are 7 of those points: "365","4",41401.387,5330569"...

**62** votes · **3** answers · 23k views

### Why should weights of Neural Networks be initialized to random numbers?

I am trying to build a neural network from scratch. Across all AI literature there is a consensus that weights should be initialized to random numbers in order for the network to converge faster. But ...

**32** votes · **2** answers · 25k views

### What is `lr_policy` in Caffe?

I am just trying to find out how I can use Caffe. To do so, I took a look at the different .prototxt files in the examples folder. There is one option I don't understand: # The learning rate policy...

**7** votes · **1** answer · 3k views

### Machine learning - Linear regression using batch gradient descent

I am trying to implement batch gradient descent on a data set with a single feature and multiple training examples (m). When I try using the normal equation, I get the right answer but the wrong one ...
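The two routes this question mentions can be cross-checked on toy data; a sketch assuming a design matrix with an intercept column and made-up points:

```python
import numpy as np

# Toy data: y = 1 + 2x exactly; first column of X is the intercept.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
m = len(y)

# Normal equation: solve (X^T X) theta = X^T y in closed form.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the same least-squares cost.
theta_gd = np.zeros(2)
alpha = 0.1
for _ in range(5000):
    theta_gd -= (alpha / m) * X.T @ (X @ theta_gd - y)

# Both estimates should agree, theta ≈ [1, 2].
```

When the two disagree, common culprits are a missing intercept column, a missing 1/m factor, or an alpha large enough that the iterates diverge.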

**11** votes · **2** answers · 3k views

### Cost function in logistic regression gives NaN as a result

I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified. The classes are 1 and 0. While training the data, I am ...

**11** votes · **2** answers · 2k views

### Caffe: What can I do if only a small batch fits into memory?

I am trying to train a very large model. Therefore, I can only fit a very small batch size into GPU memory. Working with small batch sizes results in very noisy gradient estimates. What can I do ...

**4** votes · **2** answers · 3k views

### Write Custom Python-Based Gradient Function for an Operation? (without C++ Implementation)

I'm trying to write a custom gradient function for 'my_op', which for the sake of the example contains just a call to tf.identity() (ideally, it could be any graph). import tensorflow as tf; from ...

**24** votes · **7** answers · 20k views

### gradient descent seems to fail

I implemented a gradient descent algorithm to minimize a cost function in order to obtain a hypothesis for determining whether an image has good quality. I did that in Octave. The idea is somehow ...

**15** votes · **2** answers · 5k views

### Neural network always predicts the same class

I'm trying to implement a neural network that classifies images into one of two discrete categories. The problem, however, is that it currently always predicts 0 for any input and I'm not really ...

**8** votes · **1** answer · 8k views

### What's the triplet loss back propagation gradient formula?

I am trying to use caffe to implement the triplet loss described in Schroff, Kalenichenko and Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", 2015. I am new to this, so how to ...

**8** votes · **4** answers · 7k views

### Fast gradient-descent implementation in a C++ library?

I'm looking to run a gradient descent optimization to minimize the cost of an instantiation of variables. My program is very computationally expensive, so I'm looking for a popular library with a fast ...