With C classes, how many nodes does the last layer have?
C
what are the outputs at the last layer?
Probabilities that the sample belongs to each class
what is the activation function at the last layer?
softmax
formula of softmax at last layer?
Exponentiate each last-layer value z_i to get e^{z_i}
a_i = e^{z_i} / sum_{j=1}^{C} e^{z_j}
Dividing by the sum normalizes the outputs so they add up to 1
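A minimal sketch of the softmax step above, in plain Python. The function name `softmax` and the sample values are illustrative assumptions, not part of the original notes. Subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(z):
    """Return softmax activations for a list of last-layer values z."""
    # Subtracting max(z) leaves the result unchanged but avoids overflow in exp.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    # Divide each exponential by the sum so the outputs add up to 1.
    return [e / total for e in exps]

# Hypothetical last-layer values for C = 3 classes.
a = softmax([2.0, 1.0, 0.1])
print(a)       # one probability per class
print(sum(a))  # approximately 1
```

The largest z_i always gets the largest probability, and equal z values give equal probabilities of 1/C.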
C = 2, softmax becomes?
logistic regression: a_1 = 1 / (1 + e^{-(z_1 - z_2)}), the sigmoid of z_1 - z_2, and a_2 = 1 - a_1
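The C = 2 case can be checked numerically. With two classes, dividing the numerator and denominator of a_1 by e^{z_1} gives 1 / (1 + e^{-(z_1 - z_2)}), which is the sigmoid used in logistic regression. The sketch below assumes hypothetical values for z_1 and z_2, and the helper names are illustrative.

```python
import math

def softmax(z):
    """Softmax over a list of last-layer values."""
    m = max(z)  # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical last-layer values for C = 2 classes.
z1, z2 = 1.5, -0.3
a1, a2 = softmax([z1, z2])

# The first softmax output matches the sigmoid of the difference z1 - z2.
print(a1, sigmoid(z1 - z2))
print(a2, 1.0 - a1)
```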