Activation Function: Absolute Function, One Function That Behaves More Individualized

Inspired by how the natural world works, a new activation function is proposed: the absolute function. According to probability theory, the natural world follows a normal distribution. Stimulation that happens frequently has a low value and lies around zero in Figure 1; stimulation that happens only occasionally has a high value and lies far away from zero in Figure 1. So a high value corresponds to a big stimulation, which represents individualization. Through tests on the MNIST dataset with a fully-connected neural network and a convolutional neural network, some conclusions are put forward. The accuracy curve of the absolute function shakes slightly, which differs from the accuracy curves of relu and leaky relu. The absolute function keeps the negative part as large as the positive part, so the individualization is more active than with the relu and leaky relu functions. With respect to generalization, the individualization is the reason for the shake: the accuracy may be good on some sets and worse on others. The absolute function is less likely to over-fit. When the batch size is small, the individualization is clear, and vice versa; to change the individualization of the absolute function, simply change the batch size. Through one more test on MNIST with an autoencoder, it is found that the leaky relu function performs the classification task well, while the absolute function performs the generation task well, because the classification task needs more universality and the generation task needs more individualization.


INTRODUCTION
In nature, color has white and black, temperature has hot and cool, smell has fragrant and foul, and taste has bitter and sweet. All information our body receives comes in an opposite mode. A pleasurable irritation and a painful irritation differ not only in magnitude but also in sign, so the negative part should be kept as part of the signal. When a stimulation arrives, either of the two opposite stimulations can activate the activation function, not only one of them, so that we can see and feel both. An activation function should activate any stimulation that is large enough, no matter what its sign is. Inspired by this natural mode, a new activation function is proposed: the absolute function. There are many activation functions, such as sigmoid, tanh, relu, leaky relu, and elu. The most frequently used function is relu [1]. The relu function has the dead relu problem: some neurons are never activated, and their related parameters are never updated. The relu function can transform a dense matrix into a sparse matrix and remove noise; however, it loses the negative information, which is important in nature. The relu function filters out the negative part, while the leaky relu and elu functions compress it. The sigmoid and tanh functions also compress the negative part. The absolute function keeps the negative part as large as the positive part. The sigmoid and tanh functions have the gradient vanishing problem. The absolute function does not have the dead relu problem, gradient vanishing, or gradient explosion; however, it has some interesting characteristics.
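For concreteness, the following sketch (assuming Python with NumPy, which the paper does not specify) shows how each of these functions treats the negative part of its input: relu filters it out, leaky relu and elu compress it, and the absolute function keeps it.

```python
# A minimal sketch contrasting how each activation treats the negative part of its input.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)                  # filters out the negative part

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)       # compresses the negative part

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # compresses it smoothly

def absolute(x):
    return np.abs(x)                           # keeps the negative part as large as the positive part

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
for f in (sigmoid, np.tanh, relu, leaky_relu, elu, absolute):
    print(f.__name__, f(x))
```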
The image of the absolute function is shown in Figure 1. Its formula is y = |x|. Because y'' = 0, it is a convex function, and its loss function can reach a minimum by gradient descent. According to probability theory, the natural world follows a normal distribution [2]. Stimulation that happens frequently has a low value and lies around zero in Figure 1; stimulation that happens only occasionally has a high value and lies far away from zero in Figure 1. So a high value corresponds to a big stimulation, which represents individualization. In Figure 4, the batch size is 64 and the accuracy curve is no longer shaken as before; it behaves like the curves of relu and leaky relu. The curves of the relu and leaky relu functions are more universal and stable than the curve of the absolute function, because relu filters out the negative part and therefore partly loses the individualization. With respect to generalization, the individualization is the reason for the shake: the accuracy may be good on some sets and worse on others. The absolute function is less likely to over-fit.
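The paper does not list the exact layers of the fully-connected network, so the following is only a minimal sketch, assuming TensorFlow/Keras with illustrative layer sizes, optimizer, and epoch count, of how the absolute activation y = |x| can be plugged into the MNIST test.

```python
# A hedged sketch of the fully-connected MNIST test with the absolute activation.
# Layer sizes, optimizer, loss, and epochs are assumptions, not the paper's exact setup.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def absolute(x):
    return tf.abs(x)  # y = |x|, the proposed activation

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    layers.Dense(512, activation=absolute, input_shape=(784,)),
    layers.Dense(256, activation=absolute),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The batch size controls how clearly the "individualization" (the shake) appears.
history = model.fit(x_train, y_train, epochs=20, batch_size=32,
                    validation_data=(x_test, y_test))
```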

THE TEST OF ABSOLUTE FUNCTION ON CONVOLUTIONAL NEURAL NETWORK
Now change the neural network: the results on a convolutional neural network with the MNIST dataset are shown in Figures 5 to 10. The convolutional neural network has three convolutional layers and two fully connected layers. The shake phenomenon is clearer than on the fully-connected neural network when the batch size is 32. When the batch size increases, the shake phenomenon becomes weaker. The model resists over-fitting when the batch size is small. The loss, which first decreases and then increases, is odd when the batch size is 32; however, the loss is stable like that of relu and leaky relu when the batch size is 128. Correspondingly, the accuracy increases step by step.
In order to increase the accuracy, the model has to learn the individualization, which increases the loss. The relu and leaky relu functions cannot learn the individualization like the absolute function, so their loss is stable. When the batch size is small, the individualization is clear, and vice versa. To change the individualization of the absolute function, simply change the batch size.
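As a rough illustration of this setup, here is a minimal sketch assuming TensorFlow/Keras. The paper states only that the network has three convolutional layers and two fully connected layers and that the batch sizes are varied; the filter counts, kernel sizes, optimizer, and epoch count below are assumptions.

```python
# A hedged sketch of the convolutional MNIST test: three conv layers, two dense layers,
# all with the absolute activation, trained with several batch sizes.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def absolute(x):
    return tf.abs(x)

def build_cnn():
    return keras.Sequential([
        layers.Conv2D(32, (3, 3), activation=absolute, input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation=absolute),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation=absolute),
        layers.Flatten(),
        layers.Dense(64, activation=absolute),
        layers.Dense(10, activation="softmax"),
    ])

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# Repeating the run with different batch sizes reproduces the effect described above:
# the smaller the batch size, the clearer the shake in the accuracy and loss curves.
for batch_size in (32, 64, 128):
    model = build_cnn()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=10, batch_size=batch_size,
              validation_data=(x_test, y_test), verbose=0)
```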
In order to visualize the intermediate activations [4], one more test is conducted. The convolutional neural network has three convolutional layers and two fully connected layers. The test results are shown in Figures 11 to 16.
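A minimal sketch of that visualization, following the general technique of [4] and assuming TensorFlow/Keras with the `model` and `x_test` from the previous sketch, could look like this:

```python
# A hedged sketch: build a model that exposes every convolutional layer's output,
# then plot the feature maps of the second convolution layer for one test digit.
import matplotlib.pyplot as plt
from tensorflow import keras

layer_outputs = [layer.output for layer in model.layers if "conv" in layer.name]
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)

sample = x_test[:1]                      # one MNIST digit, shape (1, 28, 28, 1)
activations = activation_model.predict(sample)
second_conv = activations[1]             # feature maps of the second conv layer

fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    if i < second_conv.shape[-1]:
        ax.imshow(second_conv[0, :, :, i], cmap="viridis")
    ax.axis("off")
plt.show()
```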

THE TEST OF ABSOLUTE FUNCTION ON AUTOENCODER GENERATION
Because the absolute function has more individualization, it is chosen to do the autoencoder's generation [5] work. The abstract network (namely the prediction network, usually used for classification and regression) is common now, but the concrete network (namely the generation network), which generates concrete information from a concept or a label, is rare. Its principle is shown in Figures 17 and 18. The test is on the MNIST dataset with a convolutional neural network [6]. The convolutional neural network (the abstract network) is like LeNet; it has three convolutional layers and two fully connected layers. The concrete network is the inverse function of the abstract network; it has five layers. The optimizer is adam and the loss is mse. You can read my paper [5] for the details. The test result is shown in Figure 19, where the left image is the input and the right image is the generated output.
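The paper gives only the outline of the concrete network (five layers, the inverse of the abstract network, adam optimizer, mse loss), so the following is a hedged sketch, assuming TensorFlow/Keras and illustrative layer sizes, that maps a one-hot MNIST label back to a 28x28 image with the absolute activation.

```python
# A hedged sketch of a concrete (generation) network: five trainable layers that
# roughly invert the abstract network, trained with adam and mse as stated above.
# The layer sizes are assumptions, not the paper's exact architecture.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def absolute(x):
    return tf.abs(x)

concrete_net = keras.Sequential([
    layers.Dense(64, activation=absolute, input_shape=(10,)),        # label/concept in
    layers.Dense(7 * 7 * 64, activation=absolute),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(64, (3, 3), strides=2, padding="same", activation=absolute),
    layers.Conv2DTranspose(32, (3, 3), strides=2, padding="same", activation=absolute),
    layers.Conv2DTranspose(1, (3, 3), padding="same", activation="sigmoid"),  # 28x28 image out
])
concrete_net.compile(optimizer="adam", loss="mse")

# Train the network to reproduce MNIST digits from their one-hot labels.
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_onehot = keras.utils.to_categorical(y_train, 10)
concrete_net.fit(y_onehot, x_train, epochs=10, batch_size=64)
```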

Figure 1: The Image of the Absolute Function

Figure 2: Training and Validation Accuracy, Batch Size is 32

Figure 3: Training and Validation Loss, Batch Size is 32

Figure 4: Training and Validation Accuracy, Batch Size is 64

Figure 5: Training and Validation Accuracy, Batch Size is 32
Figure 7: Training and Validation Accuracy, Batch Size is 64

In Figure 11, the intermediate activation is sparse and blurred with the relu activation; oppositely, in Figure 12, the intermediate activation is dense and clear with the absolute function, because the relu activation function filters out the negative part while the absolute activation function keeps it.

Figure 15: Visualize Intermediate Activation with Absolute Activation (Second Convolution Layer)

Figure 18: The Architecture of the Network