Simple Neural Network: nonlinear system of equations? [closed]
I defined a very simple neural network with 2 inputs, 1 hidden layer of 2 nodes, and 1 output node.
For each input pattern $\vec{x} \in \mathbb{R} \times \mathbb{R}$ and associated output $o \in \mathbb{R}$, the resulting nonlinear equation is:
$$w_{o0}\,\sigma(x_0 W_{i00} + x_1 W_{i10}) + w_{o1}\,\sigma(x_0 W_{i01} + x_1 W_{i11}) = o,$$
where $W_i$ is the $2 \times 2$ weight matrix of the input connections, $\sigma(x) = \frac{1}{1+\exp(-x)}$, and $\vec{w}_o$ is the weight vector of the two connections into the output node.
Given a dataset of $n$ (pattern, output) examples, there will be $n$ nonlinear equations.
I'm asking how to find the solutions of this nonlinear system, as an alternative to backpropagation for solving the learning problem.
I've implemented an optimizer for the stated problem. If someone is interested, I can provide the C sources (email: fportera2@gmail.com).
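For concreteness, here is a minimal C sketch of the forward map that each equation constrains (simplified for this question, not my actual optimizer sources; the helper names are illustrative):

```c
#include <math.h>

/* logistic activation */
static double sigma(double x) { return 1.0 / (1.0 + exp(-x)); }

/* Wi is the 2x2 input weight matrix, wo holds the two output weights.
 * Each training pair (x, o) turns forward(x, Wi, wo) = o into one
 * nonlinear equation in the six unknown weights.                     */
double forward(const double x[2], const double Wi[2][2], const double wo[2])
{
    double h0 = sigma(x[0] * Wi[0][0] + x[1] * Wi[1][0]);
    double h1 = sigma(x[0] * Wi[0][1] + x[1] * Wi[1][1]);
    return wo[0] * h0 + wo[1] * h1;
}
```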
nonlinear-system
closed as too broad by the_candyman, Alexander Gruber♦ Nov 30 at 3:13
edited Nov 30 at 8:51
asked Nov 25 at 11:22
Filippo Portera
2 Answers
For simplicity, let us say that the output $y$ of your network is given by
$$y = f(x, w),$$
where $f$ is a nonlinear function from the space of inputs $x$ and weights $w$ to the space of outputs $y$.
In general, we cannot demand that the network learn the target exactly for each input, i.e.

Find $w$ such that $$f(x,w) = o,$$
where $o$ is the target vector.

Indeed, demanding this means solving exactly the nonlinear system of equations you reported, which in general is a very difficult problem and often has no solution.
The idea behind training a neural network is instead to minimize the distance between output and target, i.e.:

Find $w$ such that $$\|f(x,w) - o\|$$
is minimized, where $\| \cdot \|$ is a norm.

This approach is better for two reasons:
- In general it is easier than solving the system $f(x,w) = o$ exactly;
- It gives your network a certain "elasticity": you don't actually want the outputs to equal the targets exactly, since a looser fit is what lets the network classify new inputs that were not used for training.

Besides these considerations, IMHO backpropagation is just an appealing name for an optimization procedure that takes advantage of the structure of the function $f$, inherited from the topology of the neural network. Specifically, to find the derivative of the objective function you use the chain rule. The structure of that derivative suggests that the "error is backpropagated", but you are just computing a derivative in order to solve an optimization problem!
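To make the last point concrete, here is a minimal sketch of one gradient-descent step on $E(w) = \frac{1}{2}\sum_k (f(x_k, w) - o_k)^2$ for the 2-2-1 network above, with the gradient obtained by the chain rule, which is exactly the computation that "backpropagation" names here (illustrative code, not from any implementation mentioned in this thread; all names are made up):

```c
#include <math.h>

static double sigma(double x) { return 1.0 / (1.0 + exp(-x)); }

/* One gradient-descent step on the squared error over n examples.
 * x[k] is the k-th input pattern, o[k] its target, lr the step size. */
void gd_step(double Wi[2][2], double wo[2],
             const double x[][2], const double o[], int n, double lr)
{
    double gWi[2][2] = {{0.0}}, gwo[2] = {0.0};
    for (int k = 0; k < n; ++k) {
        /* forward pass */
        double h[2];
        for (int j = 0; j < 2; ++j)
            h[j] = sigma(x[k][0] * Wi[0][j] + x[k][1] * Wi[1][j]);
        double e = wo[0] * h[0] + wo[1] * h[1] - o[k]; /* f(x_k,w) - o_k */
        /* chain rule: dE/dwo_j  = e * h_j,
         *             dE/dWi_ij = e * wo_j * h_j * (1 - h_j) * x_i     */
        for (int j = 0; j < 2; ++j) {
            gwo[j] += e * h[j];
            double d = e * wo[j] * h[j] * (1.0 - h[j]);
            gWi[0][j] += d * x[k][0];
            gWi[1][j] += d * x[k][1];
        }
    }
    for (int j = 0; j < 2; ++j) {   /* move against the gradient */
        wo[j]    -= lr * gwo[j];
        Wi[0][j] -= lr * gWi[0][j];
        Wi[1][j] -= lr * gWi[1][j];
    }
}
```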
answered Nov 25 at 11:47, edited Nov 25 at 11:53
the_candyman

Can I ask whether there is a formalization of "a very difficult problem and often without a solution"?
– Filippo Portera
Nov 25 at 12:07
@FilippoPortera Linear systems are easy. Thanks to linear algebra, everything is known about linear systems and their solutions. Nonlinear systems are not easy since there is no general theory to account for them.
– the_candyman
Nov 25 at 12:18
I've followed the article google.com/… and it turns out that solving the associated nonlinear system as an optimization problem with gradient descent is much slower than backpropagation and, in addition, it doesn't find a solution of the system.
– Filippo Portera
Nov 29 at 7:39
If you want, I can give you the C sources of my optimizers.
– Filippo Portera
Nov 29 at 11:37
I think there are at least some attempts at numerical solutions:
https://www.lakeheadu.ca/sites/default/files/uploads/77/docs/RemaniFinal.pdf
and:
http://scientificbulletin.upm.ro/papers/2011-1/Finta-The-Gradient-Method-for-Overdetermined-Nonlinear-Systems.pdf
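As an illustration of the second approach, here is a sketch of the update $w \leftarrow w - \alpha\, J^\top F(w)$ for the overdetermined residual system $F(w) = 0$, where component $k$ of $F$ is $f(x_k, w) - o_k$ and $J$ is the Jacobian of $F$. This is my own simplification (I'm assuming the referenced gradient method has this $J^\top F$ form, and I use a finite-difference Jacobian instead of an analytic one; all names are illustrative):

```c
#include <math.h>

#define NW 6   /* w packs {Wi00, Wi10, Wi01, Wi11, wo0, wo1} */

static double sigma(double t) { return 1.0 / (1.0 + exp(-t)); }

/* residual of equation k: f(x_k, w) - o_k */
static double res(const double w[NW], const double x[2], double o)
{
    double h0 = sigma(x[0] * w[0] + x[1] * w[1]);
    double h1 = sigma(x[0] * w[2] + x[1] * w[3]);
    return w[4] * h0 + w[5] * h1 - o;
}

/* One step of w <- w - alpha * J^T F(w), i.e. gradient descent on
 * (1/2)||F(w)||^2, with J approximated by forward differences.    */
void grad_step(double w[NW], const double x[][2], const double o[],
               int n, double alpha)
{
    const double eps = 1e-6;
    double g[NW] = {0.0};
    for (int k = 0; k < n; ++k) {
        double fk = res(w, x[k], o[k]);
        for (int j = 0; j < NW; ++j) {  /* accumulate (J^T F)_j */
            double wj = w[j];
            w[j] = wj + eps;
            g[j] += (res(w, x[k], o[k]) - fk) / eps * fk;
            w[j] = wj;                  /* restore w_j */
        }
    }
    for (int j = 0; j < NW; ++j)
        w[j] -= alpha * g[j];
}
```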
edited Nov 29 at 7:56, answered Nov 25 at 19:51
Filippo Portera

The risk is [overfitting](en.wikipedia.org/wiki/Overfitting). Optimization, i.e. finding weights so that a certain norm is minimized, is the better choice, rather than seeking an "exact solution".
– the_candyman
Nov 29 at 11:39
At present, the problem is that I'm not converging towards a solution of the nonlinear system. If you want, I can provide the code I've implemented.
– Filippo Portera
Nov 29 at 11:46
Dear Filippo, if your problem is to solve a nonlinear system, then I warmly suggest you create a new question on this specific topic.
– the_candyman
Nov 29 at 16:59