Simple Neural Network: nonlinear system of equations? [closed]


























I defined a very simple neural network with 2 inputs, 1 hidden layer of 2 nodes, and one output node.
For each input pattern $\vec{x} \in \mathbb{R} \times \mathbb{R}$ and associated output $o \in \mathbb{R}$, the resulting nonlinear equation is:



$w_{o0} \sigma(x_{0} W_{i00} + x_{1} W_{i10}) + w_{o1} \sigma(x_{0} W_{i01} + x_{1} W_{i11}) = o$



where $W_i$ is the $2 \times 2$ weight matrix of the input connections, $\sigma(x) = \frac{1}{1+\exp(-x)}$, and $\vec{w}_o$ is the weight vector of the two output connections feeding the output node.



Given a dataset of $n$ (pattern, output) examples, there will be $n$ such nonlinear equations in the six unknown weights.



I'm asking how to find the solutions of such a nonlinear system, as an alternative way to solve the learning problem without backpropagation.
I've implemented an optimizer for the stated problem; if someone is interested, I can provide the C sources (email: fportera2@gmail.com).
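To make the system concrete, here is a minimal C sketch (the function names and the toy dataset are my own, hypothetical choices) that evaluates the residual $f(\vec{x}_k, w) - o_k$ of each equation for a given set of weights; a solver for the system would drive all $n$ residuals to zero simultaneously.

```c
#include <math.h>
#include <stdio.h>

/* Logistic activation: sigma(x) = 1 / (1 + exp(-x)). */
static double sigma(double x) { return 1.0 / (1.0 + exp(-x)); }

/* Network output for one pattern (x0, x1), given the 2x2 input
 * weight matrix Wi and the output weight vector wo. */
static double net(double Wi[2][2], const double wo[2],
                  double x0, double x1)
{
    double h0 = sigma(x0 * Wi[0][0] + x1 * Wi[1][0]);
    double h1 = sigma(x0 * Wi[0][1] + x1 * Wi[1][1]);
    return wo[0] * h0 + wo[1] * h1;
}

/* Residual of the k-th equation: f(x_k, w) - o_k.
 * Solving the system means driving every residual to zero. */
static double residual(double Wi[2][2], const double wo[2],
                       const double x[2], double o)
{
    return net(Wi, wo, x[0], x[1]) - o;
}

int main(void)
{
    /* Hypothetical weights and a two-example dataset. */
    double Wi[2][2] = {{0.5, -0.3}, {0.8, 0.1}};
    double wo[2]    = {1.0, -1.0};
    double X[2][2]  = {{0.0, 1.0}, {1.0, 0.0}};
    double O[2]     = {0.2, -0.1};

    for (int k = 0; k < 2; ++k)
        printf("residual[%d] = %g\n", k,
               residual(Wi, wo, X[k], O[k]));
    return 0;
}
```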










nonlinear-system

asked Nov 25 at 11:22 by Filippo Portera (last edited Nov 30 at 8:51)

closed as too broad by the_candyman, Alexander Gruber Nov 30 at 3:13

Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
2 Answers
































For simplicity, let us say that the output $y$ of your network is given by:

$$y = f(x, w),$$

where $f$ is a nonlinear function from the space of inputs $x$ and weights $w$ to the space of outputs $y$.

In general, we cannot expect the network to learn the target for each input exactly, i.e.

Find $w$ such that $$f(x,w) = o,$$

where $o$ is the target vector.

Indeed, demanding this means solving the nonlinear system of equations you reported exactly, which in general is a very difficult problem and often has no solution.

The idea behind training a neural network is instead to minimize the distance between output and target, i.e.:

Find $w$ such that $$\|f(x,w) - o\|$$ is minimized,

where $\| \cdot \|$ is a norm.

This approach is better for two reasons:

1. In general it is easier to solve than the exact system $f(x,w) = o$;

2. It gives your network a certain amount of "elasticity": you do not really want the outputs to equal the targets exactly, because a looser fit also lets the network classify new inputs that were not used for training.

Besides these considerations, IMHO backpropagation is just an appealing name for an optimization procedure that takes advantage of the structure of the function $f$, inherited from the topology of the neural network. Specifically, to find the derivative of the objective function you use the chain rule. The structure of that derivative says that "the error is backpropagated". But you are just computing a derivative in order to solve an optimization problem!
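To make the chain-rule remark concrete, here is a minimal C sketch (my own illustration, not part of the original answer; weights, dataset, and step size are hypothetical) of one gradient-descent step on the squared error $E = \frac{1}{2}\sum_k (f(x_k,w) - o_k)^2$ for the 2-2-1 network of the question. The hand-derived partials are exactly what the name "backpropagation" refers to.

```c
#include <math.h>
#include <stdio.h>

static double sigma(double x) { return 1.0 / (1.0 + exp(-x)); }

/* One gradient-descent step on E = 0.5 * sum_k (f(x_k,w) - o_k)^2
 * for the 2-2-1 network of the question.  Every partial derivative
 * below comes from the chain rule; the factor
 * e * wo[j] * h[j] * (1 - h[j]) is the "backpropagated error". */
static void gd_step(double Wi[2][2], double wo[2],
                    double X[][2], const double O[], int n, double eta)
{
    double gWi[2][2] = {{0.0}}, gwo[2] = {0.0};

    for (int k = 0; k < n; ++k) {
        double h[2];
        for (int j = 0; j < 2; ++j)
            h[j] = sigma(X[k][0] * Wi[0][j] + X[k][1] * Wi[1][j]);
        double e = wo[0] * h[0] + wo[1] * h[1] - O[k];  /* f(x_k,w) - o_k */

        for (int j = 0; j < 2; ++j) {
            gwo[j] += e * h[j];                      /* dE/dwo_j  */
            double d = e * wo[j] * h[j] * (1.0 - h[j]);
            gWi[0][j] += d * X[k][0];                /* dE/dWi_0j */
            gWi[1][j] += d * X[k][1];                /* dE/dWi_1j */
        }
    }
    for (int j = 0; j < 2; ++j) {                    /* move downhill */
        wo[j]    -= eta * gwo[j];
        Wi[0][j] -= eta * gWi[0][j];
        Wi[1][j] -= eta * gWi[1][j];
    }
}

int main(void)
{
    /* Hypothetical starting weights and a two-example dataset. */
    double Wi[2][2] = {{0.5, -0.3}, {0.8, 0.1}}, wo[2] = {1.0, -1.0};
    double X[2][2]  = {{0.0, 1.0}, {1.0, 0.0}};
    double O[2]     = {0.2, -0.1};

    for (int it = 0; it < 1000; ++it)
        gd_step(Wi, wo, X, O, 2, 0.5);

    for (int k = 0; k < 2; ++k) {
        double h0 = sigma(X[k][0]*Wi[0][0] + X[k][1]*Wi[1][0]);
        double h1 = sigma(X[k][0]*Wi[0][1] + X[k][1]*Wi[1][1]);
        printf("f(x_%d) = %g (target %g)\n", k, wo[0]*h0 + wo[1]*h1, O[k]);
    }
    return 0;
}
```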






answered Nov 25 at 11:47 by the_candyman (edited Nov 25 at 11:53)























• Can I ask whether there is a formalization of "a very difficult problem and often without a solution"?
– Filippo Portera
Nov 25 at 12:07












• @FilippoPortera Linear systems are easy. Thanks to linear algebra, everything is known about linear systems and their solutions. Nonlinear systems are not easy since there is no general theory to account for them.
– the_candyman
Nov 25 at 12:18










• I've followed the article google.com/… and it turns out that solving the associated nonlinear system as an optimization problem with gradient descent is much slower than backpropagation and, in addition, it doesn't find a solution of the system.
– Filippo Portera
Nov 29 at 7:39












• If you want, I can give you the C sources of my optimizers.
– Filippo Portera
Nov 29 at 11:37



































I think there are at least some numerical solution attempts:

https://www.lakeheadu.ca/sites/default/files/uploads/77/docs/RemaniFinal.pdf

and:

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=2ahUKEwiC7OuujPneAhUFsKQKHWnRDukQFjAAegQIARAC&url=http%3A%2F%2Fscientificbulletin.upm.ro%2Fpapers%2F2011-1%2FFinta-The-Gradient-Method-for-Overdetermined-Nonlinear-Systems.pdf&usg=AOvVaw1GRmd0hToZc61RKXC8Plam
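As a flavor of what the first reference (Newton's method for systems of nonlinear equations) covers, here is a minimal C sketch on a hypothetical 2x2 toy system; it is not taken from either paper. Applying it to the network equations would require the Jacobian with respect to the six weights, and since that system is generally non-square ($n$ equations, 6 unknowns), a Gauss-Newton or gradient scheme like the one in the second reference would replace the exact Jacobian solve.

```c
#include <math.h>
#include <stdio.h>

/* Newton's method for a 2x2 nonlinear system F(v) = 0:
 *   v <- v - J(v)^{-1} F(v).
 * Hypothetical toy system:
 *   f0 = v0^2 + v1^2 - 1 = 0   (unit circle)
 *   f1 = v0 - v1         = 0   (diagonal)     */
int main(void)
{
    double v[2] = {1.0, 0.5};               /* initial guess */

    for (int it = 0; it < 20; ++it) {
        double f0 = v[0]*v[0] + v[1]*v[1] - 1.0;
        double f1 = v[0] - v[1];
        if (fabs(f0) + fabs(f1) < 1e-12) break;   /* converged */

        /* Analytic Jacobian of (f0, f1). */
        double j00 = 2.0*v[0], j01 = 2.0*v[1];
        double j10 = 1.0,      j11 = -1.0;

        /* Solve J * d = F by Cramer's rule (fine for 2x2). */
        double det = j00*j11 - j01*j10;
        if (fabs(det) < 1e-12) break;             /* singular Jacobian */
        double d0 = (f0*j11 - j01*f1) / det;
        double d1 = (j00*f1 - j10*f0) / det;

        v[0] -= d0;
        v[1] -= d1;
    }
    printf("root: (%g, %g)\n", v[0], v[1]);  /* ~ (0.7071, 0.7071) */
    return 0;
}
```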






answered Nov 25 at 19:51 by Filippo Portera (edited Nov 29 at 7:56)























• The risk is [overfitting](en.wikipedia.org/wiki/Overfitting). Optimization, i.e. finding weights so that a certain norm is minimized, is the better choice rather than finding an "exact solution".
– the_candyman
Nov 29 at 11:39










• At present, the problem is that I'm not converging towards a solution of the nonlinear system. If you want, I can provide the code I've implemented.
– Filippo Portera
Nov 29 at 11:46










• Dear Filippo, if your problem is to solve a nonlinear system, then I warmly suggest that you create a new question on this specific topic.
– the_candyman
Nov 29 at 16:59

















