Outer product reformulation of LU decomposition












0












$begingroup$


For my numerical analysis class, I wanted to implement the rather simple, in-place LU decomposition algorithm. I did this the naive way and found my code was very, very slow, because I was doing every operation in good old interpreted Python.



So I looked in some books (mainly my trusty Numerical Linear Algebra, by Trefethen and Bau), and found an easy way to speed up the computations, by using NumPy's np.dot routine. However, the algorithm is still very slow, and the book cited above mentions an outer product reformulation of the LU algorithm, which uses only one loop.



The exact statement is the following:




Like most of the algorithms in this book, Gaussian elimination involves a triply nested loop. In Algorithm 20.1, there are two explicit for loops, and the third loop is implicit in the vectors $u_{j, k colon m}$ and $u_{k,k colon m}$. Rewrite this algorithm with just one explicit for loop indexed by $k$. Inside this loop, $U$ will be updated at each step by a certain rank-one outer product. This "outer product" form of Gaussian elimination may be a better starting point than Algorithm 20.1 if one wants to optimize computer performance.




Now, I've been searching for this formulation for a while now, and can't seem to find it. The main things it needs to have are




  • it needs to be in-place;

  • it can't use SciPy routines (NumPy is fine however)


The current code I'm using is



def LUfactorize(A):
lA = len(A)
for k in range (0, lA-1):
for j in range(k+1, lA):
if abs(A[j, k]) > 1e-10:
A[j, k] /= A[k, k]
A[j, k+1:lA] -= A[j][k] * A[k, k+1:lA]









share|cite|improve this question









$endgroup$

















    0












    $begingroup$


    For my numerical analysis class, I wanted to implement the rather simple, in-place LU decomposition algorithm. I did this the naive way and found my code was very, very slow, because I was doing every operation in good old interpreted Python.



    So I looked in some books (mainly my trusty Numerical Linear Algebra, by Trefethen and Bau), and found an easy way to speed up the computations, by using NumPy's np.dot routine. However, the algorithm is still very slow, and the book cited above mentions an outer product reformulation of the LU algorithm, which uses only one loop.



    The exact statement is the following:




    Like most of the algorithms in this book, Gaussian elimination involves a triply nested loop. In Algorithm 20.1, there are two explicit for loops, and the third loop is implicit in the vectors $u_{j, k colon m}$ and $u_{k,k colon m}$. Rewrite this algorithm with just one explicit for loop indexed by $k$. Inside this loop, $U$ will be updated at each step by a certain rank-one outer product. This "outer product" form of Gaussian elimination may be a better starting point than Algorithm 20.1 if one wants to optimize computer performance.




    Now, I've been searching for this formulation for a while now, and can't seem to find it. The main things it needs to have are




    • it needs to be in-place;

    • it can't use SciPy routines (NumPy is fine however)


    The current code I'm using is



    def LUfactorize(A):
    lA = len(A)
    for k in range (0, lA-1):
    for j in range(k+1, lA):
    if abs(A[j, k]) > 1e-10:
    A[j, k] /= A[k, k]
    A[j, k+1:lA] -= A[j][k] * A[k, k+1:lA]









    share|cite|improve this question









    $endgroup$















      0












      0








      0





      $begingroup$


      For my numerical analysis class, I wanted to implement the rather simple, in-place LU decomposition algorithm. I did this the naive way and found my code was very, very slow, because I was doing every operation in good old interpreted Python.



      So I looked in some books (mainly my trusty Numerical Linear Algebra, by Trefethen and Bau), and found an easy way to speed up the computations, by using NumPy's np.dot routine. However, the algorithm is still very slow, and the book cited above mentions an outer product reformulation of the LU algorithm, which uses only one loop.



      The exact statement is the following:




      Like most of the algorithms in this book, Gaussian elimination involves a triply nested loop. In Algorithm 20.1, there are two explicit for loops, and the third loop is implicit in the vectors $u_{j, k colon m}$ and $u_{k,k colon m}$. Rewrite this algorithm with just one explicit for loop indexed by $k$. Inside this loop, $U$ will be updated at each step by a certain rank-one outer product. This "outer product" form of Gaussian elimination may be a better starting point than Algorithm 20.1 if one wants to optimize computer performance.




      Now, I've been searching for this formulation for a while now, and can't seem to find it. The main things it needs to have are




      • it needs to be in-place;

      • it can't use SciPy routines (NumPy is fine however)


      The current code I'm using is



      def LUfactorize(A):
      lA = len(A)
      for k in range (0, lA-1):
      for j in range(k+1, lA):
      if abs(A[j, k]) > 1e-10:
      A[j, k] /= A[k, k]
      A[j, k+1:lA] -= A[j][k] * A[k, k+1:lA]









      share|cite|improve this question









      $endgroup$




      For my numerical analysis class, I wanted to implement the rather simple, in-place LU decomposition algorithm. I did this the naive way and found my code was very, very slow, because I was doing every operation in good old interpreted Python.



      So I looked in some books (mainly my trusty Numerical Linear Algebra, by Trefethen and Bau), and found an easy way to speed up the computations, by using NumPy's np.dot routine. However, the algorithm is still very slow, and the book cited above mentions an outer product reformulation of the LU algorithm, which uses only one loop.



      The exact statement is the following:




      Like most of the algorithms in this book, Gaussian elimination involves a triply nested loop. In Algorithm 20.1, there are two explicit for loops, and the third loop is implicit in the vectors $u_{j, k colon m}$ and $u_{k,k colon m}$. Rewrite this algorithm with just one explicit for loop indexed by $k$. Inside this loop, $U$ will be updated at each step by a certain rank-one outer product. This "outer product" form of Gaussian elimination may be a better starting point than Algorithm 20.1 if one wants to optimize computer performance.




      Now, I've been searching for this formulation for a while now, and can't seem to find it. The main things it needs to have are




      • it needs to be in-place;

      • it can't use SciPy routines (NumPy is fine however)


      The current code I'm using is



      def LUfactorize(A):
      lA = len(A)
      for k in range (0, lA-1):
      for j in range(k+1, lA):
      if abs(A[j, k]) > 1e-10:
      A[j, k] /= A[k, k]
      A[j, k+1:lA] -= A[j][k] * A[k, k+1:lA]






      algorithms numerical-linear-algebra gaussian-elimination python outer-product






      share|cite|improve this question













      share|cite|improve this question











      share|cite|improve this question




      share|cite|improve this question










      asked Dec 5 '18 at 7:49









      PeiffapPeiffap

      367




      367






















          1 Answer
          1






          active

          oldest

          votes


















          2












          $begingroup$

          You can do



          def LUfactorizeB(A):
          lA = len(A)
          for k in range (0, lA-1):
          A[k+1:, k] = A[k+1:, k]/A[k, k]
          A[k+1:, k+1:] -= np.outer(A[k+1:,k],A[k, k+1:])


          which vectorizes your lines of code and thus should be faster.



          Edit: Here is my time measuring:



          import numpy as np
          from scipy import random, linalg
          import timeit

          def LU1(A):
          lA = len(A)
          for k in range(0,lA-1):
          for j in range(k+1, lA):
          if abs(A[j,k])>1e-10:
          A[j,k] /= A[k,k]
          A[j, k+1:] -= A[j][k] * A[k,k+1:lA]

          def LU2(A):
          for k in range(0,len(A)-1):
          A[k+1:,k] = A[k+1:,k]/A[k,k]
          A[k+1:,k+1:] -= np.outer(A[k+1:,k],A[k,k+1:])

          def tester(matrixSize):
          A = random.rand(matrixSize, matrixSize)
          A = A.T@A
          B = A.copy()
          C = A.copy()
          def test1():
          LU1(A)
          def test2():
          LU2(B)
          # warmup
          LU1(C)
          LU2(C)
          # measure time
          time1 = timeit.timeit(test1, number=10)/10
          time2 = timeit.timeit(test2, number=10)/10
          print("dim=%4d, LU1: %.5f, LU2: %.5f, upspeed = %2.2f"%(matrixSize, time1, time2, time1/time2))

          testDimensions = [10,20,30,40,50,60,70,80,90,100,200,300,400,500,600,700,800,900,1000]

          for k in testDimensions:
          tester(k)


          And my measurements:



          dim=  10, LU1: 0.00025, LU2: 0.00011, upspeed = 2.31
          dim= 20, LU1: 0.00096, LU2: 0.00022, upspeed = 4.27
          dim= 30, LU1: 0.00205, LU2: 0.00034, upspeed = 6.09
          dim= 40, LU1: 0.00356, LU2: 0.00049, upspeed = 7.32
          dim= 50, LU1: 0.00567, LU2: 0.00067, upspeed = 8.47
          dim= 60, LU1: 0.00817, LU2: 0.00090, upspeed = 9.07
          dim= 70, LU1: 0.01293, LU2: 0.00111, upspeed = 11.65
          dim= 80, LU1: 0.01541, LU2: 0.00129, upspeed = 11.97
          dim= 90, LU1: 0.01952, LU2: 0.00173, upspeed = 11.30
          dim= 100, LU1: 0.02441, LU2: 0.00220, upspeed = 11.07
          dim= 200, LU1: 0.09993, LU2: 0.00861, upspeed = 11.60
          dim= 300, LU1: 0.23670, LU2: 0.02187, upspeed = 10.82
          dim= 400, LU1: 0.38049, LU2: 0.04580, upspeed = 8.31
          dim= 500, LU1: 0.52768, LU2: 0.08319, upspeed = 6.34
          dim= 600, LU1: 0.75969, LU2: 0.14168, upspeed = 5.36
          dim= 700, LU1: 1.02731, LU2: 0.23459, upspeed = 4.38
          dim= 800, LU1: 1.34176, LU2: 0.36763, upspeed = 3.65
          dim= 900, LU1: 1.68615, LU2: 0.55875, upspeed = 3.02
          dim=1000, LU1: 2.08703, LU2: 0.83434, upspeed = 2.50


          As you probably notice, the upspeed is better than I thought. :D






          share|cite|improve this answer











          $endgroup$













          • $begingroup$
            I don't understand why, but this code executes significantly slower on my machine than the code I was currently using...
            $endgroup$
            – Peiffap
            Dec 5 '18 at 12:28










          • $begingroup$
            Which python version do you use and which matrix dimensions do you have? For me this function performs faster in most cases. I have the most upspeed around matrix dimension 500 with factor 4-5 and below 100 there is not much of a difference. For bigger matrices (>1000) the upspeed is a bit more than factor 2 (this depends most likely on the cache sizes)
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:19












          • $begingroup$
            I'm using Python 3.6.1 for matrices anywhere between size 100x100 and 1600x1600.
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:20










          • $begingroup$
            The execution time for these matrices is about 100 times slower than for my old code
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:21










          • $begingroup$
            Which datatype do you use for A?
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:25











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "69"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3026780%2fouter-product-reformulation-of-lu-decomposition%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          2












          $begingroup$

          You can do



          def LUfactorizeB(A):
          lA = len(A)
          for k in range (0, lA-1):
          A[k+1:, k] = A[k+1:, k]/A[k, k]
          A[k+1:, k+1:] -= np.outer(A[k+1:,k],A[k, k+1:])


          which vectorizes your lines of code and thus should be faster.



          Edit: Here is my time measuring:



          import numpy as np
          from scipy import random, linalg
          import timeit

          def LU1(A):
          lA = len(A)
          for k in range(0,lA-1):
          for j in range(k+1, lA):
          if abs(A[j,k])>1e-10:
          A[j,k] /= A[k,k]
          A[j, k+1:] -= A[j][k] * A[k,k+1:lA]

          def LU2(A):
          for k in range(0,len(A)-1):
          A[k+1:,k] = A[k+1:,k]/A[k,k]
          A[k+1:,k+1:] -= np.outer(A[k+1:,k],A[k,k+1:])

          def tester(matrixSize):
          A = random.rand(matrixSize, matrixSize)
          A = A.T@A
          B = A.copy()
          C = A.copy()
          def test1():
          LU1(A)
          def test2():
          LU2(B)
          # warmup
          LU1(C)
          LU2(C)
          # measure time
          time1 = timeit.timeit(test1, number=10)/10
          time2 = timeit.timeit(test2, number=10)/10
          print("dim=%4d, LU1: %.5f, LU2: %.5f, upspeed = %2.2f"%(matrixSize, time1, time2, time1/time2))

          testDimensions = [10,20,30,40,50,60,70,80,90,100,200,300,400,500,600,700,800,900,1000]

          for k in testDimensions:
          tester(k)


          And my measurements:



          dim=  10, LU1: 0.00025, LU2: 0.00011, upspeed = 2.31
          dim= 20, LU1: 0.00096, LU2: 0.00022, upspeed = 4.27
          dim= 30, LU1: 0.00205, LU2: 0.00034, upspeed = 6.09
          dim= 40, LU1: 0.00356, LU2: 0.00049, upspeed = 7.32
          dim= 50, LU1: 0.00567, LU2: 0.00067, upspeed = 8.47
          dim= 60, LU1: 0.00817, LU2: 0.00090, upspeed = 9.07
          dim= 70, LU1: 0.01293, LU2: 0.00111, upspeed = 11.65
          dim= 80, LU1: 0.01541, LU2: 0.00129, upspeed = 11.97
          dim= 90, LU1: 0.01952, LU2: 0.00173, upspeed = 11.30
          dim= 100, LU1: 0.02441, LU2: 0.00220, upspeed = 11.07
          dim= 200, LU1: 0.09993, LU2: 0.00861, upspeed = 11.60
          dim= 300, LU1: 0.23670, LU2: 0.02187, upspeed = 10.82
          dim= 400, LU1: 0.38049, LU2: 0.04580, upspeed = 8.31
          dim= 500, LU1: 0.52768, LU2: 0.08319, upspeed = 6.34
          dim= 600, LU1: 0.75969, LU2: 0.14168, upspeed = 5.36
          dim= 700, LU1: 1.02731, LU2: 0.23459, upspeed = 4.38
          dim= 800, LU1: 1.34176, LU2: 0.36763, upspeed = 3.65
          dim= 900, LU1: 1.68615, LU2: 0.55875, upspeed = 3.02
          dim=1000, LU1: 2.08703, LU2: 0.83434, upspeed = 2.50


          As you probably notice, the upspeed is better than I thought. :D






          share|cite|improve this answer











          $endgroup$













          • $begingroup$
            I don't understand why, but this code executes significantly slower on my machine than the code I was currently using...
            $endgroup$
            – Peiffap
            Dec 5 '18 at 12:28










          • $begingroup$
            Which python version do you use and which matrix dimensions do you have? For me this function performs faster in most cases. I have the most upspeed around matrix dimension 500 with factor 4-5 and below 100 there is not much of a difference. For bigger matrices (>1000) the upspeed is a bit more than factor 2 (this depends most likely on the cache sizes)
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:19












          • $begingroup$
            I'm using Python 3.6.1 for matrices anywhere between size 100x100 and 1600x1600.
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:20










          • $begingroup$
            The execution time for these matrices is about 100 times slower than for my old code
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:21










          • $begingroup$
            Which datatype do you use for A?
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:25
















          2












          $begingroup$

          You can do



          def LUfactorizeB(A):
          lA = len(A)
          for k in range (0, lA-1):
          A[k+1:, k] = A[k+1:, k]/A[k, k]
          A[k+1:, k+1:] -= np.outer(A[k+1:,k],A[k, k+1:])


          which vectorizes your lines of code and thus should be faster.



          Edit: Here is my time measuring:



          import numpy as np
          from scipy import random, linalg
          import timeit

          def LU1(A):
          lA = len(A)
          for k in range(0,lA-1):
          for j in range(k+1, lA):
          if abs(A[j,k])>1e-10:
          A[j,k] /= A[k,k]
          A[j, k+1:] -= A[j][k] * A[k,k+1:lA]

          def LU2(A):
          for k in range(0,len(A)-1):
          A[k+1:,k] = A[k+1:,k]/A[k,k]
          A[k+1:,k+1:] -= np.outer(A[k+1:,k],A[k,k+1:])

          def tester(matrixSize):
          A = random.rand(matrixSize, matrixSize)
          A = A.T@A
          B = A.copy()
          C = A.copy()
          def test1():
          LU1(A)
          def test2():
          LU2(B)
          # warmup
          LU1(C)
          LU2(C)
          # measure time
          time1 = timeit.timeit(test1, number=10)/10
          time2 = timeit.timeit(test2, number=10)/10
          print("dim=%4d, LU1: %.5f, LU2: %.5f, upspeed = %2.2f"%(matrixSize, time1, time2, time1/time2))

          testDimensions = [10,20,30,40,50,60,70,80,90,100,200,300,400,500,600,700,800,900,1000]

          for k in testDimensions:
          tester(k)


          And my measurements:



          dim=  10, LU1: 0.00025, LU2: 0.00011, upspeed = 2.31
          dim= 20, LU1: 0.00096, LU2: 0.00022, upspeed = 4.27
          dim= 30, LU1: 0.00205, LU2: 0.00034, upspeed = 6.09
          dim= 40, LU1: 0.00356, LU2: 0.00049, upspeed = 7.32
          dim= 50, LU1: 0.00567, LU2: 0.00067, upspeed = 8.47
          dim= 60, LU1: 0.00817, LU2: 0.00090, upspeed = 9.07
          dim= 70, LU1: 0.01293, LU2: 0.00111, upspeed = 11.65
          dim= 80, LU1: 0.01541, LU2: 0.00129, upspeed = 11.97
          dim= 90, LU1: 0.01952, LU2: 0.00173, upspeed = 11.30
          dim= 100, LU1: 0.02441, LU2: 0.00220, upspeed = 11.07
          dim= 200, LU1: 0.09993, LU2: 0.00861, upspeed = 11.60
          dim= 300, LU1: 0.23670, LU2: 0.02187, upspeed = 10.82
          dim= 400, LU1: 0.38049, LU2: 0.04580, upspeed = 8.31
          dim= 500, LU1: 0.52768, LU2: 0.08319, upspeed = 6.34
          dim= 600, LU1: 0.75969, LU2: 0.14168, upspeed = 5.36
          dim= 700, LU1: 1.02731, LU2: 0.23459, upspeed = 4.38
          dim= 800, LU1: 1.34176, LU2: 0.36763, upspeed = 3.65
          dim= 900, LU1: 1.68615, LU2: 0.55875, upspeed = 3.02
          dim=1000, LU1: 2.08703, LU2: 0.83434, upspeed = 2.50


          As you probably notice, the upspeed is better than I thought. :D






          share|cite|improve this answer











          $endgroup$













          • $begingroup$
            I don't understand why, but this code executes significantly slower on my machine than the code I was currently using...
            $endgroup$
            – Peiffap
            Dec 5 '18 at 12:28










          • $begingroup$
            Which python version do you use and which matrix dimensions do you have? For me this function performs faster in most cases. I have the most upspeed around matrix dimension 500 with factor 4-5 and below 100 there is not much of a difference. For bigger matrices (>1000) the upspeed is a bit more than factor 2 (this depends most likely on the cache sizes)
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:19












          • $begingroup$
            I'm using Python 3.6.1 for matrices anywhere between size 100x100 and 1600x1600.
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:20










          • $begingroup$
            The execution time for these matrices is about 100 times slower than for my old code
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:21










          • $begingroup$
            Which datatype do you use for A?
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:25














          2












          2








          2





          $begingroup$

          You can do



          def LUfactorizeB(A):
          lA = len(A)
          for k in range (0, lA-1):
          A[k+1:, k] = A[k+1:, k]/A[k, k]
          A[k+1:, k+1:] -= np.outer(A[k+1:,k],A[k, k+1:])


          which vectorizes your lines of code and thus should be faster.



          Edit: Here is my time measuring:



          import numpy as np
          from scipy import random, linalg
          import timeit

          def LU1(A):
          lA = len(A)
          for k in range(0,lA-1):
          for j in range(k+1, lA):
          if abs(A[j,k])>1e-10:
          A[j,k] /= A[k,k]
          A[j, k+1:] -= A[j][k] * A[k,k+1:lA]

          def LU2(A):
          for k in range(0,len(A)-1):
          A[k+1:,k] = A[k+1:,k]/A[k,k]
          A[k+1:,k+1:] -= np.outer(A[k+1:,k],A[k,k+1:])

          def tester(matrixSize):
          A = random.rand(matrixSize, matrixSize)
          A = A.T@A
          B = A.copy()
          C = A.copy()
          def test1():
          LU1(A)
          def test2():
          LU2(B)
          # warmup
          LU1(C)
          LU2(C)
          # measure time
          time1 = timeit.timeit(test1, number=10)/10
          time2 = timeit.timeit(test2, number=10)/10
          print("dim=%4d, LU1: %.5f, LU2: %.5f, upspeed = %2.2f"%(matrixSize, time1, time2, time1/time2))

          testDimensions = [10,20,30,40,50,60,70,80,90,100,200,300,400,500,600,700,800,900,1000]

          for k in testDimensions:
          tester(k)


          And my measurements:



          dim=  10, LU1: 0.00025, LU2: 0.00011, upspeed = 2.31
          dim= 20, LU1: 0.00096, LU2: 0.00022, upspeed = 4.27
          dim= 30, LU1: 0.00205, LU2: 0.00034, upspeed = 6.09
          dim= 40, LU1: 0.00356, LU2: 0.00049, upspeed = 7.32
          dim= 50, LU1: 0.00567, LU2: 0.00067, upspeed = 8.47
          dim= 60, LU1: 0.00817, LU2: 0.00090, upspeed = 9.07
          dim= 70, LU1: 0.01293, LU2: 0.00111, upspeed = 11.65
          dim= 80, LU1: 0.01541, LU2: 0.00129, upspeed = 11.97
          dim= 90, LU1: 0.01952, LU2: 0.00173, upspeed = 11.30
          dim= 100, LU1: 0.02441, LU2: 0.00220, upspeed = 11.07
          dim= 200, LU1: 0.09993, LU2: 0.00861, upspeed = 11.60
          dim= 300, LU1: 0.23670, LU2: 0.02187, upspeed = 10.82
          dim= 400, LU1: 0.38049, LU2: 0.04580, upspeed = 8.31
          dim= 500, LU1: 0.52768, LU2: 0.08319, upspeed = 6.34
          dim= 600, LU1: 0.75969, LU2: 0.14168, upspeed = 5.36
          dim= 700, LU1: 1.02731, LU2: 0.23459, upspeed = 4.38
          dim= 800, LU1: 1.34176, LU2: 0.36763, upspeed = 3.65
          dim= 900, LU1: 1.68615, LU2: 0.55875, upspeed = 3.02
          dim=1000, LU1: 2.08703, LU2: 0.83434, upspeed = 2.50


          As you probably notice, the upspeed is better than I thought. :D






          share|cite|improve this answer











          $endgroup$



          You can do



          def LUfactorizeB(A):
          lA = len(A)
          for k in range (0, lA-1):
          A[k+1:, k] = A[k+1:, k]/A[k, k]
          A[k+1:, k+1:] -= np.outer(A[k+1:,k],A[k, k+1:])


          which vectorizes your lines of code and thus should be faster.



          Edit: Here is my time measuring:



          import numpy as np
          from scipy import random, linalg
          import timeit

          def LU1(A):
          lA = len(A)
          for k in range(0,lA-1):
          for j in range(k+1, lA):
          if abs(A[j,k])>1e-10:
          A[j,k] /= A[k,k]
          A[j, k+1:] -= A[j][k] * A[k,k+1:lA]

          def LU2(A):
          for k in range(0,len(A)-1):
          A[k+1:,k] = A[k+1:,k]/A[k,k]
          A[k+1:,k+1:] -= np.outer(A[k+1:,k],A[k,k+1:])

          def tester(matrixSize):
          A = random.rand(matrixSize, matrixSize)
          A = A.T@A
          B = A.copy()
          C = A.copy()
          def test1():
          LU1(A)
          def test2():
          LU2(B)
          # warmup
          LU1(C)
          LU2(C)
          # measure time
          time1 = timeit.timeit(test1, number=10)/10
          time2 = timeit.timeit(test2, number=10)/10
          print("dim=%4d, LU1: %.5f, LU2: %.5f, upspeed = %2.2f"%(matrixSize, time1, time2, time1/time2))

          testDimensions = [10,20,30,40,50,60,70,80,90,100,200,300,400,500,600,700,800,900,1000]

          for k in testDimensions:
          tester(k)


          And my measurements:



          dim=  10, LU1: 0.00025, LU2: 0.00011, upspeed = 2.31
          dim= 20, LU1: 0.00096, LU2: 0.00022, upspeed = 4.27
          dim= 30, LU1: 0.00205, LU2: 0.00034, upspeed = 6.09
          dim= 40, LU1: 0.00356, LU2: 0.00049, upspeed = 7.32
          dim= 50, LU1: 0.00567, LU2: 0.00067, upspeed = 8.47
          dim= 60, LU1: 0.00817, LU2: 0.00090, upspeed = 9.07
          dim= 70, LU1: 0.01293, LU2: 0.00111, upspeed = 11.65
          dim= 80, LU1: 0.01541, LU2: 0.00129, upspeed = 11.97
          dim= 90, LU1: 0.01952, LU2: 0.00173, upspeed = 11.30
          dim= 100, LU1: 0.02441, LU2: 0.00220, upspeed = 11.07
          dim= 200, LU1: 0.09993, LU2: 0.00861, upspeed = 11.60
          dim= 300, LU1: 0.23670, LU2: 0.02187, upspeed = 10.82
          dim= 400, LU1: 0.38049, LU2: 0.04580, upspeed = 8.31
          dim= 500, LU1: 0.52768, LU2: 0.08319, upspeed = 6.34
          dim= 600, LU1: 0.75969, LU2: 0.14168, upspeed = 5.36
          dim= 700, LU1: 1.02731, LU2: 0.23459, upspeed = 4.38
          dim= 800, LU1: 1.34176, LU2: 0.36763, upspeed = 3.65
          dim= 900, LU1: 1.68615, LU2: 0.55875, upspeed = 3.02
          dim=1000, LU1: 2.08703, LU2: 0.83434, upspeed = 2.50


          As you probably notice, the upspeed is better than I thought. :D







          share|cite|improve this answer














          share|cite|improve this answer



          share|cite|improve this answer








          edited Dec 5 '18 at 13:40

























          answered Dec 5 '18 at 10:56









          sehiglesehigle

          1565




          1565












          • $begingroup$
            I don't understand why, but this code executes significantly slower on my machine than the code I was currently using...
            $endgroup$
            – Peiffap
            Dec 5 '18 at 12:28










          • $begingroup$
            Which python version do you use and which matrix dimensions do you have? For me this function performs faster in most cases. I have the most upspeed around matrix dimension 500 with factor 4-5 and below 100 there is not much of a difference. For bigger matrices (>1000) the upspeed is a bit more than factor 2 (this depends most likely on the cache sizes)
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:19












          • $begingroup$
            I'm using Python 3.6.1 for matrices anywhere between size 100x100 and 1600x1600.
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:20










          • $begingroup$
            The execution time for these matrices is about 100 times slower than for my old code
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:21










          • $begingroup$
            Which datatype do you use for A?
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:25


















          • $begingroup$
            I don't understand why, but this code executes significantly slower on my machine than the code I was currently using...
            $endgroup$
            – Peiffap
            Dec 5 '18 at 12:28










          • $begingroup$
            Which python version do you use and which matrix dimensions do you have? For me this function performs faster in most cases. I have the most upspeed around matrix dimension 500 with factor 4-5 and below 100 there is not much of a difference. For bigger matrices (>1000) the upspeed is a bit more than factor 2 (this depends most likely on the cache sizes)
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:19












          • $begingroup$
            I'm using Python 3.6.1 for matrices anywhere between size 100x100 and 1600x1600.
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:20










          • $begingroup$
            The execution time for these matrices is about 100 times slower than for my old code
            $endgroup$
            – Peiffap
            Dec 5 '18 at 13:21










          • $begingroup$
            Which datatype do you use for A?
            $endgroup$
            – sehigle
            Dec 5 '18 at 13:25
















          $begingroup$
          I don't understand why, but this code executes significantly slower on my machine than the code I was currently using...
          $endgroup$
          – Peiffap
          Dec 5 '18 at 12:28




          $begingroup$
          I don't understand why, but this code executes significantly slower on my machine than the code I was currently using...
          $endgroup$
          – Peiffap
          Dec 5 '18 at 12:28












          $begingroup$
          Which python version do you use and which matrix dimensions do you have? For me this function performs faster in most cases. I have the most upspeed around matrix dimension 500 with factor 4-5 and below 100 there is not much of a difference. For bigger matrices (>1000) the upspeed is a bit more than factor 2 (this depends most likely on the cache sizes)
          $endgroup$
          – sehigle
          Dec 5 '18 at 13:19






          $begingroup$
          Which python version do you use and which matrix dimensions do you have? For me this function performs faster in most cases. I have the most upspeed around matrix dimension 500 with factor 4-5 and below 100 there is not much of a difference. For bigger matrices (>1000) the upspeed is a bit more than factor 2 (this depends most likely on the cache sizes)
          $endgroup$
          – sehigle
          Dec 5 '18 at 13:19














          $begingroup$
          I'm using Python 3.6.1 for matrices anywhere between size 100x100 and 1600x1600.
          $endgroup$
          – Peiffap
          Dec 5 '18 at 13:20




          $begingroup$
          I'm using Python 3.6.1 for matrices anywhere between size 100x100 and 1600x1600.
          $endgroup$
          – Peiffap
          Dec 5 '18 at 13:20












          $begingroup$
          The execution time for these matrices is about 100 times slower than for my old code
          $endgroup$
          – Peiffap
          Dec 5 '18 at 13:21




          $begingroup$
          The execution time for these matrices is about 100 times slower than for my old code
          $endgroup$
          – Peiffap
          Dec 5 '18 at 13:21












          $begingroup$
          Which datatype do you use for A?
          $endgroup$
          – sehigle
          Dec 5 '18 at 13:25




          $begingroup$
          Which datatype do you use for A?
          $endgroup$
          – sehigle
          Dec 5 '18 at 13:25


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Mathematics Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3026780%2fouter-product-reformulation-of-lu-decomposition%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Wiesbaden

          Marschland

          Dieringhausen