Improve speed of finding ending value of portfolio after spending fixed rate












1















I have a pd.DataFrame of return series corresponding to years with a fixed spending rate of 5%. I am looking to find the ending portfolio value after spending for each year. val_after_spending in year t is equal to the average of year t val_before_spending with year t-1 val_after_spending times the spending rate. For the first year, the val_after_spending in t-1 is assumed to be 1.



I right now have a working implementation (below), but it is incredibly slow. Is there a faster way to implement this?



import pandas as pd
import numpy as np
port_rets = pd.DataFrame({'port_ret': [.10,-.25,.15]})

spending_rate = .05

for index, row in port_rets.iterrows():
if index != 0:
port_rets.at[index, 'val_before_spending'] = port_rets['val_after_spending'][index - 1] * (1 + port_rets['port_ret'][index])
port_rets.at[index, 'spending'] = np.mean([port_rets['val_after_spending'][index - 1], port_rets['val_before_spending'][index]]) * spending_rate
else:
port_rets.at[index, 'val_before_spending'] = 1 * (1 + port_rets['port_ret'][index])
port_rets.at[index, 'spending'] = np.mean([1, port_rets['val_before_spending'][index]]) * spending_rate

port_rets.at[index, 'val_after_spending'] = port_rets['val_before_spending'][index] - port_rets['spending'][index]

# port_ret val_before_spending spending val_after_spending
#0 0.100000 1.100000 0.052500 1.047500
#1 -0.250000 0.785625 0.045828 0.739797
#2 0.150000 0.850766 0.039764 0.811002









share|improve this question



























    1















    I have a pd.DataFrame of return series corresponding to years with a fixed spending rate of 5%. I am looking to find the ending portfolio value after spending for each year. val_after_spending in year t is equal to the average of year t val_before_spending with year t-1 val_after_spending times the spending rate. For the first year, the val_after_spending in t-1 is assumed to be 1.



    I right now have a working implementation (below), but it is incredibly slow. Is there a faster way to implement this?



    import pandas as pd
    import numpy as np
    port_rets = pd.DataFrame({'port_ret': [.10,-.25,.15]})

    spending_rate = .05

    for index, row in port_rets.iterrows():
    if index != 0:
    port_rets.at[index, 'val_before_spending'] = port_rets['val_after_spending'][index - 1] * (1 + port_rets['port_ret'][index])
    port_rets.at[index, 'spending'] = np.mean([port_rets['val_after_spending'][index - 1], port_rets['val_before_spending'][index]]) * spending_rate
    else:
    port_rets.at[index, 'val_before_spending'] = 1 * (1 + port_rets['port_ret'][index])
    port_rets.at[index, 'spending'] = np.mean([1, port_rets['val_before_spending'][index]]) * spending_rate

    port_rets.at[index, 'val_after_spending'] = port_rets['val_before_spending'][index] - port_rets['spending'][index]

    # port_ret val_before_spending spending val_after_spending
    #0 0.100000 1.100000 0.052500 1.047500
    #1 -0.250000 0.785625 0.045828 0.739797
    #2 0.150000 0.850766 0.039764 0.811002









    share|improve this question

























      1












      1








      1








      I have a pd.DataFrame of return series corresponding to years with a fixed spending rate of 5%. I am looking to find the ending portfolio value after spending for each year. val_after_spending in year t is equal to the average of year t val_before_spending with year t-1 val_after_spending times the spending rate. For the first year, the val_after_spending in t-1 is assumed to be 1.



      I right now have a working implementation (below), but it is incredibly slow. Is there a faster way to implement this?



      import pandas as pd
      import numpy as np
      port_rets = pd.DataFrame({'port_ret': [.10,-.25,.15]})

      spending_rate = .05

      for index, row in port_rets.iterrows():
      if index != 0:
      port_rets.at[index, 'val_before_spending'] = port_rets['val_after_spending'][index - 1] * (1 + port_rets['port_ret'][index])
      port_rets.at[index, 'spending'] = np.mean([port_rets['val_after_spending'][index - 1], port_rets['val_before_spending'][index]]) * spending_rate
      else:
      port_rets.at[index, 'val_before_spending'] = 1 * (1 + port_rets['port_ret'][index])
      port_rets.at[index, 'spending'] = np.mean([1, port_rets['val_before_spending'][index]]) * spending_rate

      port_rets.at[index, 'val_after_spending'] = port_rets['val_before_spending'][index] - port_rets['spending'][index]

      # port_ret val_before_spending spending val_after_spending
      #0 0.100000 1.100000 0.052500 1.047500
      #1 -0.250000 0.785625 0.045828 0.739797
      #2 0.150000 0.850766 0.039764 0.811002









      share|improve this question














      I have a pd.DataFrame of return series corresponding to years with a fixed spending rate of 5%. I am looking to find the ending portfolio value after spending for each year. val_after_spending in year t is equal to the average of year t val_before_spending with year t-1 val_after_spending times the spending rate. For the first year, the val_after_spending in t-1 is assumed to be 1.



      I right now have a working implementation (below), but it is incredibly slow. Is there a faster way to implement this?



      import pandas as pd
      import numpy as np
      port_rets = pd.DataFrame({'port_ret': [.10,-.25,.15]})

      spending_rate = .05

      for index, row in port_rets.iterrows():
      if index != 0:
      port_rets.at[index, 'val_before_spending'] = port_rets['val_after_spending'][index - 1] * (1 + port_rets['port_ret'][index])
      port_rets.at[index, 'spending'] = np.mean([port_rets['val_after_spending'][index - 1], port_rets['val_before_spending'][index]]) * spending_rate
      else:
      port_rets.at[index, 'val_before_spending'] = 1 * (1 + port_rets['port_ret'][index])
      port_rets.at[index, 'spending'] = np.mean([1, port_rets['val_before_spending'][index]]) * spending_rate

      port_rets.at[index, 'val_after_spending'] = port_rets['val_before_spending'][index] - port_rets['spending'][index]

      # port_ret val_before_spending spending val_after_spending
      #0 0.100000 1.100000 0.052500 1.047500
      #1 -0.250000 0.785625 0.045828 0.739797
      #2 0.150000 0.850766 0.039764 0.811002






      python pandas performance finance






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 26 '18 at 8:03









      cpagecpage

      27212




      27212
























          1 Answer
          1






          active

          oldest

          votes


















          1














          You very heavily interface with pandas in your code, which seems to be a bad idea as far as performance is concerned. To make it as easy to use as it is, pandas needs to do a lot of book keeping, which leads to reduced performance.



          We do all the calculation in numpy and then having got all the building blocks, build the dataframe in the end. Thus, the code translates to :



          def get_vals(rates, spending_rate):
          n = len(rates)
          vals_after_spending = np.zeros((n+1, ))
          vals_before_spending = np.zeros((n+1, ))

          vals_after_spending[0] = 1.0

          for i in range(n):
          vals_before_spending[i+1] = vals_after_spending[i] * (1 + rates[i])

          spending = np.mean(np.array([vals_after_spending[i], vals_before_spending[i+1]])) * spending_rate
          vals_after_spending[i+1] = vals_before_spending[i+1] - spending

          return vals_before_spending[1:], vals_after_spending[1:]

          rates = np.array(port_rets["port_ret"].tolist())
          vals_before_spending, vals_after_spending = get_vals(rates, spending_rate)
          port_rets = pd.DataFrame({'port_ret': rates, "val_before_spending": vals_before_spending, "val_after_spending": vals_after_spending})


          We can further improve by JIT compiling the code, as python loops are slow.
          Below I use numba :



          import numba as nb
          @nb.njit(cache=True) # as easy as putting this decorator
          def get_vals(rates, spending_rate):
          n = len(rates)
          vals_after_spending = np.zeros((n+1, ))
          vals_before_spending = np.zeros((n+1, ))

          # ... code remains same, we are just compiling the function


          If we consider a random list of rates like this :



          port_rets = pd.DataFrame({'port_ret': np.random.uniform(low=-1.0, high=1.0, size=(100000,))})


          We get the performance comparisons:




          Your code : 15.758s



          get_vals : 1.407s



          JITed get_vals : 0.093s (on second run to discount for compile time)







          share|improve this answer
























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53476857%2fimprove-speed-of-finding-ending-value-of-portfolio-after-spending-fixed-rate%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            1














            You very heavily interface with pandas in your code, which seems to be a bad idea as far as performance is concerned. To make it as easy to use as it is, pandas needs to do a lot of book keeping, which leads to reduced performance.



            We do all the calculation in numpy and then having got all the building blocks, build the dataframe in the end. Thus, the code translates to :



            def get_vals(rates, spending_rate):
            n = len(rates)
            vals_after_spending = np.zeros((n+1, ))
            vals_before_spending = np.zeros((n+1, ))

            vals_after_spending[0] = 1.0

            for i in range(n):
            vals_before_spending[i+1] = vals_after_spending[i] * (1 + rates[i])

            spending = np.mean(np.array([vals_after_spending[i], vals_before_spending[i+1]])) * spending_rate
            vals_after_spending[i+1] = vals_before_spending[i+1] - spending

            return vals_before_spending[1:], vals_after_spending[1:]

            rates = np.array(port_rets["port_ret"].tolist())
            vals_before_spending, vals_after_spending = get_vals(rates, spending_rate)
            port_rets = pd.DataFrame({'port_ret': rates, "val_before_spending": vals_before_spending, "val_after_spending": vals_after_spending})


            We can further improve by JIT compiling the code, as python loops are slow.
            Below I use numba :



            import numba as nb
            @nb.njit(cache=True) # as easy as putting this decorator
            def get_vals(rates, spending_rate):
            n = len(rates)
            vals_after_spending = np.zeros((n+1, ))
            vals_before_spending = np.zeros((n+1, ))

            # ... code remains same, we are just compiling the function


            If we consider a random list of rates like this :



            port_rets = pd.DataFrame({'port_ret': np.random.uniform(low=-1.0, high=1.0, size=(100000,))})


            We get the performance comparisons:




            Your code : 15.758s



            get_vals : 1.407s



            JITed get_vals : 0.093s (on second run to discount for compile time)







            share|improve this answer




























              1














              You very heavily interface with pandas in your code, which seems to be a bad idea as far as performance is concerned. To make it as easy to use as it is, pandas needs to do a lot of book keeping, which leads to reduced performance.



              We do all the calculation in numpy and then having got all the building blocks, build the dataframe in the end. Thus, the code translates to :



              def get_vals(rates, spending_rate):
              n = len(rates)
              vals_after_spending = np.zeros((n+1, ))
              vals_before_spending = np.zeros((n+1, ))

              vals_after_spending[0] = 1.0

              for i in range(n):
              vals_before_spending[i+1] = vals_after_spending[i] * (1 + rates[i])

              spending = np.mean(np.array([vals_after_spending[i], vals_before_spending[i+1]])) * spending_rate
              vals_after_spending[i+1] = vals_before_spending[i+1] - spending

              return vals_before_spending[1:], vals_after_spending[1:]

              rates = np.array(port_rets["port_ret"].tolist())
              vals_before_spending, vals_after_spending = get_vals(rates, spending_rate)
              port_rets = pd.DataFrame({'port_ret': rates, "val_before_spending": vals_before_spending, "val_after_spending": vals_after_spending})


              We can further improve by JIT compiling the code, as python loops are slow.
              Below I use numba :



              import numba as nb
              @nb.njit(cache=True) # as easy as putting this decorator
              def get_vals(rates, spending_rate):
              n = len(rates)
              vals_after_spending = np.zeros((n+1, ))
              vals_before_spending = np.zeros((n+1, ))

              # ... code remains same, we are just compiling the function


              If we consider a random list of rates like this :



              port_rets = pd.DataFrame({'port_ret': np.random.uniform(low=-1.0, high=1.0, size=(100000,))})


              We get the performance comparisons:




              Your code : 15.758s



              get_vals : 1.407s



              JITed get_vals : 0.093s (on second run to discount for compile time)







              share|improve this answer


























                1












                1








                1







                You very heavily interface with pandas in your code, which seems to be a bad idea as far as performance is concerned. To make it as easy to use as it is, pandas needs to do a lot of book keeping, which leads to reduced performance.



                We do all the calculation in numpy and then having got all the building blocks, build the dataframe in the end. Thus, the code translates to :



                def get_vals(rates, spending_rate):
                n = len(rates)
                vals_after_spending = np.zeros((n+1, ))
                vals_before_spending = np.zeros((n+1, ))

                vals_after_spending[0] = 1.0

                for i in range(n):
                vals_before_spending[i+1] = vals_after_spending[i] * (1 + rates[i])

                spending = np.mean(np.array([vals_after_spending[i], vals_before_spending[i+1]])) * spending_rate
                vals_after_spending[i+1] = vals_before_spending[i+1] - spending

                return vals_before_spending[1:], vals_after_spending[1:]

                rates = np.array(port_rets["port_ret"].tolist())
                vals_before_spending, vals_after_spending = get_vals(rates, spending_rate)
                port_rets = pd.DataFrame({'port_ret': rates, "val_before_spending": vals_before_spending, "val_after_spending": vals_after_spending})


                We can further improve by JIT compiling the code, as python loops are slow.
                Below I use numba :



                import numba as nb
                @nb.njit(cache=True) # as easy as putting this decorator
                def get_vals(rates, spending_rate):
                n = len(rates)
                vals_after_spending = np.zeros((n+1, ))
                vals_before_spending = np.zeros((n+1, ))

                # ... code remains same, we are just compiling the function


                If we consider a random list of rates like this :



                port_rets = pd.DataFrame({'port_ret': np.random.uniform(low=-1.0, high=1.0, size=(100000,))})


                We get the performance comparisons:




                Your code : 15.758s



                get_vals : 1.407s



                JITed get_vals : 0.093s (on second run to discount for compile time)







                share|improve this answer













                You very heavily interface with pandas in your code, which seems to be a bad idea as far as performance is concerned. To make it as easy to use as it is, pandas needs to do a lot of book keeping, which leads to reduced performance.



                We do all the calculation in numpy and then having got all the building blocks, build the dataframe in the end. Thus, the code translates to :



                def get_vals(rates, spending_rate):
                n = len(rates)
                vals_after_spending = np.zeros((n+1, ))
                vals_before_spending = np.zeros((n+1, ))

                vals_after_spending[0] = 1.0

                for i in range(n):
                vals_before_spending[i+1] = vals_after_spending[i] * (1 + rates[i])

                spending = np.mean(np.array([vals_after_spending[i], vals_before_spending[i+1]])) * spending_rate
                vals_after_spending[i+1] = vals_before_spending[i+1] - spending

                return vals_before_spending[1:], vals_after_spending[1:]

                rates = np.array(port_rets["port_ret"].tolist())
                vals_before_spending, vals_after_spending = get_vals(rates, spending_rate)
                port_rets = pd.DataFrame({'port_ret': rates, "val_before_spending": vals_before_spending, "val_after_spending": vals_after_spending})


                We can further improve by JIT compiling the code, as python loops are slow.
                Below I use numba :



                import numba as nb
                @nb.njit(cache=True) # as easy as putting this decorator
                def get_vals(rates, spending_rate):
                n = len(rates)
                vals_after_spending = np.zeros((n+1, ))
                vals_before_spending = np.zeros((n+1, ))

                # ... code remains same, we are just compiling the function


                If we consider a random list of rates like this :



                port_rets = pd.DataFrame({'port_ret': np.random.uniform(low=-1.0, high=1.0, size=(100000,))})


                We get the performance comparisons:




                Your code : 15.758s



                get_vals : 1.407s



                JITed get_vals : 0.093s (on second run to discount for compile time)








                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 26 '18 at 11:49









                Deepak SainiDeepak Saini

                1,619816




                1,619816
































                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53476857%2fimprove-speed-of-finding-ending-value-of-portfolio-after-spending-fixed-rate%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Tonle Sap (See)

                    I get strange results when I access the Sqlitedatabase with Unity C# via XAMPP

                    Guatemaltekische Davis-Cup-Mannschaft