Loss of rows when running Nested Loops in pandas dataframe
I am trying to replicate a pandas dataframe, named df, by running a nested loop: I filter the data by the unique values in each column and append the results to an initially empty dataframe. df has 627974 rows, but df_new ends up with only 2275. Here are the top 10 rows of the data:
cid run_bal last_tran_date mark_seg province local_body branch acc_type int_rate
0 11276 495000.0 2013/1/4 Institutional 4 Municipality KULESHWOR BRANCH NORMAL SAVINGS 0.065
1 11277 1030000.0 2013/1/2 Institutional 1 Municipality CHABAHIL BRANCH NORMAL SAVINGS 0.065
2 11278 1225000.0 2013/1/20 Institutional 1 Municipality TINKUNE BRANCH NORMAL SAVINGS 0.065
3 11278 50000.0 2013/1/8 Institutional 4 Municipality Jorpati Branch NORMAL SAVINGS 0.065
4 11279 320000.0 2013/1/28 Institutional 3 Gaunpalika TRIPURESHWAR BRANCH NORMAL SAVINGS 0.065
5 11281 100000.0 2013/1/14 Institutional 3 Municipality KANTIPATH BRANCH NORMAL SAVINGS 0.06
6 11282 10000.0 2013/1/1 Institutional 5 Municipality Gwarko NORMAL SAVINGS 0.06
7 11283 10000.0 2013/1/28 Institutional 1 Municipality SWOYAMBHU BRANCH CALL DEPOSIT 0.04
8 11284 200000.0 2013/1/15 Institutional 3 Gaunpalika Anamnagar Branch NORMAL SAVINGS 0.065
9 11284 545000.0 2013/1/21 Institutional 3 Municipality Bouddha Branch CALL DEPOSIT 0.04
Here is the code:
def bin_cus(df):
    df_new = pd.DataFrame(columns=col_names)
    mk_unq = df.mark_seg.unique()
    for mk in mk_unq:
        df_f = df.loc[df.mark_seg == mk]
        prc_unq = df_f.province.unique()
        for prc in prc_unq:
            df_f = df_f.loc[df_f.province == prc]
            lb_unq = df_f.local_body.unique()
            for lb in lb_unq:
                df_f = df_f.loc[df_f.local_body == lb]
                brn_unq = df_f.branch.unique()
                for brn in brn_unq:
                    df_f = df_f.loc[df_f.branch == brn]
                    acc_unq = df_f.acc_type.unique()
                    for acc in acc_unq:
                        df_f = df_f.loc[df_f.acc_type == acc]
                        df_new = pd.concat([df_new, df_f], axis=0)
    return df_new
python-3.x pandas
edited Nov 24 '18 at 11:47
Srce Cde
asked Nov 24 '18 at 11:46
Rishi
1 Answer
You are filtering the same dataframe variable, df_f, inside every loop, so after the first pass of an inner loop df_f no longer holds the full outer group, and the remaining combinations are filtered against an already-shrunken frame. Giving each loop level its own variable fixes the problem:
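The failure mode is easy to reproduce on a toy frame (hypothetical two-column data, not from the question):

```python
import pandas as pd

# Minimal repro of the row loss: reusing df_f means the second iteration of
# the inner loop filters a frame that was already narrowed to the previous
# group's rows.
df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 2, 2]})

df_new = pd.DataFrame(columns=df.columns)
for a in df.a.unique():
    df_f = df.loc[df.a == a]
    for b in df_f.b.unique():           # unique values captured here...
        df_f = df_f.loc[df_f.b == b]    # ...but df_f is overwritten here
        df_new = pd.concat([df_new, df_f], axis=0)

# For a == "x": b == 1 keeps one row, then df_f is only that row, so the
# b == 2 filter finds nothing and row ("x", 2) is silently lost.
print(len(df), len(df_new))  # 3 2
```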
dep_cus_pandas_df = dep_cus_spark_df.toPandas()
dep_cus_pandas_df["cus_seg"] = ""
col_names = dep_cus_pandas_df.columns
df = dep_cus_pandas_df

df_new = pd.DataFrame(columns=col_names)
mk_unq = df.mark_seg.unique()
for mk in mk_unq:
    df_f_m = df.loc[df.mark_seg == mk]
    prc_unq = df_f_m.province.unique()
    for prc in prc_unq:
        df_f_p = df_f_m.loc[df_f_m.province == prc]
        lb_unq = df_f_p.local_body.unique()
        for lb in lb_unq:
            df_f_lb = df_f_p.loc[df_f_p.local_body == lb]
            brn_unq = df_f_lb.branch.unique()
            for brn in brn_unq:
                df_f_brn = df_f_lb.loc[df_f_lb.branch == brn]
                acc_unq = df_f_brn.acc_type.unique()
                for acc in acc_unq:
                    # .copy() so the column assignment below does not trigger
                    # SettingWithCopyWarning on a slice of df_f_brn
                    df_f_acc = df_f_brn.loc[df_f_brn.acc_type == acc].copy()
                    # pd.cut does not require sorted input
                    df_f_acc["cus_seg"] = pd.cut(df_f_acc.run_bal, 4,
                                                 labels=["min_q1", "q1_q2", "q2_q3", "q3_max"],
                                                 duplicates="drop")
                    df_new = pd.concat([df_new, df_f_acc], axis=0)

dep_cus_spark_df_new = sc_sql.createDataFrame(df_new, schema=["cid", "run_bal", "last_tran_date",
                                                              "mark_seg", "province", "local_body",
                                                              "branch", "acc_type", "int_rate", "cus_seg"])
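As a side note, the five nested loops can be collapsed into a single groupby over the same columns, which visits every combination exactly once and cannot drop rows. A minimal sketch with hypothetical data (column names from the question, values made up; binning into 2 buckets instead of 4 just to keep the toy frame small):

```python
import pandas as pd

# Toy frame with the question's grouping columns (values are made up).
df = pd.DataFrame({
    "mark_seg": ["Institutional"] * 4,
    "province": [1, 1, 3, 3],
    "local_body": ["Municipality"] * 4,
    "branch": ["A", "A", "B", "B"],
    "acc_type": ["NORMAL SAVINGS"] * 4,
    "run_bal": [100.0, 200.0, 300.0, 400.0],
})

group_cols = ["mark_seg", "province", "local_body", "branch", "acc_type"]

# One loop over the groups replaces all five nested loops.
parts = []
for _, grp in df.groupby(group_cols):
    grp = grp.copy()
    grp["cus_seg"] = pd.cut(grp["run_bal"], 2, labels=["low", "high"])
    parts.append(grp)
df_new = pd.concat(parts, axis=0)

# Every input row appears exactly once in the output.
print(len(df), len(df_new))  # 4 4
```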
answered Nov 29 '18 at 3:19
Dipesh Poudel