stop groupby from making 2 combination same pair in python dataframe





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







5















I am working on IPL dataset from Kaggle (https://www.kaggle.com/manasgarg/ipl).
I want to sum up the runs made by two people as pair and I have prepared my data.
When I am trying a GROUPBY on the dataframe columns (batsman and non_striker) it is making 2 combination of the same pair.
like (a,b) and (b,a) - rather I wish it should consider it as same.
As I can't drop rows any further.



import pandas as pd

df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")
df = df[(df["is_super_over"] != 1)]
df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)
openners = df[(df["over"] == 1) & (df["ball"] == 1)]
openners = openners[["pri_key", "batsman", "non_striker"]]
openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})
df = pd.merge(df, openners, on="pri_key")
df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]
df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"]))
& ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]

df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(10)

Result:
batsman non_striker
DA Warner S Dhawan 1294
S Dhawan DA Warner 823
RV Uthappa G Gambhir 781
DR Smith BB McCullum 684
CH Gayle V Kohli 674
MEK Hussey M Vijay 666
M Vijay MEK Hussey 629
G Gambhir RV Uthappa 611
BB McCullum DR Smith 593
CH Gayle TM Dilshan 537


and, I want to keep 1 pair as unique



for those who don't understand cricket
I have a dataframe



batsman    non_striker    runs
a b 2
a b 3
b a 1
c d 6
d c 1
d c 4
b a 3
e f 1
f e 2
f e 6

df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(30)

output:
batsman non_striker runs
a b 5
b a 4
c d 6
d c 5
e f 1
f e 8

expected output:
batsman non_striker runs
a b 9
c d 11
e f 9


what should I do? Please advise....










share|improve this question























  • Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

    – Ken Dekalb
    Nov 26 '18 at 19:27


















5















I am working on IPL dataset from Kaggle (https://www.kaggle.com/manasgarg/ipl).
I want to sum up the runs made by two people as pair and I have prepared my data.
When I am trying a GROUPBY on the dataframe columns (batsman and non_striker) it is making 2 combination of the same pair.
like (a,b) and (b,a) - rather I wish it should consider it as same.
As I can't drop rows any further.



import pandas as pd

df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")
df = df[(df["is_super_over"] != 1)]
df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)
openners = df[(df["over"] == 1) & (df["ball"] == 1)]
openners = openners[["pri_key", "batsman", "non_striker"]]
openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})
df = pd.merge(df, openners, on="pri_key")
df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]
df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"]))
& ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]

df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(10)

Result:
batsman non_striker
DA Warner S Dhawan 1294
S Dhawan DA Warner 823
RV Uthappa G Gambhir 781
DR Smith BB McCullum 684
CH Gayle V Kohli 674
MEK Hussey M Vijay 666
M Vijay MEK Hussey 629
G Gambhir RV Uthappa 611
BB McCullum DR Smith 593
CH Gayle TM Dilshan 537


and, I want to keep 1 pair as unique



for those who don't understand cricket
I have a dataframe



batsman    non_striker    runs
a b 2
a b 3
b a 1
c d 6
d c 1
d c 4
b a 3
e f 1
f e 2
f e 6

df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(30)

output:
batsman non_striker runs
a b 5
b a 4
c d 6
d c 5
e f 1
f e 8

expected output:
batsman non_striker runs
a b 9
c d 11
e f 9


what should I do? Please advise....










share|improve this question























  • Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

    – Ken Dekalb
    Nov 26 '18 at 19:27














5












5








5








I am working on IPL dataset from Kaggle (https://www.kaggle.com/manasgarg/ipl).
I want to sum up the runs made by two people as pair and I have prepared my data.
When I am trying a GROUPBY on the dataframe columns (batsman and non_striker) it is making 2 combination of the same pair.
like (a,b) and (b,a) - rather I wish it should consider it as same.
As I can't drop rows any further.



import pandas as pd

df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")
df = df[(df["is_super_over"] != 1)]
df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)
openners = df[(df["over"] == 1) & (df["ball"] == 1)]
openners = openners[["pri_key", "batsman", "non_striker"]]
openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})
df = pd.merge(df, openners, on="pri_key")
df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]
df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"]))
& ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]

df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(10)

Result:
batsman non_striker
DA Warner S Dhawan 1294
S Dhawan DA Warner 823
RV Uthappa G Gambhir 781
DR Smith BB McCullum 684
CH Gayle V Kohli 674
MEK Hussey M Vijay 666
M Vijay MEK Hussey 629
G Gambhir RV Uthappa 611
BB McCullum DR Smith 593
CH Gayle TM Dilshan 537


and, I want to keep 1 pair as unique



for those who don't understand cricket
I have a dataframe



batsman    non_striker    runs
a b 2
a b 3
b a 1
c d 6
d c 1
d c 4
b a 3
e f 1
f e 2
f e 6

df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(30)

output:
batsman non_striker runs
a b 5
b a 4
c d 6
d c 5
e f 1
f e 8

expected output:
batsman non_striker runs
a b 9
c d 11
e f 9


what should I do? Please advise....










share|improve this question














I am working on IPL dataset from Kaggle (https://www.kaggle.com/manasgarg/ipl).
I want to sum up the runs made by two people as pair and I have prepared my data.
When I am trying a GROUPBY on the dataframe columns (batsman and non_striker) it is making 2 combination of the same pair.
like (a,b) and (b,a) - rather I wish it should consider it as same.
As I can't drop rows any further.



import pandas as pd

df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")
df = df[(df["is_super_over"] != 1)]
df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)
openners = df[(df["over"] == 1) & (df["ball"] == 1)]
openners = openners[["pri_key", "batsman", "non_striker"]]
openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})
df = pd.merge(df, openners, on="pri_key")
df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]
df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"]))
& ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]

df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(10)

Result:
batsman non_striker
DA Warner S Dhawan 1294
S Dhawan DA Warner 823
RV Uthappa G Gambhir 781
DR Smith BB McCullum 684
CH Gayle V Kohli 674
MEK Hussey M Vijay 666
M Vijay MEK Hussey 629
G Gambhir RV Uthappa 611
BB McCullum DR Smith 593
CH Gayle TM Dilshan 537


and, I want to keep 1 pair as unique



for those who don't understand cricket
I have a dataframe



batsman    non_striker    runs
a b 2
a b 3
b a 1
c d 6
d c 1
d c 4
b a 3
e f 1
f e 2
f e 6

df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(30)

output:
batsman non_striker runs
a b 5
b a 4
c d 6
d c 5
e f 1
f e 8

expected output:
batsman non_striker runs
a b 9
c d 11
e f 9


what should I do? Please advise....







python pandas






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 26 '18 at 19:15









Yash MishraYash Mishra

264




264













  • Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

    – Ken Dekalb
    Nov 26 '18 at 19:27



















  • Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

    – Ken Dekalb
    Nov 26 '18 at 19:27

















Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

– Ken Dekalb
Nov 26 '18 at 19:27





Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

– Ken Dekalb
Nov 26 '18 at 19:27












3 Answers
3






active

oldest

votes


















4














You can sort the batsman and non_striker and then group the data



df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 
df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)


Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted



df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()


Either way, you will get,



batsman         non_striker
CH Gayle V Kohli 2650
DA Warner S Dhawan 2242
AB de Villiers V Kohli 2135
G Gambhir RV Uthappa 1795
M Vijay MEK Hussey 1302
BB McCullum DR Smith 1277
KA Pollard RG Sharma 1220
MEK Hussey SK Raina 1129
AT Rayudu RG Sharma 1121
AM Rahane SR Watson 1118





share|improve this answer





















  • 1





    Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

    – Yash Mishra
    Nov 26 '18 at 19:43











  • @YashMishra, just that I have worked on it quite a bit :)

    – Vaishali
    Nov 26 '18 at 19:44











  • @YashMishra, do remember to upvote/accept the answer if it answers your query fully

    – Vaishali
    Nov 26 '18 at 19:54











  • @YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

    – Vaishali
    Dec 14 '18 at 18:14



















1














Craete a new DataFrame using np.sort. Then groupby and sum.



import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
index=df.index,
columns=['player_1', 'player_2']).assign(runs = df.runs)

df1.groupby(['player_1', 'player_2']).runs.sum()


Output:



player_1  player_2
a b 9
c d 11
e f 9
Name: runs, dtype: int64





share|improve this answer































    0














    I hope I understand you right...



    What you can do is something like put the smaller value always in column A and the greater value always in column B.



    import pandas as pd
    import numpy as np

    # generate example
    values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
    df = pd.DataFrame()
    df['batsman'] = np.random.choice(values, size=10)
    df['no_striker'] = np.random.choice(values, size=10)

    # column evaluation
    df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
    df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])





    share|improve this answer
























    • I really appreciate your effort and kindness to help but this is not exactly I am looking for.

      – Yash Mishra
      Nov 27 '18 at 17:08












    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53487627%2fstop-groupby-from-making-2-combination-same-pair-in-python-dataframe%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    4














    You can sort the batsman and non_striker and then group the data



    df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 
    df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)


    Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted



    df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
    df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()


    Either way, you will get,



    batsman         non_striker
    CH Gayle V Kohli 2650
    DA Warner S Dhawan 2242
    AB de Villiers V Kohli 2135
    G Gambhir RV Uthappa 1795
    M Vijay MEK Hussey 1302
    BB McCullum DR Smith 1277
    KA Pollard RG Sharma 1220
    MEK Hussey SK Raina 1129
    AT Rayudu RG Sharma 1121
    AM Rahane SR Watson 1118





    share|improve this answer





















    • 1





      Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

      – Yash Mishra
      Nov 26 '18 at 19:43











    • @YashMishra, just that I have worked on it quite a bit :)

      – Vaishali
      Nov 26 '18 at 19:44











    • @YashMishra, do remember to upvote/accept the answer if it answers your query fully

      – Vaishali
      Nov 26 '18 at 19:54











    • @YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

      – Vaishali
      Dec 14 '18 at 18:14
















    4














    You can sort the batsman and non_striker and then group the data



    df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 
    df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)


    Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted



    df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
    df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()


    Either way, you will get,



    batsman         non_striker
    CH Gayle V Kohli 2650
    DA Warner S Dhawan 2242
    AB de Villiers V Kohli 2135
    G Gambhir RV Uthappa 1795
    M Vijay MEK Hussey 1302
    BB McCullum DR Smith 1277
    KA Pollard RG Sharma 1220
    MEK Hussey SK Raina 1129
    AT Rayudu RG Sharma 1121
    AM Rahane SR Watson 1118





    share|improve this answer





















    • 1





      Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

      – Yash Mishra
      Nov 26 '18 at 19:43











    • @YashMishra, just that I have worked on it quite a bit :)

      – Vaishali
      Nov 26 '18 at 19:44











    • @YashMishra, do remember to upvote/accept the answer if it answers your query fully

      – Vaishali
      Nov 26 '18 at 19:54











    • @YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

      – Vaishali
      Dec 14 '18 at 18:14














    4












    4








    4







    You can sort the batsman and non_striker and then group the data



    df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 
    df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)


    Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted



    df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
    df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()


    Either way, you will get,



    batsman         non_striker
    CH Gayle V Kohli 2650
    DA Warner S Dhawan 2242
    AB de Villiers V Kohli 2135
    G Gambhir RV Uthappa 1795
    M Vijay MEK Hussey 1302
    BB McCullum DR Smith 1277
    KA Pollard RG Sharma 1220
    MEK Hussey SK Raina 1129
    AT Rayudu RG Sharma 1121
    AM Rahane SR Watson 1118





    share|improve this answer















    You can sort the batsman and non_striker and then group the data



    df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 
    df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)


    Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted



    df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
    df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()


    Either way, you will get,



    batsman         non_striker
    CH Gayle V Kohli 2650
    DA Warner S Dhawan 2242
    AB de Villiers V Kohli 2135
    G Gambhir RV Uthappa 1795
    M Vijay MEK Hussey 1302
    BB McCullum DR Smith 1277
    KA Pollard RG Sharma 1220
    MEK Hussey SK Raina 1129
    AT Rayudu RG Sharma 1121
    AM Rahane SR Watson 1118






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 26 '18 at 20:42

























    answered Nov 26 '18 at 19:28









    VaishaliVaishali

    22.6k41438




    22.6k41438








    • 1





      Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

      – Yash Mishra
      Nov 26 '18 at 19:43











    • @YashMishra, just that I have worked on it quite a bit :)

      – Vaishali
      Nov 26 '18 at 19:44











    • @YashMishra, do remember to upvote/accept the answer if it answers your query fully

      – Vaishali
      Nov 26 '18 at 19:54











    • @YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

      – Vaishali
      Dec 14 '18 at 18:14














    • 1





      Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

      – Yash Mishra
      Nov 26 '18 at 19:43











    • @YashMishra, just that I have worked on it quite a bit :)

      – Vaishali
      Nov 26 '18 at 19:44











    • @YashMishra, do remember to upvote/accept the answer if it answers your query fully

      – Vaishali
      Nov 26 '18 at 19:54











    • @YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

      – Vaishali
      Dec 14 '18 at 18:14








    1




    1





    Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

    – Yash Mishra
    Nov 26 '18 at 19:43





    Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

    – Yash Mishra
    Nov 26 '18 at 19:43













    @YashMishra, just that I have worked on it quite a bit :)

    – Vaishali
    Nov 26 '18 at 19:44





    @YashMishra, just that I have worked on it quite a bit :)

    – Vaishali
    Nov 26 '18 at 19:44













    @YashMishra, do remember to upvote/accept the answer if it answers your query fully

    – Vaishali
    Nov 26 '18 at 19:54





    @YashMishra, do remember to upvote/accept the answer if it answers your query fully

    – Vaishali
    Nov 26 '18 at 19:54













    @YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

    – Vaishali
    Dec 14 '18 at 18:14





    @YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

    – Vaishali
    Dec 14 '18 at 18:14













    1














    Craete a new DataFrame using np.sort. Then groupby and sum.



    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
    index=df.index,
    columns=['player_1', 'player_2']).assign(runs = df.runs)

    df1.groupby(['player_1', 'player_2']).runs.sum()


    Output:



    player_1  player_2
    a b 9
    c d 11
    e f 9
    Name: runs, dtype: int64





    share|improve this answer




























      1














      Craete a new DataFrame using np.sort. Then groupby and sum.



      import pandas as pd
      import numpy as np

      df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
      index=df.index,
      columns=['player_1', 'player_2']).assign(runs = df.runs)

      df1.groupby(['player_1', 'player_2']).runs.sum()


      Output:



      player_1  player_2
      a b 9
      c d 11
      e f 9
      Name: runs, dtype: int64





      share|improve this answer


























        1












        1








        1







        Craete a new DataFrame using np.sort. Then groupby and sum.



        import pandas as pd
        import numpy as np

        df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
        index=df.index,
        columns=['player_1', 'player_2']).assign(runs = df.runs)

        df1.groupby(['player_1', 'player_2']).runs.sum()


        Output:



        player_1  player_2
        a b 9
        c d 11
        e f 9
        Name: runs, dtype: int64





        share|improve this answer













        Craete a new DataFrame using np.sort. Then groupby and sum.



        import pandas as pd
        import numpy as np

        df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
        index=df.index,
        columns=['player_1', 'player_2']).assign(runs = df.runs)

        df1.groupby(['player_1', 'player_2']).runs.sum()


        Output:



        player_1  player_2
        a b 9
        c d 11
        e f 9
        Name: runs, dtype: int64






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 26 '18 at 19:28









        ALollzALollz

        16.5k41838




        16.5k41838























            0














            I hope I understand you right...



            What you can do is something like put the smaller value always in column A and the greater value always in column B.



            import pandas as pd
            import numpy as np

            # generate example
            values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
            df = pd.DataFrame()
            df['batsman'] = np.random.choice(values, size=10)
            df['no_striker'] = np.random.choice(values, size=10)

            # column evaluation
            df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
            df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])





            share|improve this answer
























            • I really appreciate your effort and kindness to help but this is not exactly I am looking for.

              – Yash Mishra
              Nov 27 '18 at 17:08
















            0














            I hope I understand you right...



            What you can do is something like put the smaller value always in column A and the greater value always in column B.



            import pandas as pd
            import numpy as np

            # generate example
            values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
            df = pd.DataFrame()
            df['batsman'] = np.random.choice(values, size=10)
            df['no_striker'] = np.random.choice(values, size=10)

            # column evaluation
            df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
            df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])





            share|improve this answer
























            • I really appreciate your effort and kindness to help but this is not exactly I am looking for.

              – Yash Mishra
              Nov 27 '18 at 17:08














            0












            0








            0







            I hope I understand you right...



            What you can do is something like put the smaller value always in column A and the greater value always in column B.



            import pandas as pd
            import numpy as np

            # generate example
            values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
            df = pd.DataFrame()
            df['batsman'] = np.random.choice(values, size=10)
            df['no_striker'] = np.random.choice(values, size=10)

            # column evaluation
            df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
            df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])





            share|improve this answer













            I hope I understand you right...



            What you can do is something like put the smaller value always in column A and the greater value always in column B.



            import pandas as pd
            import numpy as np

            # generate example
            values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
            df = pd.DataFrame()
            df['batsman'] = np.random.choice(values, size=10)
            df['no_striker'] = np.random.choice(values, size=10)

            # column evaluation
            df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
            df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 26 '18 at 19:31









            MisterMonkMisterMonk

            1549




            1549













            • I really appreciate your effort and kindness to help but this is not exactly I am looking for.

              – Yash Mishra
              Nov 27 '18 at 17:08



















            • I really appreciate your effort and kindness to help but this is not exactly I am looking for.

              – Yash Mishra
              Nov 27 '18 at 17:08

















            I really appreciate your effort and kindness to help but this is not exactly I am looking for.

            – Yash Mishra
            Nov 27 '18 at 17:08





            I really appreciate your effort and kindness to help but this is not exactly I am looking for.

            – Yash Mishra
            Nov 27 '18 at 17:08


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53487627%2fstop-groupby-from-making-2-combination-same-pair-in-python-dataframe%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Wiesbaden

            Marschland

            Dieringhausen