stop groupby from making 2 combination same pair in python dataframe

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I am working on IPL dataset from Kaggle (https://www.kaggle.com/manasgarg/ipl).
I want to sum up the runs made by two people as pair and I have prepared my data.
When I am trying a GROUPBY on the dataframe columns (batsman and non_striker) it is making 2 combination of the same pair.
like (a,b) and (b,a) - rather I wish it should consider it as same.
As I can't drop rows any further.

import pandas as pd



df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")

df = df[(df["is_super_over"] != 1)]

df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)

openners = df[(df["over"] == 1) & (df["ball"] == 1)]

openners = openners[["pri_key", "batsman", "non_striker"]]

openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})

df = pd.merge(df, openners, on="pri_key")

df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]

df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"])) 

    & ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]



df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")

df1.nlargest(10)



Result:

batsman      non_striker

DA Warner    S Dhawan       1294

S Dhawan     DA Warner       823

RV Uthappa   G Gambhir       781

DR Smith     BB McCullum     684

CH Gayle     V Kohli         674

MEK Hussey   M Vijay         666

M Vijay      MEK Hussey      629

G Gambhir    RV Uthappa      611

BB McCullum  DR Smith        593

CH Gayle     TM Dilshan      537

and, I want to keep 1 pair as unique

for those who don't understand cricket
I have a dataframe

batsman    non_striker    runs

a              b            2

a              b            3

b              a            1

c              d            6

d              c            1

d              c            4

b              a            3

e              f            1

f              e            2

f              e            6



df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")

    df1.nlargest(30)



output:

batsman    non_striker    runs

  a            b            5

  b            a            4

  c            d            6

  d            c            5

  e            f            1

  f            e            8



expected output:

batsman    non_striker    runs

  a            b            9

  c            d            11

  e            f            9

what should I do? Please advise....

asked Nov 26 '18 at 19:15

Yash Mishra

264

Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

– Ken Dekalb
Nov 26 '18 at 19:27

add a comment |

import pandas as pd



df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")

df = df[(df["is_super_over"] != 1)]

df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)

openners = df[(df["over"] == 1) & (df["ball"] == 1)]

openners = openners[["pri_key", "batsman", "non_striker"]]

openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})

df = pd.merge(df, openners, on="pri_key")

df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]

df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"])) 

    & ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]



df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")

df1.nlargest(10)



Result:

batsman      non_striker

DA Warner    S Dhawan       1294

S Dhawan     DA Warner       823

RV Uthappa   G Gambhir       781

DR Smith     BB McCullum     684

CH Gayle     V Kohli         674

MEK Hussey   M Vijay         666

M Vijay      MEK Hussey      629

G Gambhir    RV Uthappa      611

BB McCullum  DR Smith        593

CH Gayle     TM Dilshan      537

and, I want to keep 1 pair as unique

for those who don't understand cricket
I have a dataframe

batsman    non_striker    runs

a              b            2

a              b            3

b              a            1

c              d            6

d              c            1

d              c            4

b              a            3

e              f            1

f              e            2

f              e            6



df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")

    df1.nlargest(30)



output:

batsman    non_striker    runs

  a            b            5

  b            a            4

  c            d            6

  d            c            5

  e            f            1

  f            e            8



expected output:

batsman    non_striker    runs

  a            b            9

  c            d            11

  e            f            9

what should I do? Please advise....

asked Nov 26 '18 at 19:15

Yash Mishra

264

Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

– Ken Dekalb
Nov 26 '18 at 19:27

add a comment |

import pandas as pd



df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")

df = df[(df["is_super_over"] != 1)]

df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)

openners = df[(df["over"] == 1) & (df["ball"] == 1)]

openners = openners[["pri_key", "batsman", "non_striker"]]

openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})

df = pd.merge(df, openners, on="pri_key")

df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]

df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"])) 

    & ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]



df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")

df1.nlargest(10)



Result:

batsman      non_striker

DA Warner    S Dhawan       1294

S Dhawan     DA Warner       823

RV Uthappa   G Gambhir       781

DR Smith     BB McCullum     684

CH Gayle     V Kohli         674

MEK Hussey   M Vijay         666

M Vijay      MEK Hussey      629

G Gambhir    RV Uthappa      611

BB McCullum  DR Smith        593

CH Gayle     TM Dilshan      537

and, I want to keep 1 pair as unique

for those who don't understand cricket
I have a dataframe

batsman    non_striker    runs

a              b            2

a              b            3

b              a            1

c              d            6

d              c            1

d              c            4

b              a            3

e              f            1

f              e            2

f              e            6



df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")

    df1.nlargest(30)



output:

batsman    non_striker    runs

  a            b            5

  b            a            4

  c            d            6

  d            c            5

  e            f            1

  f            e            8



expected output:

batsman    non_striker    runs

  a            b            9

  c            d            11

  e            f            9

what should I do? Please advise....

asked Nov 26 '18 at 19:15

Yash Mishra

264

import pandas as pd



df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")

df = df[(df["is_super_over"] != 1)]

df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)

openners = df[(df["over"] == 1) & (df["ball"] == 1)]

openners = openners[["pri_key", "batsman", "non_striker"]]

openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})

df = pd.merge(df, openners, on="pri_key")

df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]

df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"])) 

    & ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]



df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")

df1.nlargest(10)



Result:

batsman      non_striker

DA Warner    S Dhawan       1294

S Dhawan     DA Warner       823

RV Uthappa   G Gambhir       781

DR Smith     BB McCullum     684

CH Gayle     V Kohli         674

MEK Hussey   M Vijay         666

M Vijay      MEK Hussey      629

G Gambhir    RV Uthappa      611

BB McCullum  DR Smith        593

CH Gayle     TM Dilshan      537

and, I want to keep 1 pair as unique

for those who don't understand cricket
I have a dataframe

batsman    non_striker    runs

a              b            2

a              b            3

b              a            1

c              d            6

d              c            1

d              c            4

b              a            3

e              f            1

f              e            2

f              e            6



df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")

    df1.nlargest(30)



output:

batsman    non_striker    runs

  a            b            5

  b            a            4

  c            d            6

  d            c            5

  e            f            1

  f            e            8



expected output:

batsman    non_striker    runs

  a            b            9

  c            d            11

  e            f            9

what should I do? Please advise....

python pandas

asked Nov 26 '18 at 19:15

Yash Mishra

264

asked Nov 26 '18 at 19:15

Yash Mishra

264

asked Nov 26 '18 at 19:15

Yash Mishra

264

asked Nov 26 '18 at 19:15

Yash Mishra

264

asked Nov 26 '18 at 19:15

Yash Mishra

264

Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

– Ken Dekalb
Nov 26 '18 at 19:27

add a comment |

Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

– Ken Dekalb
Nov 26 '18 at 19:27

Join df['batsman_non_striker'] = df['batsman'] + df['non_striker ']; sort the values ''.join(sorted(batsman_non_striker)), and then df.groupby('(batsman_non_striker').agg('sum')

– Ken Dekalb
Nov 26 '18 at 19:27

add a comment |

3 Answers
3

active

oldest

votes

You can sort the batsman and non_striker and then group the data

df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 

df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)

Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted

df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)

df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()

Either way, you will get,

batsman         non_striker

CH Gayle        V Kohli        2650

DA Warner       S Dhawan       2242

AB de Villiers  V Kohli        2135

G Gambhir       RV Uthappa     1795

M Vijay         MEK Hussey     1302

BB McCullum     DR Smith       1277

KA Pollard      RG Sharma      1220

MEK Hussey      SK Raina       1129

AT Rayudu       RG Sharma      1121

AM Rahane       SR Watson      1118

edited Nov 26 '18 at 20:42

answered Nov 26 '18 at 19:28

Vaishali

22.6k41438

1

Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

– Yash Mishra
Nov 26 '18 at 19:43

@YashMishra, just that I have worked on it quite a bit :)

– Vaishali
Nov 26 '18 at 19:44

@YashMishra, do remember to upvote/accept the answer if it answers your query fully

– Vaishali
Nov 26 '18 at 19:54

@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

– Vaishali
Dec 14 '18 at 18:14

add a comment |

Craete a new DataFrame using np.sort. Then groupby and sum.

import pandas as pd

import numpy as np



df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1), 

                   index=df.index,

                   columns=['player_1', 'player_2']).assign(runs = df.runs)



df1.groupby(['player_1', 'player_2']).runs.sum()

Output:

player_1  player_2

a         b            9

c         d           11

e         f            9

Name: runs, dtype: int64

answered Nov 26 '18 at 19:28

ALollz

16.5k41838

add a comment |

I hope I understand you right...

What you can do is something like put the smaller value always in column A and the greater value always in column B.

import pandas as pd

import numpy as np



# generate example

values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']

df = pd.DataFrame()

df['batsman'] = np.random.choice(values, size=10)

df['no_striker'] = np.random.choice(values, size=10)



# column evaluation

df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])

df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])

answered Nov 26 '18 at 19:31

MisterMonk

1549

I really appreciate your effort and kindness to help but this is not exactly I am looking for.

– Yash Mishra
Nov 27 '18 at 17:08

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53487627%2fstop-groupby-from-making-2-combination-same-pair-in-python-dataframe%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

You can sort the batsman and non_striker and then group the data

df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 

df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)

Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted

df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)

df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()

Either way, you will get,

batsman         non_striker

CH Gayle        V Kohli        2650

DA Warner       S Dhawan       2242

AB de Villiers  V Kohli        2135

G Gambhir       RV Uthappa     1795

M Vijay         MEK Hussey     1302

BB McCullum     DR Smith       1277

KA Pollard      RG Sharma      1220

MEK Hussey      SK Raina       1129

AT Rayudu       RG Sharma      1121

AM Rahane       SR Watson      1118

edited Nov 26 '18 at 20:42

answered Nov 26 '18 at 19:28

Vaishali

22.6k41438

1

Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

– Yash Mishra
Nov 26 '18 at 19:43

@YashMishra, just that I have worked on it quite a bit :)

– Vaishali
Nov 26 '18 at 19:44

@YashMishra, do remember to upvote/accept the answer if it answers your query fully

– Vaishali
Nov 26 '18 at 19:54

@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

– Vaishali
Dec 14 '18 at 18:14

add a comment |

You can sort the batsman and non_striker and then group the data

df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 

df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)

Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted

df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)

df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()

Either way, you will get,

batsman         non_striker

CH Gayle        V Kohli        2650

DA Warner       S Dhawan       2242

AB de Villiers  V Kohli        2135

G Gambhir       RV Uthappa     1795

M Vijay         MEK Hussey     1302

BB McCullum     DR Smith       1277

KA Pollard      RG Sharma      1220

MEK Hussey      SK Raina       1129

AT Rayudu       RG Sharma      1121

AM Rahane       SR Watson      1118

edited Nov 26 '18 at 20:42

answered Nov 26 '18 at 19:28

Vaishali

22.6k41438

1

Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

– Yash Mishra
Nov 26 '18 at 19:43

@YashMishra, just that I have worked on it quite a bit :)

– Vaishali
Nov 26 '18 at 19:44

@YashMishra, do remember to upvote/accept the answer if it answers your query fully

– Vaishali
Nov 26 '18 at 19:54

@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

– Vaishali
Dec 14 '18 at 18:14

add a comment |

You can sort the batsman and non_striker and then group the data

df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 

df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)

Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted

df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)

df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()

Either way, you will get,

batsman         non_striker

CH Gayle        V Kohli        2650

DA Warner       S Dhawan       2242

AB de Villiers  V Kohli        2135

G Gambhir       RV Uthappa     1795

M Vijay         MEK Hussey     1302

BB McCullum     DR Smith       1277

KA Pollard      RG Sharma      1220

MEK Hussey      SK Raina       1129

AT Rayudu       RG Sharma      1121

AM Rahane       SR Watson      1118

edited Nov 26 '18 at 20:42

answered Nov 26 '18 at 19:28

Vaishali

22.6k41438

You can sort the batsman and non_striker and then group the data

df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1) 

df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)

Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted

df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)

df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()

Either way, you will get,

batsman         non_striker

CH Gayle        V Kohli        2650

DA Warner       S Dhawan       2242

AB de Villiers  V Kohli        2135

G Gambhir       RV Uthappa     1795

M Vijay         MEK Hussey     1302

BB McCullum     DR Smith       1277

KA Pollard      RG Sharma      1220

MEK Hussey      SK Raina       1129

AT Rayudu       RG Sharma      1121

AM Rahane       SR Watson      1118

edited Nov 26 '18 at 20:42

answered Nov 26 '18 at 19:28

Vaishali

22.6k41438

edited Nov 26 '18 at 20:42

answered Nov 26 '18 at 19:28

Vaishali

22.6k41438

answered Nov 26 '18 at 19:28

Vaishali

22.6k41438

answered Nov 26 '18 at 19:28

Vaishali

22.6k41438

1

Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

– Yash Mishra
Nov 26 '18 at 19:43

@YashMishra, just that I have worked on it quite a bit :)

– Vaishali
Nov 26 '18 at 19:44

@YashMishra, do remember to upvote/accept the answer if it answers your query fully

– Vaishali
Nov 26 '18 at 19:54

@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

– Vaishali
Dec 14 '18 at 18:14

add a comment |

1

Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

– Yash Mishra
Nov 26 '18 at 19:43

@YashMishra, just that I have worked on it quite a bit :)

– Vaishali
Nov 26 '18 at 19:44

@YashMishra, do remember to upvote/accept the answer if it answers your query fully

– Vaishali
Nov 26 '18 at 19:54

@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

– Vaishali
Dec 14 '18 at 18:14

Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!

– Yash Mishra
Nov 26 '18 at 19:43

@YashMishra, just that I have worked on it quite a bit :)

– Vaishali
Nov 26 '18 at 19:44

@YashMishra, do remember to upvote/accept the answer if it answers your query fully

– Vaishali
Nov 26 '18 at 19:54

@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future

– Vaishali
Dec 14 '18 at 18:14

add a comment |

Craete a new DataFrame using np.sort. Then groupby and sum.

import pandas as pd

import numpy as np



df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1), 

                   index=df.index,

                   columns=['player_1', 'player_2']).assign(runs = df.runs)



df1.groupby(['player_1', 'player_2']).runs.sum()

Output:

player_1  player_2

a         b            9

c         d           11

e         f            9

Name: runs, dtype: int64

answered Nov 26 '18 at 19:28

ALollz

16.5k41838

add a comment |

Craete a new DataFrame using np.sort. Then groupby and sum.

import pandas as pd

import numpy as np



df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1), 

                   index=df.index,

                   columns=['player_1', 'player_2']).assign(runs = df.runs)



df1.groupby(['player_1', 'player_2']).runs.sum()

Output:

player_1  player_2

a         b            9

c         d           11

e         f            9

Name: runs, dtype: int64

answered Nov 26 '18 at 19:28

ALollz

16.5k41838

add a comment |

Craete a new DataFrame using np.sort. Then groupby and sum.

import pandas as pd

import numpy as np



df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1), 

                   index=df.index,

                   columns=['player_1', 'player_2']).assign(runs = df.runs)



df1.groupby(['player_1', 'player_2']).runs.sum()

Output:

player_1  player_2

a         b            9

c         d           11

e         f            9

Name: runs, dtype: int64

answered Nov 26 '18 at 19:28

ALollz

16.5k41838

Craete a new DataFrame using np.sort. Then groupby and sum.

import pandas as pd

import numpy as np



df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1), 

                   index=df.index,

                   columns=['player_1', 'player_2']).assign(runs = df.runs)



df1.groupby(['player_1', 'player_2']).runs.sum()

Output:

player_1  player_2

a         b            9

c         d           11

e         f            9

Name: runs, dtype: int64

answered Nov 26 '18 at 19:28

ALollz

16.5k41838

answered Nov 26 '18 at 19:28

ALollz

16.5k41838

answered Nov 26 '18 at 19:28

ALollz

16.5k41838

answered Nov 26 '18 at 19:28

ALollz

16.5k41838

add a comment |

I hope I understand you right...

What you can do is something like put the smaller value always in column A and the greater value always in column B.

import pandas as pd

import numpy as np



# generate example

values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']

df = pd.DataFrame()

df['batsman'] = np.random.choice(values, size=10)

df['no_striker'] = np.random.choice(values, size=10)



# column evaluation

df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])

df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])

answered Nov 26 '18 at 19:31

MisterMonk

1549

I really appreciate your effort and kindness to help but this is not exactly I am looking for.

– Yash Mishra
Nov 27 '18 at 17:08

add a comment |

I hope I understand you right...

What you can do is something like put the smaller value always in column A and the greater value always in column B.

import pandas as pd

import numpy as np



# generate example

values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']

df = pd.DataFrame()

df['batsman'] = np.random.choice(values, size=10)

df['no_striker'] = np.random.choice(values, size=10)



# column evaluation

df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])

df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])

answered Nov 26 '18 at 19:31

MisterMonk

1549

I really appreciate your effort and kindness to help but this is not exactly I am looking for.

– Yash Mishra
Nov 27 '18 at 17:08

add a comment |

I hope I understand you right...

What you can do is something like put the smaller value always in column A and the greater value always in column B.

import pandas as pd

import numpy as np



# generate example

values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']

df = pd.DataFrame()

df['batsman'] = np.random.choice(values, size=10)

df['no_striker'] = np.random.choice(values, size=10)



# column evaluation

df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])

df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])

answered Nov 26 '18 at 19:31

MisterMonk

1549

I hope I understand you right...

What you can do is something like put the smaller value always in column A and the greater value always in column B.

import pandas as pd

import numpy as np



# generate example

values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']

df = pd.DataFrame()

df['batsman'] = np.random.choice(values, size=10)

df['no_striker'] = np.random.choice(values, size=10)



# column evaluation

df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])

df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])

answered Nov 26 '18 at 19:31

MisterMonk

1549

answered Nov 26 '18 at 19:31

MisterMonk

1549

answered Nov 26 '18 at 19:31

MisterMonk

1549

answered Nov 26 '18 at 19:31

MisterMonk

1549

I really appreciate your effort and kindness to help but this is not exactly I am looking for.

– Yash Mishra
Nov 27 '18 at 17:08

add a comment |

I really appreciate your effort and kindness to help but this is not exactly I am looking for.

– Yash Mishra
Nov 27 '18 at 17:08

I really appreciate your effort and kindness to help but this is not exactly I am looking for.

– Yash Mishra
Nov 27 '18 at 17:08

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

ustB05ux MNG,au3CB7bXR3

搜尋此網誌

Ytukyg