stop groupby from making 2 combination same pair in python dataframe
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I am working on IPL dataset from Kaggle (https://www.kaggle.com/manasgarg/ipl).
I want to sum up the runs made by two people as pair and I have prepared my data.
When I am trying a GROUPBY on the dataframe columns (batsman and non_striker) it is making 2 combination of the same pair.
like (a,b) and (b,a) - rather I wish it should consider it as same.
As I can't drop rows any further.
import pandas as pd
df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")
df = df[(df["is_super_over"] != 1)]
df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)
openners = df[(df["over"] == 1) & (df["ball"] == 1)]
openners = openners[["pri_key", "batsman", "non_striker"]]
openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})
df = pd.merge(df, openners, on="pri_key")
df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]
df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"]))
& ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]
df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(10)
Result:
batsman non_striker
DA Warner S Dhawan 1294
S Dhawan DA Warner 823
RV Uthappa G Gambhir 781
DR Smith BB McCullum 684
CH Gayle V Kohli 674
MEK Hussey M Vijay 666
M Vijay MEK Hussey 629
G Gambhir RV Uthappa 611
BB McCullum DR Smith 593
CH Gayle TM Dilshan 537
and, I want to keep 1 pair as unique
for those who don't understand cricket
I have a dataframe
batsman non_striker runs
a b 2
a b 3
b a 1
c d 6
d c 1
d c 4
b a 3
e f 1
f e 2
f e 6
df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(30)
output:
batsman non_striker runs
a b 5
b a 4
c d 6
d c 5
e f 1
f e 8
expected output:
batsman non_striker runs
a b 9
c d 11
e f 9
what should I do? Please advise....
python pandas
add a comment |
I am working on IPL dataset from Kaggle (https://www.kaggle.com/manasgarg/ipl).
I want to sum up the runs made by two people as pair and I have prepared my data.
When I am trying a GROUPBY on the dataframe columns (batsman and non_striker) it is making 2 combination of the same pair.
like (a,b) and (b,a) - rather I wish it should consider it as same.
As I can't drop rows any further.
import pandas as pd
df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")
df = df[(df["is_super_over"] != 1)]
df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)
openners = df[(df["over"] == 1) & (df["ball"] == 1)]
openners = openners[["pri_key", "batsman", "non_striker"]]
openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})
df = pd.merge(df, openners, on="pri_key")
df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]
df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"]))
& ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]
df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(10)
Result:
batsman non_striker
DA Warner S Dhawan 1294
S Dhawan DA Warner 823
RV Uthappa G Gambhir 781
DR Smith BB McCullum 684
CH Gayle V Kohli 674
MEK Hussey M Vijay 666
M Vijay MEK Hussey 629
G Gambhir RV Uthappa 611
BB McCullum DR Smith 593
CH Gayle TM Dilshan 537
and, I want to keep 1 pair as unique
for those who don't understand cricket
I have a dataframe
batsman non_striker runs
a b 2
a b 3
b a 1
c d 6
d c 1
d c 4
b a 3
e f 1
f e 2
f e 6
df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(30)
output:
batsman non_striker runs
a b 5
b a 4
c d 6
d c 5
e f 1
f e 8
expected output:
batsman non_striker runs
a b 9
c d 11
e f 9
what should I do? Please advise....
python pandas
Joindf['batsman_non_striker'] = df['batsman'] + df['non_striker ']
; sort the values''.join(sorted(batsman_non_striker))
, and then df.groupby('(batsman_non_striker').agg('sum')
– Ken Dekalb
Nov 26 '18 at 19:27
add a comment |
I am working on IPL dataset from Kaggle (https://www.kaggle.com/manasgarg/ipl).
I want to sum up the runs made by two people as pair and I have prepared my data.
When I am trying a GROUPBY on the dataframe columns (batsman and non_striker) it is making 2 combination of the same pair.
like (a,b) and (b,a) - rather I wish it should consider it as same.
As I can't drop rows any further.
import pandas as pd
df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")
df = df[(df["is_super_over"] != 1)]
df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)
openners = df[(df["over"] == 1) & (df["ball"] == 1)]
openners = openners[["pri_key", "batsman", "non_striker"]]
openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})
df = pd.merge(df, openners, on="pri_key")
df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]
df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"]))
& ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]
df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(10)
Result:
batsman non_striker
DA Warner S Dhawan 1294
S Dhawan DA Warner 823
RV Uthappa G Gambhir 781
DR Smith BB McCullum 684
CH Gayle V Kohli 674
MEK Hussey M Vijay 666
M Vijay MEK Hussey 629
G Gambhir RV Uthappa 611
BB McCullum DR Smith 593
CH Gayle TM Dilshan 537
and, I want to keep 1 pair as unique
for those who don't understand cricket
I have a dataframe
batsman non_striker runs
a b 2
a b 3
b a 1
c d 6
d c 1
d c 4
b a 3
e f 1
f e 2
f e 6
df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(30)
output:
batsman non_striker runs
a b 5
b a 4
c d 6
d c 5
e f 1
f e 8
expected output:
batsman non_striker runs
a b 9
c d 11
e f 9
what should I do? Please advise....
python pandas
I am working on IPL dataset from Kaggle (https://www.kaggle.com/manasgarg/ipl).
I want to sum up the runs made by two people as pair and I have prepared my data.
When I am trying a GROUPBY on the dataframe columns (batsman and non_striker) it is making 2 combination of the same pair.
like (a,b) and (b,a) - rather I wish it should consider it as same.
As I can't drop rows any further.
import pandas as pd
df = pd.read_csv("C:\Users\Yash\AppData\Local\Programs\Python\Python36-32\Machine Learning\IPL\deliveries.csv")
df = df[(df["is_super_over"] != 1)]
df["pri_key"] = df["match_id"].astype(str) + "-" + df["inning"].astype(str)
openners = df[(df["over"] == 1) & (df["ball"] == 1)]
openners = openners[["pri_key", "batsman", "non_striker"]]
openners = openners.rename(columns = {"batsman":"batter1", "non_striker":"batter2"})
df = pd.merge(df, openners, on="pri_key")
df = df[["batsman", "non_striker", "batter1", "batter2", "batsman_runs"]]
df = df[((df["batsman"] == df["batter1"]) | (df["batsman"] == df["batter2"]))
& ((df["non_striker"] == df["batter1"]) | (df["non_striker"] == df["batter2"]))]
df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(10)
Result:
batsman non_striker
DA Warner S Dhawan 1294
S Dhawan DA Warner 823
RV Uthappa G Gambhir 781
DR Smith BB McCullum 684
CH Gayle V Kohli 674
MEK Hussey M Vijay 666
M Vijay MEK Hussey 629
G Gambhir RV Uthappa 611
BB McCullum DR Smith 593
CH Gayle TM Dilshan 537
and, I want to keep 1 pair as unique
for those who don't understand cricket
I have a dataframe
batsman non_striker runs
a b 2
a b 3
b a 1
c d 6
d c 1
d c 4
b a 3
e f 1
f e 2
f e 6
df1 = df.groupby(["batsman" , "non_striker"], group_keys = False)["batsman_runs"].agg("sum")
df1.nlargest(30)
output:
batsman non_striker runs
a b 5
b a 4
c d 6
d c 5
e f 1
f e 8
expected output:
batsman non_striker runs
a b 9
c d 11
e f 9
what should I do? Please advise....
python pandas
python pandas
asked Nov 26 '18 at 19:15
Yash MishraYash Mishra
264
264
Joindf['batsman_non_striker'] = df['batsman'] + df['non_striker ']
; sort the values''.join(sorted(batsman_non_striker))
, and then df.groupby('(batsman_non_striker').agg('sum')
– Ken Dekalb
Nov 26 '18 at 19:27
add a comment |
Joindf['batsman_non_striker'] = df['batsman'] + df['non_striker ']
; sort the values''.join(sorted(batsman_non_striker))
, and then df.groupby('(batsman_non_striker').agg('sum')
– Ken Dekalb
Nov 26 '18 at 19:27
Join
df['batsman_non_striker'] = df['batsman'] + df['non_striker ']
; sort the values ''.join(sorted(batsman_non_striker))
, and then df.groupby('(batsman_non_striker').agg('sum')– Ken Dekalb
Nov 26 '18 at 19:27
Join
df['batsman_non_striker'] = df['batsman'] + df['non_striker ']
; sort the values ''.join(sorted(batsman_non_striker))
, and then df.groupby('(batsman_non_striker').agg('sum')– Ken Dekalb
Nov 26 '18 at 19:27
add a comment |
3 Answers
3
active
oldest
votes
You can sort the batsman and non_striker and then group the data
df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1)
df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)
Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted
df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()
Either way, you will get,
batsman non_striker
CH Gayle V Kohli 2650
DA Warner S Dhawan 2242
AB de Villiers V Kohli 2135
G Gambhir RV Uthappa 1795
M Vijay MEK Hussey 1302
BB McCullum DR Smith 1277
KA Pollard RG Sharma 1220
MEK Hussey SK Raina 1129
AT Rayudu RG Sharma 1121
AM Rahane SR Watson 1118
1
Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!
– Yash Mishra
Nov 26 '18 at 19:43
@YashMishra, just that I have worked on it quite a bit :)
– Vaishali
Nov 26 '18 at 19:44
@YashMishra, do remember to upvote/accept the answer if it answers your query fully
– Vaishali
Nov 26 '18 at 19:54
@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future
– Vaishali
Dec 14 '18 at 18:14
add a comment |
Craete a new DataFrame
using np.sort
. Then groupby and sum.
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
index=df.index,
columns=['player_1', 'player_2']).assign(runs = df.runs)
df1.groupby(['player_1', 'player_2']).runs.sum()
Output:
player_1 player_2
a b 9
c d 11
e f 9
Name: runs, dtype: int64
add a comment |
I hope I understand you right...
What you can do is something like put the smaller value always in column A and the greater value always in column B.
import pandas as pd
import numpy as np
# generate example
values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
df = pd.DataFrame()
df['batsman'] = np.random.choice(values, size=10)
df['no_striker'] = np.random.choice(values, size=10)
# column evaluation
df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])
I really appreciate your effort and kindness to help but this is not exactly I am looking for.
– Yash Mishra
Nov 27 '18 at 17:08
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53487627%2fstop-groupby-from-making-2-combination-same-pair-in-python-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can sort the batsman and non_striker and then group the data
df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1)
df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)
Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted
df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()
Either way, you will get,
batsman non_striker
CH Gayle V Kohli 2650
DA Warner S Dhawan 2242
AB de Villiers V Kohli 2135
G Gambhir RV Uthappa 1795
M Vijay MEK Hussey 1302
BB McCullum DR Smith 1277
KA Pollard RG Sharma 1220
MEK Hussey SK Raina 1129
AT Rayudu RG Sharma 1121
AM Rahane SR Watson 1118
1
Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!
– Yash Mishra
Nov 26 '18 at 19:43
@YashMishra, just that I have worked on it quite a bit :)
– Vaishali
Nov 26 '18 at 19:44
@YashMishra, do remember to upvote/accept the answer if it answers your query fully
– Vaishali
Nov 26 '18 at 19:54
@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future
– Vaishali
Dec 14 '18 at 18:14
add a comment |
You can sort the batsman and non_striker and then group the data
df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1)
df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)
Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted
df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()
Either way, you will get,
batsman non_striker
CH Gayle V Kohli 2650
DA Warner S Dhawan 2242
AB de Villiers V Kohli 2135
G Gambhir RV Uthappa 1795
M Vijay MEK Hussey 1302
BB McCullum DR Smith 1277
KA Pollard RG Sharma 1220
MEK Hussey SK Raina 1129
AT Rayudu RG Sharma 1121
AM Rahane SR Watson 1118
1
Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!
– Yash Mishra
Nov 26 '18 at 19:43
@YashMishra, just that I have worked on it quite a bit :)
– Vaishali
Nov 26 '18 at 19:44
@YashMishra, do remember to upvote/accept the answer if it answers your query fully
– Vaishali
Nov 26 '18 at 19:54
@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future
– Vaishali
Dec 14 '18 at 18:14
add a comment |
You can sort the batsman and non_striker and then group the data
df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1)
df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)
Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted
df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()
Either way, you will get,
batsman non_striker
CH Gayle V Kohli 2650
DA Warner S Dhawan 2242
AB de Villiers V Kohli 2135
G Gambhir RV Uthappa 1795
M Vijay MEK Hussey 1302
BB McCullum DR Smith 1277
KA Pollard RG Sharma 1220
MEK Hussey SK Raina 1129
AT Rayudu RG Sharma 1121
AM Rahane SR Watson 1118
You can sort the batsman and non_striker and then group the data
df[['batsman', 'non_striker']] = df[['batsman', 'non_striker']].apply(sorted, axis=1)
df.groupby(['batsman', 'non_striker']).batsman_runs.sum().nlargest(10)
Edit: You can also use numpy for sorting the columns, which will be faster than using pandas sorted
df[['batsman', 'non_striker']] = np.sort(df[['batsman', 'non_striker']],1)
df.groupby(['batsman', 'non_striker'], sort = False).batsman_runs.sum().nlargest(10).sort_index()
Either way, you will get,
batsman non_striker
CH Gayle V Kohli 2650
DA Warner S Dhawan 2242
AB de Villiers V Kohli 2135
G Gambhir RV Uthappa 1795
M Vijay MEK Hussey 1302
BB McCullum DR Smith 1277
KA Pollard RG Sharma 1220
MEK Hussey SK Raina 1129
AT Rayudu RG Sharma 1121
AM Rahane SR Watson 1118
edited Nov 26 '18 at 20:42
answered Nov 26 '18 at 19:28
VaishaliVaishali
22.6k41438
22.6k41438
1
Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!
– Yash Mishra
Nov 26 '18 at 19:43
@YashMishra, just that I have worked on it quite a bit :)
– Vaishali
Nov 26 '18 at 19:44
@YashMishra, do remember to upvote/accept the answer if it answers your query fully
– Vaishali
Nov 26 '18 at 19:54
@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future
– Vaishali
Dec 14 '18 at 18:14
add a comment |
1
Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!
– Yash Mishra
Nov 26 '18 at 19:43
@YashMishra, just that I have worked on it quite a bit :)
– Vaishali
Nov 26 '18 at 19:44
@YashMishra, do remember to upvote/accept the answer if it answers your query fully
– Vaishali
Nov 26 '18 at 19:54
@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future
– Vaishali
Dec 14 '18 at 18:14
1
1
Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!
– Yash Mishra
Nov 26 '18 at 19:43
Perfect.... :) Thanks Vaishali... You are a true champ of IPL dataset!!!
– Yash Mishra
Nov 26 '18 at 19:43
@YashMishra, just that I have worked on it quite a bit :)
– Vaishali
Nov 26 '18 at 19:44
@YashMishra, just that I have worked on it quite a bit :)
– Vaishali
Nov 26 '18 at 19:44
@YashMishra, do remember to upvote/accept the answer if it answers your query fully
– Vaishali
Nov 26 '18 at 19:54
@YashMishra, do remember to upvote/accept the answer if it answers your query fully
– Vaishali
Nov 26 '18 at 19:54
@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future
– Vaishali
Dec 14 '18 at 18:14
@YashMishra, do consider accepting the answer as it closes the question and would help anyone with a similar query in the future
– Vaishali
Dec 14 '18 at 18:14
add a comment |
Craete a new DataFrame
using np.sort
. Then groupby and sum.
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
index=df.index,
columns=['player_1', 'player_2']).assign(runs = df.runs)
df1.groupby(['player_1', 'player_2']).runs.sum()
Output:
player_1 player_2
a b 9
c d 11
e f 9
Name: runs, dtype: int64
add a comment |
Craete a new DataFrame
using np.sort
. Then groupby and sum.
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
index=df.index,
columns=['player_1', 'player_2']).assign(runs = df.runs)
df1.groupby(['player_1', 'player_2']).runs.sum()
Output:
player_1 player_2
a b 9
c d 11
e f 9
Name: runs, dtype: int64
add a comment |
Craete a new DataFrame
using np.sort
. Then groupby and sum.
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
index=df.index,
columns=['player_1', 'player_2']).assign(runs = df.runs)
df1.groupby(['player_1', 'player_2']).runs.sum()
Output:
player_1 player_2
a b 9
c d 11
e f 9
Name: runs, dtype: int64
Craete a new DataFrame
using np.sort
. Then groupby and sum.
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.sort(df[['batsman', 'non_striker']].values,1),
index=df.index,
columns=['player_1', 'player_2']).assign(runs = df.runs)
df1.groupby(['player_1', 'player_2']).runs.sum()
Output:
player_1 player_2
a b 9
c d 11
e f 9
Name: runs, dtype: int64
answered Nov 26 '18 at 19:28
ALollzALollz
16.5k41838
16.5k41838
add a comment |
add a comment |
I hope I understand you right...
What you can do is something like put the smaller value always in column A and the greater value always in column B.
import pandas as pd
import numpy as np
# generate example
values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
df = pd.DataFrame()
df['batsman'] = np.random.choice(values, size=10)
df['no_striker'] = np.random.choice(values, size=10)
# column evaluation
df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])
I really appreciate your effort and kindness to help but this is not exactly I am looking for.
– Yash Mishra
Nov 27 '18 at 17:08
add a comment |
I hope I understand you right...
What you can do is something like put the smaller value always in column A and the greater value always in column B.
import pandas as pd
import numpy as np
# generate example
values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
df = pd.DataFrame()
df['batsman'] = np.random.choice(values, size=10)
df['no_striker'] = np.random.choice(values, size=10)
# column evaluation
df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])
I really appreciate your effort and kindness to help but this is not exactly I am looking for.
– Yash Mishra
Nov 27 '18 at 17:08
add a comment |
I hope I understand you right...
What you can do is something like put the smaller value always in column A and the greater value always in column B.
import pandas as pd
import numpy as np
# generate example
values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
df = pd.DataFrame()
df['batsman'] = np.random.choice(values, size=10)
df['no_striker'] = np.random.choice(values, size=10)
# column evaluation
df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])
I hope I understand you right...
What you can do is something like put the smaller value always in column A and the greater value always in column B.
import pandas as pd
import numpy as np
# generate example
values = ['a', 'b' , 'c', 'd', 'e', 'f', 'g']
df = pd.DataFrame()
df['batsman'] = np.random.choice(values, size=10)
df['no_striker'] = np.random.choice(values, size=10)
# column evaluation
df['smaller'] = df['batsman'].where(df['batsman'] < df['no_striker'], df['no_striker'])
df['greater'] = df['batsman'].where(df['batsman'] > df['no_striker'], df['no_striker'])
answered Nov 26 '18 at 19:31
MisterMonkMisterMonk
1549
1549
I really appreciate your effort and kindness to help but this is not exactly I am looking for.
– Yash Mishra
Nov 27 '18 at 17:08
add a comment |
I really appreciate your effort and kindness to help but this is not exactly I am looking for.
– Yash Mishra
Nov 27 '18 at 17:08
I really appreciate your effort and kindness to help but this is not exactly I am looking for.
– Yash Mishra
Nov 27 '18 at 17:08
I really appreciate your effort and kindness to help but this is not exactly I am looking for.
– Yash Mishra
Nov 27 '18 at 17:08
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53487627%2fstop-groupby-from-making-2-combination-same-pair-in-python-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Join
df['batsman_non_striker'] = df['batsman'] + df['non_striker ']
; sort the values''.join(sorted(batsman_non_striker))
, and then df.groupby('(batsman_non_striker').agg('sum')– Ken Dekalb
Nov 26 '18 at 19:27