Pandas sorted ranking of column A sorted by column B
Currently I have the following python code
forumposts = pd.DataFrame({'UserId': [1,1,2,3,2,1,3], 'FirstPostDate': [2018,2018,2017,2019,2017,2018,2019], 'PostDate': [201801,201802,201701,201901,201801,201803,201902]})
data = forumposts.groupby(['UserId', 'PostDate','FirstPostDate']).size().reset_index()
rankedUserIdByFirstPostDate = data.groupby(['UserId', 'FirstPostDate']).size().reset_index().sort_values('FirstPostDate').reset_index(drop=True).reset_index()
data.loc[:,'Rank'] = data.merge(rankedUserIdByFirstPostDate , how='left', on='UserId')['index'].values
The code works as intended but its complicated is there a more pandas like way of doing this? The intent is the following:
Create a dense rank over the UserId
column sorted by the FirstPostDate
such that the user with the earliest posting gets rank 0 and the user with the second earliest first post gets rank 1 and so on.
Using forumposts.UserId.rank(method='dense')
gives me a ranking but its sorted by the order of the UserId.
python pandas pandas-groupby
add a comment |
Currently I have the following python code
forumposts = pd.DataFrame({'UserId': [1,1,2,3,2,1,3], 'FirstPostDate': [2018,2018,2017,2019,2017,2018,2019], 'PostDate': [201801,201802,201701,201901,201801,201803,201902]})
data = forumposts.groupby(['UserId', 'PostDate','FirstPostDate']).size().reset_index()
rankedUserIdByFirstPostDate = data.groupby(['UserId', 'FirstPostDate']).size().reset_index().sort_values('FirstPostDate').reset_index(drop=True).reset_index()
data.loc[:,'Rank'] = data.merge(rankedUserIdByFirstPostDate , how='left', on='UserId')['index'].values
The code works as intended but its complicated is there a more pandas like way of doing this? The intent is the following:
Create a dense rank over the UserId
column sorted by the FirstPostDate
such that the user with the earliest posting gets rank 0 and the user with the second earliest first post gets rank 1 and so on.
Using forumposts.UserId.rank(method='dense')
gives me a ranking but its sorted by the order of the UserId.
python pandas pandas-groupby
Can you add some sample data?
– jezrael
Nov 22 '18 at 8:06
@jezrael Done :) Output should rank user 2 first, user 1 second and user 3 third.
– CodeMonkey
Nov 22 '18 at 8:13
add a comment |
Currently I have the following python code
forumposts = pd.DataFrame({'UserId': [1,1,2,3,2,1,3], 'FirstPostDate': [2018,2018,2017,2019,2017,2018,2019], 'PostDate': [201801,201802,201701,201901,201801,201803,201902]})
data = forumposts.groupby(['UserId', 'PostDate','FirstPostDate']).size().reset_index()
rankedUserIdByFirstPostDate = data.groupby(['UserId', 'FirstPostDate']).size().reset_index().sort_values('FirstPostDate').reset_index(drop=True).reset_index()
data.loc[:,'Rank'] = data.merge(rankedUserIdByFirstPostDate , how='left', on='UserId')['index'].values
The code works as intended but its complicated is there a more pandas like way of doing this? The intent is the following:
Create a dense rank over the UserId
column sorted by the FirstPostDate
such that the user with the earliest posting gets rank 0 and the user with the second earliest first post gets rank 1 and so on.
Using forumposts.UserId.rank(method='dense')
gives me a ranking but its sorted by the order of the UserId.
python pandas pandas-groupby
Currently I have the following python code
forumposts = pd.DataFrame({'UserId': [1,1,2,3,2,1,3], 'FirstPostDate': [2018,2018,2017,2019,2017,2018,2019], 'PostDate': [201801,201802,201701,201901,201801,201803,201902]})
data = forumposts.groupby(['UserId', 'PostDate','FirstPostDate']).size().reset_index()
rankedUserIdByFirstPostDate = data.groupby(['UserId', 'FirstPostDate']).size().reset_index().sort_values('FirstPostDate').reset_index(drop=True).reset_index()
data.loc[:,'Rank'] = data.merge(rankedUserIdByFirstPostDate , how='left', on='UserId')['index'].values
The code works as intended but its complicated is there a more pandas like way of doing this? The intent is the following:
Create a dense rank over the UserId
column sorted by the FirstPostDate
such that the user with the earliest posting gets rank 0 and the user with the second earliest first post gets rank 1 and so on.
Using forumposts.UserId.rank(method='dense')
gives me a ranking but its sorted by the order of the UserId.
python pandas pandas-groupby
python pandas pandas-groupby
edited Nov 22 '18 at 8:12
CodeMonkey
asked Nov 22 '18 at 8:05
CodeMonkeyCodeMonkey
1,31211628
1,31211628
Can you add some sample data?
– jezrael
Nov 22 '18 at 8:06
@jezrael Done :) Output should rank user 2 first, user 1 second and user 3 third.
– CodeMonkey
Nov 22 '18 at 8:13
add a comment |
Can you add some sample data?
– jezrael
Nov 22 '18 at 8:06
@jezrael Done :) Output should rank user 2 first, user 1 second and user 3 third.
– CodeMonkey
Nov 22 '18 at 8:13
Can you add some sample data?
– jezrael
Nov 22 '18 at 8:06
Can you add some sample data?
– jezrael
Nov 22 '18 at 8:06
@jezrael Done :) Output should rank user 2 first, user 1 second and user 3 third.
– CodeMonkey
Nov 22 '18 at 8:13
@jezrael Done :) Output should rank user 2 first, user 1 second and user 3 third.
– CodeMonkey
Nov 22 '18 at 8:13
add a comment |
1 Answer
1
active
oldest
votes
Use map
by dictionary created by sort_values
with drop_duplicates
for order zipped with np.arange
:
data = (forumposts.groupby(['UserId', 'PostDate','FirstPostDate'])
.size()
.reset_index(name='count'))
users = data.sort_values('FirstPostDate').drop_duplicates('UserId')['UserId']
d = dict(zip(users, np.arange(len(users))))
data['Rank'] = data['UserId'].map(d)
print (data)
UserId PostDate FirstPostDate count Rank
0 1 201801 2018 1 1
1 1 201802 2018 1 1
2 1 201803 2018 1 1
3 2 201701 2017 1 0
4 2 201801 2017 1 0
5 3 201901 2019 1 2
6 3 201902 2019 1 2
Another solution:
data['Rank'] = (data.groupby('UserId')['FirstPostDate']
.transform('min')
.rank(method='dense')
.sub(1)
.astype(int))
1
Great, learned something new, clever use of dict/zip and map!
– CodeMonkey
Nov 22 '18 at 8:43
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53426350%2fpandas-sorted-ranking-of-column-a-sorted-by-column-b%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Use map
by dictionary created by sort_values
with drop_duplicates
for order zipped with np.arange
:
data = (forumposts.groupby(['UserId', 'PostDate','FirstPostDate'])
.size()
.reset_index(name='count'))
users = data.sort_values('FirstPostDate').drop_duplicates('UserId')['UserId']
d = dict(zip(users, np.arange(len(users))))
data['Rank'] = data['UserId'].map(d)
print (data)
UserId PostDate FirstPostDate count Rank
0 1 201801 2018 1 1
1 1 201802 2018 1 1
2 1 201803 2018 1 1
3 2 201701 2017 1 0
4 2 201801 2017 1 0
5 3 201901 2019 1 2
6 3 201902 2019 1 2
Another solution:
data['Rank'] = (data.groupby('UserId')['FirstPostDate']
.transform('min')
.rank(method='dense')
.sub(1)
.astype(int))
1
Great, learned something new, clever use of dict/zip and map!
– CodeMonkey
Nov 22 '18 at 8:43
add a comment |
Use map
by dictionary created by sort_values
with drop_duplicates
for order zipped with np.arange
:
data = (forumposts.groupby(['UserId', 'PostDate','FirstPostDate'])
.size()
.reset_index(name='count'))
users = data.sort_values('FirstPostDate').drop_duplicates('UserId')['UserId']
d = dict(zip(users, np.arange(len(users))))
data['Rank'] = data['UserId'].map(d)
print (data)
UserId PostDate FirstPostDate count Rank
0 1 201801 2018 1 1
1 1 201802 2018 1 1
2 1 201803 2018 1 1
3 2 201701 2017 1 0
4 2 201801 2017 1 0
5 3 201901 2019 1 2
6 3 201902 2019 1 2
Another solution:
data['Rank'] = (data.groupby('UserId')['FirstPostDate']
.transform('min')
.rank(method='dense')
.sub(1)
.astype(int))
1
Great, learned something new, clever use of dict/zip and map!
– CodeMonkey
Nov 22 '18 at 8:43
add a comment |
Use map
by dictionary created by sort_values
with drop_duplicates
for order zipped with np.arange
:
data = (forumposts.groupby(['UserId', 'PostDate','FirstPostDate'])
.size()
.reset_index(name='count'))
users = data.sort_values('FirstPostDate').drop_duplicates('UserId')['UserId']
d = dict(zip(users, np.arange(len(users))))
data['Rank'] = data['UserId'].map(d)
print (data)
UserId PostDate FirstPostDate count Rank
0 1 201801 2018 1 1
1 1 201802 2018 1 1
2 1 201803 2018 1 1
3 2 201701 2017 1 0
4 2 201801 2017 1 0
5 3 201901 2019 1 2
6 3 201902 2019 1 2
Another solution:
data['Rank'] = (data.groupby('UserId')['FirstPostDate']
.transform('min')
.rank(method='dense')
.sub(1)
.astype(int))
Use map
by dictionary created by sort_values
with drop_duplicates
for order zipped with np.arange
:
data = (forumposts.groupby(['UserId', 'PostDate','FirstPostDate'])
.size()
.reset_index(name='count'))
users = data.sort_values('FirstPostDate').drop_duplicates('UserId')['UserId']
d = dict(zip(users, np.arange(len(users))))
data['Rank'] = data['UserId'].map(d)
print (data)
UserId PostDate FirstPostDate count Rank
0 1 201801 2018 1 1
1 1 201802 2018 1 1
2 1 201803 2018 1 1
3 2 201701 2017 1 0
4 2 201801 2017 1 0
5 3 201901 2019 1 2
6 3 201902 2019 1 2
Another solution:
data['Rank'] = (data.groupby('UserId')['FirstPostDate']
.transform('min')
.rank(method='dense')
.sub(1)
.astype(int))
edited Nov 22 '18 at 8:30
answered Nov 22 '18 at 8:21
jezraeljezrael
327k23270348
327k23270348
1
Great, learned something new, clever use of dict/zip and map!
– CodeMonkey
Nov 22 '18 at 8:43
add a comment |
1
Great, learned something new, clever use of dict/zip and map!
– CodeMonkey
Nov 22 '18 at 8:43
1
1
Great, learned something new, clever use of dict/zip and map!
– CodeMonkey
Nov 22 '18 at 8:43
Great, learned something new, clever use of dict/zip and map!
– CodeMonkey
Nov 22 '18 at 8:43
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53426350%2fpandas-sorted-ranking-of-column-a-sorted-by-column-b%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Can you add some sample data?
– jezrael
Nov 22 '18 at 8:06
@jezrael Done :) Output should rank user 2 first, user 1 second and user 3 third.
– CodeMonkey
Nov 22 '18 at 8:13