Custom mapping of categorical to numeric values
I have object-type columns that hold categorical values, for example 15-16 Years, 17-23 Years, and so on. I converted them to category dtype and then took cat.codes. However, the codes start from 0 for the first group (0-4 Years), and I want them to start from 1, i.e. 0-4 -> 1, 5-12 -> 2, and @@ -> NaN.
The suggested solution of using a dictionary mapping still has issues. The following is an MCVE:
import pandas as pd

data = ['0-4 Years', '5-12 Years', '13-18 Years', '19-21 Years', '22-25 Years', '26-29 Years', '30-35 Years',
        '36-41 Years', '42-45 Years', '46-49 Years', '50-55 Years', '56-63 Years']
df = pd.DataFrame(data, columns=['Age'], dtype=object)
df['Age'] = df['Age'].astype('category')
cats = dict(enumerate(df['Age'].cat.categories, 2))
df['Age'] = df['Age'].cat.codes.map(cats).astype('category')
df['Age']
Here is the output. As you can see, if I change the enumeration start to anything other than 0, some values become NaN. In addition, the column is not numerically coded:
df['Age']
0 NaN
1 36-41 Years
2 NaN
3 NaN
4 0-4 Years
5 13-18 Years
6 19-21 Years
7 22-25 Years
8 26-29 Years
9 30-35 Years
10 42-45 Years
11 46-49 Years
Name: Age, dtype: category
Categories (9, object): [0-4 Years, 13-18 Years, 19-21 Years, 22-25 Years, ..., 30-35 Years, 36-41 Years, 42-45 Years, 46-49 Years]
How can I fix this?
python mapping
asked Nov 22 '18 at 4:03 by aus_fas (edited Nov 22 '18 at 10:21)
1 Answer
You can create your own dictionary that maps codes to categories with:
cats = dict(enumerate(df['Age'].cat.categories, 1))
Then use this dictionary to map the codes in the dataframe:
df['Age'].cat.codes.map(cats).astype('category')
answered Nov 22 '18 at 4:16 by b-fg
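If the goal is to store the numeric codes themselves, starting at 1, one possible approach is sketched below. It assumes the age groups should be numbered in their natural order rather than the alphabetical order that astype('category') infers, and it uses a shortened, hypothetical list of labels, with '@@' standing in for an unrecognised value. The names age_order and Age_code are illustrative only.

```python
import pandas as pd

# Hypothetical, shortened list of age groups in their intended order.
age_order = ['0-4 Years', '5-12 Years', '13-18 Years', '19-21 Years']
df = pd.DataFrame({'Age': ['5-12 Years', '0-4 Years', '@@', '19-21 Years', '13-18 Years']})

# Labels missing from `categories` (here '@@') are treated as missing and get code -1.
age_cat = pd.Categorical(df['Age'], categories=age_order, ordered=True)
codes = pd.Series(age_cat.codes, index=df.index)

# Shift the codes so they start at 1; rows with the -1 sentinel become NaN.
df['Age_code'] = (codes + 1).astype('float64').where(codes >= 0)
print(df)
```

Because NaN is a float, the resulting column is float64. If integer codes are needed, a nullable integer dtype such as 'Int64' may be an option.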
But what do you want to store in the column, the codes themselves?
– b-fg
Nov 22 '18 at 5:48
Also, when I change the start of enumerate to any value higher than 1, df['Age'] starts to contain NaN for categories where a mapping was available. The code up to cats is fine, as I can see the categories in the dictionary, but the second line seems to have an issue.
– aus_fas
Nov 22 '18 at 6:08
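A likely cause of the NaN values described in this comment is that cat.codes always run from 0 to n-1, while the dictionary keys are shifted by the start value passed to enumerate, so the lowest codes have no matching key. The sketch below illustrates this on a shortened, hypothetical list of labels and shows one alternative: keying the dictionary by label instead of by code. The names shifted, missing_keys and label_to_code are illustrative only.

```python
import pandas as pd

# Hypothetical, shortened list of age groups in their intended order.
age_order = ['0-4 Years', '5-12 Years', '13-18 Years']
labels = pd.Series(['13-18 Years', '0-4 Years', '@@', '5-12 Years'])

# A dict built with enumerate(..., 2) has keys 2, 3, 4,
# but the category codes are 0, 1, 2, so codes 0 and 1 find no key.
shifted = dict(enumerate(age_order, 2))
missing_keys = [c for c in range(len(age_order)) if c not in shifted]

# Keying the dict by label sidesteps the codes entirely;
# Series.map returns NaN for any label that is not in the dict.
label_to_code = {label: i for i, label in enumerate(age_order, 1)}
coded = labels.map(label_to_code)

print(missing_keys)
print(coded)
```

With this layout the numbering follows the order of age_order, so the alphabetical sorting of categories (which places '5-12 Years' after '46-49 Years') no longer affects the result.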
The first line of code only creates a dictionary, so it is not very useful on its own. That's why there is a second line, where you use the dictionary to map it onto your dataframe.
– b-fg
Nov 22 '18 at 6:32
Maybe if you had provided a Minimal, Complete, and Verifiable Example, this would not be an issue. I encourage you to edit your question and add more content that other people can use to provide a better answer.
– b-fg
Nov 22 '18 at 6:43