Custom mapping of categorical to numeric values












I have object-type columns that hold categorical values, for example 15-16 Years, 17-23 Years, and so on. I converted them to the category dtype and then used cat.codes. However, the codes start at 0 for the first group (0-4 Years), and I want the coding to start at 1, i.e. 0-4 -> 1, 5-12 -> ..., and @@ -> NaN.
The suggested solution of using a dictionary mapping still has issues. The following is an MCVE:



import pandas as pd
data = ['0-4 Years', '5-12 Years','13-18 Years', '19-21 Years','22-25 Years','26-29 Years','30-35 Years',
'36-41 Years','42-45 Years','46-49 Years','50-55 Years', '56-63 Years']
df = pd.DataFrame(data,columns=['Age'],dtype=object)
df['Age']=df['Age'].astype('category')
cats = dict(enumerate(df['Age'].cat.categories, 2))
df['Age']=df['Age'].cat.codes.map(cats).astype('category')
df['Age']


Here is the output. As you can see, if I start the enumeration at any value other than 0, some values become NaN. In addition, the column is still not numerically coded:



df['Age']
0 NaN
1 36-41 Years
2 NaN
3 NaN
4 0-4 Years
5 13-18 Years
6 19-21 Years
7 22-25 Years
8 26-29 Years
9 30-35 Years
10 42-45 Years
11 46-49 Years
Name: Age, dtype: category
Categories (9, object): [0-4 Years, 13-18 Years, 19-21 Years, 22-25 Years, ..., 30-35 Years, 36-41 Years, 42-45 Years, 46-49 Years]


How can I fix this?










Tags: python, mapping

asked Nov 22 '18 at 4:03 by aus_fas (edited Nov 22 '18 at 10:21)
























          1 Answer
You can create your own dictionary that maps codes to categories with:

cats = dict(enumerate(df['Age'].cat.categories, 1))

Then use this dictionary to map the codes in the DataFrame:

df['Age'].cat.codes.map(cats).astype('category')
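If the goal is a numeric column whose codes start at 1, one possible approach is to shift cat.codes directly instead of looking the codes up in a dictionary. The following is a minimal sketch under some assumptions: the names age_order and Age_code are hypothetical, the sample data is shortened, and '@@' is assumed to be a placeholder that should end up as NaN. It relies on two documented pandas behaviours: default categories for strings are sorted lexically (so '5-12 Years' would sort after '46-49 Years' unless an explicit order is given), and values outside the categories get the code -1.

```python
import pandas as pd

# Small illustrative sample; '@@' stands in for an unknown value.
data = ['0-4 Years', '5-12 Years', '13-18 Years', '19-21 Years', '@@']
df = pd.DataFrame(data, columns=['Age'], dtype=object)

# Hypothetical explicit order. '@@' is left out on purpose so it becomes missing.
age_order = ['0-4 Years', '5-12 Years', '13-18 Years', '19-21 Years']
age = pd.Categorical(df['Age'], categories=age_order, ordered=True)

# Categorical.codes is 0-based and uses -1 for values outside the categories.
codes = pd.Series(age.codes, index=df.index, dtype='int64')

# Shift the codes to start at 1 and turn the -1 sentinel into NaN.
df['Age_code'] = (codes + 1).where(codes >= 0)
print(df)
```

Because NaN is a float, Age_code ends up as a floating-point column in this sketch. The original labels stay in Age, so the two columns can be compared side by side.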





• But what do you want to store in the column, the codes themselves?
  – b-fg, Nov 22 '18 at 5:48

• Also, when I change the start of enumerate to any value higher than 1, df['Age'] starts to have NaN for categories where a mapping was available. The code up to cats is fine, as I can see the categories in the dictionary, but the second line seems to have an issue.
  – aus_fas, Nov 22 '18 at 6:08

• The first line of code only creates a dictionary, so it is not very useful on its own. That's why there is a second line where you use the dictionary to map it to your DataFrame.
  – b-fg, Nov 22 '18 at 6:32

• Maybe if you had provided a Minimal, Complete, and Verifiable Example this would not be an issue. I encourage you to edit your question with more content that other people can use to provide a better answer.
  – b-fg, Nov 22 '18 at 6:43
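The NaN values mentioned in the comments above appear to come from a key mismatch: cat.codes always counts from 0, while enumerate with a start of 2 produces dictionary keys 2, 3, 4, and so on, so the lowest codes have no matching key. The sketch below illustrates this on a shortened sample and then shows one alternative, a dictionary keyed by the labels themselves. The names shifted, unmatched, label_to_code, and Age_code are hypothetical, and treating '@@' as a value to leave unmapped is an assumption.

```python
import pandas as pd

data = ['0-4 Years', '5-12 Years', '13-18 Years', '@@']
df = pd.DataFrame(data, columns=['Age'], dtype=object)

# Default categories are sorted as strings, and codes count from 0.
as_cat = df['Age'].astype('category')
codes = as_cat.cat.codes.tolist()

# enumerate(..., 2) produces keys 2, 3, 4, ... so codes 0 and 1 have no entry.
shifted = dict(enumerate(as_cat.cat.categories, 2))
unmatched = sorted(set(codes) - set(shifted))
print(unmatched)

# Hypothetical label -> number dictionary, applied to the labels themselves.
label_to_code = {'0-4 Years': 1, '5-12 Years': 2, '13-18 Years': 3}
df['Age_code'] = df['Age'].map(label_to_code)  # labels not in the dict become NaN
print(df)
```

Mapping from label to number keeps the numbering independent of how pandas sorts the categories, at the cost of writing the dictionary out by hand.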













answered Nov 22 '18 at 4:16 by b-fg












