How to solve this changing dataframe problem



























Let's say I have a dataframe that consists of these two columns:

    User_id  hotel_cluster
    1        0
    2        2
    3        2
    3        3
    3        0
    4        2

I want to change it into something like this. Do I need to write a function, or is there a pandas way to do it?

    User_id  hotel_cluster_0  hotel_cluster_1  hotel_cluster_2  hotel_cluster_3
    1        1                0                0                0
    2        0                0                1                0
    3        1                0                1                1
    4        0                0                1                0

Please help! Sorry if I am not posting the question in the right format.
Thank you!










      python pandas dataframe














edited Nov 21 '18 at 18:08 by yatu
asked Nov 21 '18 at 18:02 by Simon Lim
























          2 Answers
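IIUC:

Option 1

First change 'hotel_cluster' to a categorical that includes the categories that don't appear in the data (here, 1). The expression *map(df.get, df) passes the two columns, User_id and hotel_cluster, as the index and columns arguments of crosstab. Depending on the pandas version, crosstab may drop unobserved categories by default; passing dropna=False should keep them.

    col = 'hotel_cluster'
    df[col] = pd.Categorical(df[col], categories=[0, 1, 2, 3])
    pd.crosstab(*map(df.get, df)).add_prefix(f"{col}_")

    hotel_cluster  hotel_cluster_0  hotel_cluster_1  hotel_cluster_2  hotel_cluster_3
    User_id
    1                            1                0                0                0
    2                            0                0                1                0
    3                            1                0                1                1
    4                            0                0                1                0

Option 2

Reindex after crosstab, so that the missing cluster columns are added and filled with 0:

    pd.crosstab(*map(df.get, df)).reindex(
        columns=range(4), fill_value=0
    ).add_prefix('hotel_cluster_')

    hotel_cluster  hotel_cluster_0  hotel_cluster_1  hotel_cluster_2  hotel_cluster_3
    User_id
    1                            1                0                0                0
    2                            0                0                1                0
    3                            1                0                1                1
    4                            0                0                1                0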
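For anyone who wants to try the reindex approach end to end, here is a minimal self-contained sketch. The sample data is copied from the question; the name `all_clusters` and the assumption that the full set of cluster ids (0 to 3) is known up front are mine, not part of the answer above. The `clip` step is also an addition: crosstab returns counts, so it caps them at 1 in case the same (User_id, hotel_cluster) pair appears more than once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User_id": [1, 2, 3, 3, 3, 4],
    "hotel_cluster": [0, 2, 2, 3, 0, 2],
})

# Assumption: the complete set of cluster ids is known in advance.
all_clusters = range(4)

# Count (User_id, hotel_cluster) pairs; only observed clusters get a column.
counts = pd.crosstab(df["User_id"], df["hotel_cluster"])

wide = (
    counts.reindex(columns=all_clusters, fill_value=0)  # add never-seen clusters as 0
    .clip(upper=1)                                      # counts -> 0/1 flags
    .add_prefix("hotel_cluster_")
    .reset_index()                                      # User_id back to a regular column
)
wide.columns.name = None  # drop the leftover "hotel_cluster" axis label
print(wide)
```

If the raw counts are what you need rather than 0/1 flags, the `clip` call can simply be left out.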
















answered Nov 21 '18 at 18:08 by piRSquared

























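A simple way, if you do not need columns for the values that never appear, is to use pd.get_dummies. It returns one row per input row, so group by User_id and take the max to end up with a single row per user:

    pd.get_dummies(df.hotel_cluster, prefix='hotel_cluster').groupby(df.User_id).max()

Otherwise you want something like @piRSquared's solution.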
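The get_dummies route can also cover the missing cluster if the column is made categorical first, since get_dummies emits one column per category whether or not it occurs. The following is only a sketch under my own assumptions: the sample data is copied from the question, the category list [0, 1, 2, 3] is assumed to be complete, and the names `cluster_dtype`, `dummies` and `per_user` are purely illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User_id": [1, 2, 3, 3, 3, 4],
    "hotel_cluster": [0, 2, 2, 3, 0, 2],
})

# Assumption: clusters 0..3 are the complete set of categories.
cluster_dtype = pd.CategoricalDtype(categories=[0, 1, 2, 3])
clusters = df["hotel_cluster"].astype(cluster_dtype)

# One row per input row, one column per category (cluster 1 included).
dummies = pd.get_dummies(clusters, prefix="hotel_cluster", dtype=int)

# Collapse to one row per user: a flag is 1 if any of the user's rows set it.
per_user = dummies.groupby(df["User_id"]).max().reset_index()
print(per_user)
```

Passing `dtype=int` keeps the flags as 0/1 integers; without it, recent pandas versions return booleans.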

















answered Nov 21 '18 at 18:14 by yatu





























