drop group by number of occurrence












Hi, I want to delete the rows whose value in a given column occurs fewer than a certain number of times. For example:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 2], 'b': [4, 5, 6, 7], 'c': [0, 1, 3, 2]})
df

   a  b  c
0  1  4  0
1  2  5  1
2  3  6  3
3  2  7  2

Here I want to delete every row whose value in column 'a' occurs fewer than two times.

Wanted output:

   a  b  c
1  2  5  1
3  2  7  2

What I know: we can count the occurrences with condition = df['a'].value_counts() < 2, which gives something like:

2    False
3     True
1     True
Name: a, dtype: bool

But I don't know how to go from here to deleting the rows.

Thanks in advance!










Tags: python, pandas, dataframe, counter, pandas-groupby














asked Nov 25 '18 at 20:06 by Louise Fan
edited Nov 25 '18 at 20:43 by jpp
























3 Answers
































































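groupby + size

res = df[df.groupby('a')['b'].transform('size') >= 2]

transform('size') computes the size of each group in df.groupby('a') and broadcasts it back onto the original rows, so the result is aligned with the index of df and can be used directly as a boolean mask.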
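To make the difference between the per-group sizes and the row-aligned sizes concrete, here is a minimal sketch on the question's data. The names group_sizes, row_sizes and min_count are only illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 2], 'b': [4, 5, 6, 7], 'c': [0, 1, 3, 2]})

# One entry per distinct value of 'a' (indexed by 1, 2, 3).
group_sizes = df.groupby('a')['b'].size()

# The same sizes broadcast back to one entry per original row (indexed by 0..3).
row_sizes = df.groupby('a')['b'].transform('size')

min_count = 2  # illustrative threshold name
res = df[row_sizes >= min_count]
print(res)
```

Because row_sizes shares its index with df, the comparison yields a mask with exactly one boolean per row.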




value_counts + map

s = df['a'].value_counts()
res = df[df['a'].map(s) >= 2]

print(res)

   a  b  c
1  2  5  1
3  2  7  2

Here value_counts counts each distinct value in 'a', and map looks up that count for every row.
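If the filter is needed in more than one place, the value_counts + map idea could be wrapped in a small helper. This is only a sketch: the function name drop_rare and its parameters are hypothetical, not part of pandas or of the answer above.

```python
import pandas as pd

def drop_rare(frame, column, min_count):
    """Keep rows whose value in `column` occurs at least `min_count` times."""
    counts = frame[column].value_counts()   # count per distinct value
    per_row = frame[column].map(counts)     # count looked up for each row
    return frame[per_row >= min_count]

df = pd.DataFrame({'a': [1, 2, 3, 2], 'b': [4, 5, 6, 7], 'c': [0, 1, 3, 2]})
res = drop_rare(df, 'a', 2)
print(res)
```

One thing to be aware of: value_counts skips missing values by default, so rows with NaN in the chosen column get a NaN count and are dropped by the comparison.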
















answered Nov 25 '18 at 20:26 by jpp






































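You can use df.where and then dropna. The mask has to be aligned with the rows of df, so map the counts back onto column 'a' first. On its own, df['a'].value_counts() < 2 is indexed by the values of 'a' rather than by row labels, and it selects the right rows in this example only by coincidence.

df.where(df['a'].map(df['a'].value_counts()) >= 2).dropna()

     a    b    c
1  2.0  5.0  1.0
3  2.0  7.0  2.0

Note that where fills the rejected rows with NaN, so the integer columns come back as floats, and dropna also removes any rows that already contained NaN.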
















answered Nov 25 '18 at 20:14 by Khalil Al Hooti






































You could try something like this: get the length of each group, transform it back onto the original index, and index the DataFrame by the result.

df[df.groupby("a").transform(len)["b"] >= 2]

   a  b  c
1  2  5  1
3  2  7  2

Breaking it into individual steps, you get:

df.groupby("a").transform(len)["b"]

0    1
1    2
2    1
3    2
Name: b, dtype: int64

These are the group sizes transformed back onto your original index.

df.groupby("a").transform(len)["b"] >= 2

0    False
1     True
2    False
3     True
Name: b, dtype: bool

This boolean Series is then used as a mask to index the original DataFrame.
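As a possible alternative to building the mask by hand (a different technique from the one in this answer), pandas also offers GroupBy.filter, which keeps or drops whole groups based on a predicate. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 2], 'b': [4, 5, 6, 7], 'c': [0, 1, 3, 2]})

# Keep only the groups of 'a' that contain at least two rows.
res = df.groupby('a').filter(lambda g: len(g) >= 2)
print(res)

# For comparison: the mask-based selection from the steps above.
mask = df.groupby('a').transform(len)['b'] >= 2
same = res.equals(df[mask])
```

filter calls the Python function once per group, so on frames with many groups the transform-based mask is likely to be faster.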















answered Nov 25 '18 at 20:15 by Sven Harris
edited Nov 25 '18 at 20:20





























