Simple 2-D Clustering Algorithm in Python



























Being new to unsupervised methods, I need a push in the right direction, ideally with some semi-simple code I can run on my data as a case study. My data has only about 300 observations, but I also want to learn how to apply clustering to much larger data sets that behave similarly.



I have a data set with 2 features, and I'd like to run DBSCAN or something similar using Euclidean distance (if that is the right clustering approach).



As an example, the data looks like this:



[Scatter plot of the data; image not available]



Just by eye, I can tell that clustering this way might not be the best method, since the distribution looks irregular.



What method should I use to start understanding distributions like this, especially when the data set is very large (hundreds of thousands of observations)?











      python python-3.x machine-learning scikit-learn scipy
















      asked Nov 26 '18 at 23:51









HelloToEarth

























          1 Answer
































For most machine learning tasks, scikit-learn is your friend. For DBSCAN, scikit-learn provides sklearn.cluster.DBSCAN. From the scikit-learn docs:



    >>> from sklearn.cluster import DBSCAN
    >>> import numpy as np
    >>> X = np.array([[1, 2], [2, 2], [2, 3],
    ...               [8, 7], [8, 8], [25, 80]])
    >>> clustering = DBSCAN(eps=3, min_samples=2).fit(X)
    >>> clustering.labels_
    array([ 0,  0,  0,  1,  1, -1])
    >>> clustering
    DBSCAN(eps=3, min_samples=2)
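The example above clusters raw coordinates with a hand-picked eps. With real data, Euclidean distance depends on the units of each feature, and eps has to match the typical spacing between points. The following is a minimal sketch, not the docs' method: it assumes your two features are in a NumPy array of shape (n_samples, 2) (synthetic blobs stand in for it here), standardizes both features, and then uses the sorted distance to each point's min_samples-th nearest neighbour (the "k-distance" heuristic) to choose a starting eps. The 90th percentile is an arbitrary assumption; normally you would plot `k_dist` and pick the value at the elbow.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real 2-feature data (assumption: ~300 points).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Put both features on a comparable scale so Euclidean distance weighs them equally.
X_scaled = StandardScaler().fit_transform(X)

min_samples = 5

# Distance from each point to its min_samples-th nearest neighbour. The query
# point itself is returned as the first neighbour, which matches how DBSCAN
# counts min_samples (the point itself is included).
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_scaled)
distances, _ = nn.kneighbors(X_scaled)
k_dist = np.sort(distances[:, -1])

# Rough starting value for eps (assumption); inspect a plot of k_dist for the elbow instead.
eps = float(np.percentile(k_dist, 90))

labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)

# Label -1 marks noise, so it is not counted as a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"eps={eps:.3f}, clusters={n_clusters}, noise points={n_noise}")
```

If the result has too many noise points, raise eps or lower min_samples; if separate groups merge, do the opposite.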


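DBSCAN labels points that do not belong to any dense region as noise (-1), like the last point above. scikit-learn also provides other clustering algorithms; you can see all of them in the clustering section of the scikit-learn documentation.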
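Because the question mentions an irregular distribution, a small comparison may help show why a density-based method is a reasonable first choice. This is a sketch on synthetic data: make_moons stands in for the real data, and eps=0.3 / min_samples=5 are assumptions tuned for this toy set, not general defaults. It compares KMeans, which assumes roughly convex clusters around centroids, with DBSCAN, scoring each against the known labels with the adjusted Rand index.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Two interleaved half-circles: a simple stand-in for "irregular" 2-D clusters.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# KMeans splits the plane into convex regions around centroids.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN grows clusters through dense neighbourhoods, so it can follow curved shapes.
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true moon labels.
ari_kmeans = adjusted_rand_score(y_true, km_labels)
ari_dbscan = adjusted_rand_score(y_true, db_labels)
print(f"KMeans ARI: {ari_kmeans:.2f}")
print(f"DBSCAN ARI: {ari_dbscan:.2f}")
```

On real data you won't have y_true; there you would usually inspect a scatter plot coloured by cluster label and adjust the parameters from what you see.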







                answered Nov 27 '18 at 0:00









Tomothy32
