Group GPS points with Pandas

I have a Pandas dataframe of towers, like:

site       lat      lon
18ALOP01   11.1278  14.3578
18ALOP02   11.1278  14.3578
18ALOP12   11.1288  14.3575
18PENO01   11.1580  14.2898

I need to group towers that are too close to each other (within 50 m). I wrote a script that performs a "self cross join", calculates the distance between every combination of sites, and sets the same id for those whose distance is less than a threshold. With n sites it calculates n^2 - n combinations, so the algorithm scales poorly. Is there a better way of doing that?
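The script itself is not shown, so the following is only a minimal sketch of what such an all-pairs comparison might look like. It assumes haversine distances on a spherical Earth and a union-find structure for assigning ids; the names `haversine_m` and `group_sites` are hypothetical. It visits each unordered pair once, which is n(n-1)/2 comparisons rather than n^2 - n, but that is still quadratic in n.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def group_sites(sites, threshold_m=50.0):
    """sites: list of (name, lat, lon). Returns {name: group_id}."""
    parent = list(range(len(sites)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every unordered pair once: n * (n - 1) / 2 distance calls.
    for i, j in combinations(range(len(sites)), 2):
        if haversine_m(*sites[i][1:], *sites[j][1:]) < threshold_m:
            parent[find(i)] = find(j)  # merge the two groups

    # Number the groups in order of first appearance.
    roots = {}
    return {name: roots.setdefault(find(i), len(roots))
            for i, (name, _, _) in enumerate(sites)}


sites = [
    ("18ALOP01", 11.1278, 14.3578),
    ("18ALOP02", 11.1278, 14.3578),
    ("18ALOP12", 11.1288, 14.3575),
    ("18PENO01", 11.1580, 14.2898),
]
groups = group_sites(sites)
print(groups)
```

Because groups are merged whenever any two members are close, sites connected through a chain of neighbors end up with the same id even if the ends of the chain are more than 50 m apart.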

edited Nov 23 '18 at 19:02

asked Nov 23 '18 at 17:42

Krogiar

275

What does your expected output look like?

– Ken Dekalb
Nov 23 '18 at 18:57

What happens if point a is 40m from point b, 80m from point c, and point b is 45m from point c? Are they all in the same group?

– andersource
Nov 23 '18 at 19:07

Yes, no problem.

– Krogiar
Nov 23 '18 at 19:27

python pandas geo geopandas


1 Answer

Assuming the number of sites and their "true" locations are unknown, you could try the MeanShift clustering algorithm. It is a general-purpose algorithm and not highly scalable, but it will likely be faster than implementing your own clustering algorithm in Python. You could also experiment with bin_seeding=True as an optimization, if binning data points into a grid is an acceptable shortcut for pruning the starting seeds. (Note: if binning data points to a grid, rather than computing the Euclidean distance between points, is an acceptable "full" solution, that seems like it would be the fastest approach to your problem.)
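As a rough illustration of the grid-binning idea in the note, here is a minimal sketch that assumes the coordinates are already in meters and gives every occupied 50 m cell its own group id. The helper name `grid_labels` is hypothetical and not part of any library.

```python
import math


def grid_labels(points, cell_size=50.0):
    """Assign one group id per occupied grid cell.

    points: iterable of (x, y) pairs in meters (assumed already projected).
    """
    cell_to_id = {}
    labels = []
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        # setdefault hands out ids in order of first appearance.
        labels.append(cell_to_id.setdefault(cell, len(cell_to_id)))
    return labels


points = [(0, 1), (30, 1), (49, 1), (51, 1), (120, 1)]
labels = grid_labels(points)
print(labels)  # [0, 0, 0, 1, 2]
```

This is a single pass over the data, but note the trade-off: the points at x=49 and x=51 are only 2 m apart and still land in different cells, so binning is only suitable if that kind of boundary effect is acceptable.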

Here's an example using scikit-learn's implementation of MeanShift, where the x/y coordinates are in meters and bandwidth=50 makes the algorithm look for clusters with a radius of about 50 m.

In [2]: from sklearn.cluster import MeanShift

In [3]: import numpy as np

In [4]: X = np.array([
   ...:     [0, 1], [51, 1], [100, 1], [151, 1],
   ...: ])

In [5]: clustering = MeanShift(bandwidth=50).fit(X)  # OR speed up with bin_seeding=True

In [6]: print(clustering.labels_)
[1 0 0 2]

In [7]: print(clustering.cluster_centers_)
[[ 75.5   1. ]
 [  0.    1. ]
 [151.    1. ]]
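The example above works in meters, while the towers in the question are given as latitude/longitude in degrees. One possible way to bridge that gap, sketched here under the assumption that all sites lie within a small area (a few tens of kilometers), is a local equirectangular projection around the centroid. The helper name `to_local_meters` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation


def to_local_meters(latlon):
    """Convert (lat, lon) pairs in degrees to [x, y] in meters.

    Uses an equirectangular approximation centered on the mean position,
    which is only reasonable when the points cover a small area.
    """
    lat0 = sum(lat for lat, _ in latlon) / len(latlon)
    lon0 = sum(lon for _, lon in latlon) / len(latlon)
    # East-west distances shrink by cos(latitude).
    k = math.cos(math.radians(lat0))
    return [
        [EARTH_RADIUS_M * math.radians(lon - lon0) * k,
         EARTH_RADIUS_M * math.radians(lat - lat0)]
        for lat, lon in latlon
    ]


towers = [
    (11.1278, 14.3578),  # 18ALOP01
    (11.1278, 14.3578),  # 18ALOP02
    (11.1288, 14.3575),  # 18ALOP12
    (11.1580, 14.2898),  # 18PENO01
]
X = to_local_meters(towers)
# Planar distance between 18ALOP01 and 18ALOP12, in meters.
d = math.hypot(X[0][0] - X[2][0], X[0][1] - X[2][1])
print(round(d, 1))
```

The resulting list could then be wrapped in np.array(X) and passed to MeanShift(bandwidth=50).fit(...) as in the session above. For data spread over a larger region, a proper projected coordinate system would be a safer choice than this approximation.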

edited Jan 16 at 17:37

answered Nov 24 '18 at 19:47

Garrett

21.9k34544


