How to divide the dataframe into bins of specific length with unequal number of points?
I have a dataframe that I want to divide into bins of equal width (the number of data points in each bin may differ). I have tried the following approach:
df = pc13.sort_values(by = ['A'], ascending=True)
df_temp = np.array_split(df, 20)
But this approach divides the dataframe into bins with an equal number of data points. Instead, I want to divide the dataframe into bins of a particular width; the number of data points in each bin need not be the same.
The minimum value in column A of the dataframe is -0.04843731030699292 and the maximum value is 0.05417013917000033. I tried uploading the entire dataframe, but the file is very large.
python-3.x pandas binning
Please share your dataframe too. Or at least the range of values of the column you want to bin.
– Mohit Motwani
Nov 23 '18 at 5:20
@MohitMotwani I added the min and max value for the dataframe for which I'm trying to apply binning.
– Upriser
Nov 23 '18 at 5:24
@Upriser what are the criteria for splitting into bins?
– Chris
Nov 23 '18 at 5:28
@Chris The bin width should be equal but the number of points in the bins can be different.
– Upriser
Nov 23 '18 at 5:32
@Upriser your code above just splits your dataframe based on the index; it has nothing to do with the columns. The width is always going to be the same as that of the original dataframe.
– Chris
Nov 23 '18 at 5:41
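The difference between the two kinds of binning discussed in these comments can be seen in a small sketch. The data here is made up (103 random values in a column named A, as in the question), and the split is done on row positions rather than on the dataframe itself; this is only an illustration, not the asker's actual data or code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 103 normally distributed values in column A.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.normal(size=103)}).sort_values("A")

# Equal-count split: array_split divides the row positions as evenly as it can,
# regardless of the values stored in A.
positions = np.array_split(np.arange(len(df)), 4)
chunks = [df.iloc[idx] for idx in positions]
count_sizes = [len(c) for c in chunks]

# Equal-width split: pd.cut divides the value range of A into 4 intervals of
# the same width, so the number of rows per interval follows the data.
width_sizes = pd.cut(df["A"], bins=4).value_counts(sort=False).tolist()

print(count_sizes)
print(width_sizes)
```

The first list is nearly uniform by construction, while the second depends on how the values of A are distributed.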
asked Nov 23 '18 at 5:19 by Upriser (edited Nov 23 '18 at 5:23)
1 Answer
You can do something like this:
import numpy as np
import pandas as pd
# create a random df
df = pd.DataFrame(np.random.randn(10, 10), columns=list('ABCDEFGHIJ'))
# sort values by column A
df = df.sort_values(by=['A'], ascending=True)
# use your code, but on a transposed dataframe
new = np.array_split(df.T, 5)  # split the columns into 5 bins
# transpose each piece back into a dataframe
dfs = [part.T for part in new]
Update (equal-width bins on the values of column A):
# random df
df = pd.DataFrame(np.random.randn(1000, 5), columns=list('ABCDE'))
# sort on A
df.sort_values('A', inplace=True)
# create 20 bins of equal width over the range of A
df['bin'] = pd.cut(df['A'], 20, include_lowest=True)
# group on bin; observed=True skips bins that contain no rows
group = df.groupby('bin', observed=True)
# split the groups into a list of dataframes
dfs = [group.get_group(x) for x in group.groups]
[            A         B         C         D         E               bin
218 -2.716093  0.833726 -0.771400  0.691251  0.162448  (-2.723, -2.413]
207 -2.581388 -2.318333 -0.001467  0.035277  1.219666  (-2.723, -2.413]
380 -2.499710  1.946709 -0.519070  1.653383  0.309689  (-2.723, -2.413]
866 -2.492050  0.246500 -0.596392  0.872888  2.371652  (-2.723, -2.413]
876 -2.469238 -0.156470 -0.841065 -1.248793 -0.489665  (-2.723, -2.413]
314 -2.456308  0.630691 -0.072146  1.139697  0.663674  (-2.723, -2.413]
310 -2.455353  0.075842  0.589515 -0.427233  1.207979  (-2.723, -2.413]
660 -2.427255  0.890125 -0.042716 -1.038401  0.651324  (-2.723, -2.413],
             A         B         C         D         E              bin
571 -2.355430  0.383794 -1.266575 -1.214833 -0.862611  (-2.413, -2.11]
977 -2.354416 -1.964189  0.440376  0.028032 -0.181360  (-2.413, -2.11]
83  -2.276908  0.288462  0.370555 -0.546359 -2.033892  (-2.413, -2.11]
196 -2.213729 -1.087783 -0.592884  1.233886  1.051164  (-2.413, -2.11]
227 -2.146631  0.365183 -0.095293 -0.882414  0.385117  (-2.413, -2.11]
39  -2.136800 -1.150065  0.180182 -0.424071  0.040370  (-2.413, -2.11],
             A         B         C         D         E              bin
104 -2.108961 -0.396602 -1.014224 -1.277124  0.001030  (-2.11, -1.806]
360 -2.098928  1.093483  1.438421 -0.980215  0.010359  (-2.11, -1.806]
530 -2.088592  1.043201 -0.522468  0.482176 -0.680166  (-2.11, -1.806]
158 -2.062759  2.070387  2.124621 -2.751532  0.674055  (-2.11, -1.806]
971 -2.053039  0.347577 -0.498513  1.917305 -1.746493  (-2.11, -1.806]
658 -2.002482 -1.222292 -0.398816  0.279228 -1.485782  (-2.11, -1.806]
90  -1.985261  3.499251 -2.089028  1.238524 -1.781089  (-2.11, -1.806]
466 -1.973640 -1.609920 -1.029454  0.809143 -0.228893  (-2.11, -1.806]
40  -1.966016 -1.479240 -1.564966 -0.310133  1.338023  (-2.11, -1.806]
279 -1.943666  0.762493  0.060038  0.449159  0.244411  (-2.11, -1.806]
204 -1.940045  0.844901 -0.343691 -1.144836  1.385915  (-2.11, -1.806]
780 -1.918548  0.212452  0.225789  0.216110  1.710532  (-2.11, -1.806]
289 -1.897438  0.847664  0.689778 -0.454152 -0.747836  (-2.11, -1.806]
159 -1.848425  0.477726  0.391384 -0.477804  0.168160  (-2.11, -1.806],
...
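When the range of the column is already known, as with the minimum and maximum quoted in the question, the bin edges can also be passed to pd.cut explicitly instead of letting it derive them. The following is a minimal sketch under that assumption; the uniform sample data and the names lo, hi, n_bins and edges are made up for illustration.

```python
import numpy as np
import pandas as pd

# Range quoted in the question for column A.
lo, hi = -0.04843731030699292, 0.05417013917000033
n_bins = 20  # same number of bins as in the answer

# Hypothetical stand-in for the real data: 500 values drawn from [lo, hi).
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.uniform(lo, hi, size=500)})

# n_bins + 1 evenly spaced edges give n_bins intervals of identical width.
edges = np.linspace(lo, hi, n_bins + 1)

# include_lowest=True makes the first interval include the value lo itself.
df["bin"] = pd.cut(df["A"], bins=edges, include_lowest=True)

# One dataframe per non-empty bin; row counts may differ from bin to bin.
dfs = [g for _, g in df.groupby("bin", observed=True)]
sizes = [len(g) for g in dfs]
print(sizes)
```

Passing explicit edges keeps the intervals identical across different dataframes, which can help if several datasets need to be compared bin by bin.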
This approach is not helping me. It divides the dataframe into bins, but each bin has the same number of points. I need to divide it into bins of equal width, where the number of points can differ. Instead of array_split there should be some other method for this. I have tried this approach before.
– Upriser
Nov 23 '18 at 7:03
@Upriser do something like np.split(df[df.columns[:5]], [1,3,6]) and append the remaining columns
– Chris
Nov 23 '18 at 7:19
@Upriser I think I understand what you are looking for now. I was confused by your attempt using array_split. See update.
– Chris
Nov 23 '18 at 19:27
It worked. Thanks a lot for helping.
– Upriser
Nov 23 '18 at 20:43
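If the requirement is a particular bin width rather than a particular number of bins, the edges can be built from the width directly and handed to pd.cut. This is only a sketch: the width of 0.005 is a hypothetical choice, the data is random, and the last bin may extend past the maximum and so be only partly filled.

```python
import numpy as np
import pandas as pd

# Range quoted in the question for column A.
lo, hi = -0.04843731030699292, 0.05417013917000033
width = 0.005  # hypothetical bin width, chosen only for illustration

# Hypothetical stand-in for the real data.
rng = np.random.default_rng(1)
df = pd.DataFrame({"A": rng.uniform(lo, hi, size=300)})

# Enough edges, spaced exactly `width` apart, to reach past the maximum.
n_bins = int(np.floor((hi - lo) / width)) + 1
edges = lo + width * np.arange(n_bins + 1)

df["bin"] = pd.cut(df["A"], bins=edges, include_lowest=True)

# Rows per bin, in bin order; empty bins are reported with a count of 0.
counts = df["bin"].value_counts(sort=False)
print(counts)
```

If the range divided by the width is very close to a whole number, floating-point rounding can matter at the last edge, so it is worth checking that the final edge is not below the maximum.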
answered Nov 23 '18 at 6:30 by Chris (edited Nov 23 '18 at 19:34)