How to divide the dataframe into bins of specific length with unequal number of points?












I have a dataframe that I want to divide into bins of equal width; the number of data points in each bin does not have to be the same. I have tried the following approach:

df = pc13.sort_values(by=['A'], ascending=True)
df_temp = np.array_split(df, 20)

However, this divides the dataframe into bins with an equal number of data points. Instead, I want to divide the dataframe into bins of a particular width, where the number of data points in each bin may differ.

The minimum value in column A of the dataframe is -0.04843731030699292 and the maximum value is 0.05417013917000033. I tried uploading the entire dataframe, but the file is very large.










  • Please share your dataframe too, or at least the range of values of the column you want to bin.

    – Mohit Motwani
    Nov 23 '18 at 5:20

  • @MohitMotwani I added the min and max values of the dataframe column that I'm trying to bin.

    – Upriser
    Nov 23 '18 at 5:24

  • @Upriser what are the criteria for splitting into bins?

    – Chris
    Nov 23 '18 at 5:28

  • @Chris The bin width should be equal, but the number of points in the bins can be different.

    – Upriser
    Nov 23 '18 at 5:32

  • @Upriser your code above just splits your dataframe based on the index; it has nothing to do with the columns. The width (number of columns) of each piece is always going to be the same as that of the original dataframe.

    – Chris
    Nov 23 '18 at 5:41
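
To illustrate the last comment, here is a minimal sketch on made-up data (the random column and the names `positions` and `chunks` are assumptions for illustration, not part of the original post). Splitting sorted rows by position gives every chunk the same number of rows, while the range of A covered by each chunk varies:

```python
import numpy as np
import pandas as pd

# Made-up data: 1000 values of "A" clustered around zero, sorted as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.normal(0.0, 0.02, size=1000)}).sort_values("A")

# Split the row positions into 20 equal chunks; this mirrors what
# np.array_split does to the rows of the sorted dataframe.
positions = np.array_split(np.arange(len(df)), 20)
chunks = [df.iloc[idx] for idx in positions]

counts = [len(c) for c in chunks]                       # rows per chunk
widths = [c["A"].max() - c["A"].min() for c in chunks]  # range of A per chunk

print(counts)               # every chunk holds the same number of rows
print(np.round(widths, 4))  # the chunks near the tails span a wider range of A
```

So a positional split yields equal counts and unequal widths, which is the opposite of what the question asks for.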
















python-3.x pandas binning






asked Nov 23 '18 at 5:19 by Upriser, edited Nov 23 '18 at 5:23













1 Answer
You can do something like this, which splits the columns into groups:

import numpy as np
import pandas as pd

# create a random df
df = pd.DataFrame(np.random.randn(10, 10), columns=list('ABCDEFGHIJ'))

# sort values
df = df.sort_values(by=['A'], ascending=True)

# use your code, but on a transposed dataframe
new = np.array_split(df.T, 5)  # split columns into 5 bins

# transpose each piece back to its original orientation
dfs = [part.T for part in new]


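Update: to get bins of equal width on column A, with a varying number of rows per bin, use pd.cut. Passing an integer creates that many equal-width bins over the range of A; pandas extends the range slightly (by 0.1%) so that the minimum value is included.

import numpy as np
import pandas as pd

# random df
df = pd.DataFrame(np.random.randn(1000, 5), columns=list('ABCDE'))

# sort on A
df.sort_values('A', inplace=True)

# create 20 equal-width bins over the range of A
df['bin'] = pd.cut(df['A'], 20, include_lowest=True)

# group on bin
group = df.groupby('bin')

# list comprehension to split groups into a list of dataframes
dfs = [group.get_group(x) for x in group.groups]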


Output (truncated):

[           A         B         C         D         E  bin
218 -2.716093  0.833726 -0.771400  0.691251  0.162448  (-2.723, -2.413]
207 -2.581388 -2.318333 -0.001467  0.035277  1.219666  (-2.723, -2.413]
380 -2.499710  1.946709 -0.519070  1.653383  0.309689  (-2.723, -2.413]
866 -2.492050  0.246500 -0.596392  0.872888  2.371652  (-2.723, -2.413]
876 -2.469238 -0.156470 -0.841065 -1.248793 -0.489665  (-2.723, -2.413]
314 -2.456308  0.630691 -0.072146  1.139697  0.663674  (-2.723, -2.413]
310 -2.455353  0.075842  0.589515 -0.427233  1.207979  (-2.723, -2.413]
660 -2.427255  0.890125 -0.042716 -1.038401  0.651324  (-2.723, -2.413],
            A         B         C         D         E  bin
571 -2.355430  0.383794 -1.266575 -1.214833 -0.862611  (-2.413, -2.11]
977 -2.354416 -1.964189  0.440376  0.028032 -0.181360  (-2.413, -2.11]
 83 -2.276908  0.288462  0.370555 -0.546359 -2.033892  (-2.413, -2.11]
196 -2.213729 -1.087783 -0.592884  1.233886  1.051164  (-2.413, -2.11]
227 -2.146631  0.365183 -0.095293 -0.882414  0.385117  (-2.413, -2.11]
 39 -2.136800 -1.150065  0.180182 -0.424071  0.040370  (-2.413, -2.11],
            A         B         C         D         E  bin
104 -2.108961 -0.396602 -1.014224 -1.277124  0.001030  (-2.11, -1.806]
360 -2.098928  1.093483  1.438421 -0.980215  0.010359  (-2.11, -1.806]
530 -2.088592  1.043201 -0.522468  0.482176 -0.680166  (-2.11, -1.806]
158 -2.062759  2.070387  2.124621 -2.751532  0.674055  (-2.11, -1.806]
971 -2.053039  0.347577 -0.498513  1.917305 -1.746493  (-2.11, -1.806]
658 -2.002482 -1.222292 -0.398816  0.279228 -1.485782  (-2.11, -1.806]
 90 -1.985261  3.499251 -2.089028  1.238524 -1.781089  (-2.11, -1.806]
466 -1.973640 -1.609920 -1.029454  0.809143 -0.228893  (-2.11, -1.806]
 40 -1.966016 -1.479240 -1.564966 -0.310133  1.338023  (-2.11, -1.806]
279 -1.943666  0.762493  0.060038  0.449159  0.244411  (-2.11, -1.806]
204 -1.940045  0.844901 -0.343691 -1.144836  1.385915  (-2.11, -1.806]
780 -1.918548  0.212452  0.225789  0.216110  1.710532  (-2.11, -1.806]
289 -1.897438  0.847664  0.689778 -0.454152 -0.747836  (-2.11, -1.806]
159 -1.848425  0.477726  0.391384 -0.477804  0.168160  (-2.11, -1.806],
. . .
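
If a specific bin width is needed rather than a fixed number of bins, pd.cut also accepts explicit bin edges. The following is only a sketch under stated assumptions: the data is made up to fall roughly in the range given in the question, and the width of 0.005 and the names `width`, `start`, `stop` and `edges` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data, roughly in the range given in the question.
rng = np.random.default_rng(1)
df = pd.DataFrame({"A": rng.normal(0.0, 0.015, size=500)})

width = 0.005  # assumed bin width; choose whatever the analysis requires

# Evenly spaced edges that cover the data, snapped to multiples of `width`.
start = np.floor(df["A"].min() / width) * width
stop = np.ceil(df["A"].max() / width) * width
edges = np.arange(start, stop + width / 2, width)

# Assign each row to the interval that contains its value of A.
df["bin"] = pd.cut(df["A"], bins=edges, include_lowest=True)

# One dataframe per non-empty bin; observed=True skips bins without rows.
dfs = [g for _, g in df.groupby("bin", observed=True)]

print(np.round(np.diff(edges), 6))  # every bin has the same width
print([len(g) for g in dfs])        # row counts differ from bin to bin
```

Snapping the edges to multiples of the width keeps the bin boundaries at round values; starting directly at the column minimum would work as well.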





  • This approach does not help me. It divides the dataframe into bins, but each bin has the same number of points. I need bins of equal width, where the number of points can differ. There should be some method other than array_split for this; I have tried this approach before.

    – Upriser
    Nov 23 '18 at 7:03

  • @Upriser do something like np.split(df[df.columns[:5]], [1,3,6]) and append the remaining columns

    – Chris
    Nov 23 '18 at 7:19

  • @Upriser I think I understand what you are looking for now. I was confused by your attempt using array_split. See the update.

    – Chris
    Nov 23 '18 at 19:27

  • It worked. Thanks a lot for helping.

    – Upriser
    Nov 23 '18 at 20:43













answered Nov 23 '18 at 6:30 by Chris, edited Nov 23 '18 at 19:34












