How to divide the dataframe into bins of specific length with unequal number of points?

I have a dataframe that I want to divide into bins of equal width (the number of data points in each bin may differ). I have tried the following approach:

df = pc13.sort_values(by=['A'], ascending=True)
df_temp = np.array_split(df, 20)

However, this divides the dataframe into bins with an equal number of data points. Instead, I want bins of a particular width, where the number of data points per bin may vary.

The minimum value in column A is -0.04843731030699292 and the maximum is 0.05417013917000033. I tried uploading the entire dataframe, but the file is very large.
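
To make the difference concrete, here is a minimal sketch on made-up data (the series `a` is a hypothetical stand-in for column A, since the real dataframe is not shown). It contrasts splitting by position with `np.array_split`, which gives equal counts, against cutting by value with `pd.cut`, which gives equal widths:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for column A; the real data is not shown in the post.
rng = np.random.default_rng(0)
a = pd.Series(np.sort(rng.normal(0.0, 0.02, size=1000)), name="A")

# np.array_split cuts by position: every chunk gets the same number of rows,
# but the range of values covered by each chunk differs.
chunks = np.array_split(a.to_numpy(), 20)
chunk_sizes = [len(c) for c in chunks]
chunk_widths = [c[-1] - c[0] for c in chunks]

# pd.cut cuts by value: the intervals have equal width,
# so the number of rows per interval is free to vary.
bins = pd.cut(a, 20)
bin_sizes = bins.value_counts(sort=False).tolist()

print(chunk_sizes)
print(bin_sizes)
```

The first list should contain the same number repeated, while the second varies from bin to bin.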

edited Nov 23 '18 at 5:23

asked Nov 23 '18 at 5:19

Upriser


Please share your dataframe too. Or at least the range of values of the column you want to bin.

– Mohit Motwani
Nov 23 '18 at 5:20

@MohitMotwani I added the min and max value for the dataframe for which I'm trying to apply binning.

– Upriser
Nov 23 '18 at 5:24

@Upriser what are the criteria for splitting into bins?

– Chris
Nov 23 '18 at 5:28

@Chris The bin width should be equal but the number of points in the bins can be different.

– Upriser
Nov 23 '18 at 5:32

@Upriser your code above is just splitting your dataframe based on the index; it has nothing to do with the columns. The width is always going to be the same as that of the original dataframe.

– Chris
Nov 23 '18 at 5:41

python-3.x pandas binning

1 Answer

You can do something like this (note that it splits the columns, not the rows, into groups):

import numpy as np
import pandas as pd

# create a random df
df = pd.DataFrame(np.random.randn(10, 10), columns=list('ABCDEFGHIJ'))

# sort values
df = df.sort_values(by=['A'], ascending=True)

# use your code but on a transposed dataframe
new = np.array_split(df.T, 5)  # split columns into 5 bins

# list comprehension to transpose the pieces back
dfs = [new[i].T for i in range(len(new))]

Update:

import numpy as np
import pandas as pd

# random df
df = pd.DataFrame(np.random.randn(1000, 5), columns=list('ABCDE'))

# sort on A
df.sort_values('A', inplace=True)

# create 20 equal-width bins over the range of A
df['bin'] = pd.cut(df['A'], 20, include_lowest=True)

# group on bin (observed=True skips bins that contain no rows)
group = df.groupby('bin', observed=True)

# list comprehension to split groups into list of dataframes
dfs = [group.get_group(x) for x in group.groups]
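
If a specific bin width is wanted rather than a number of bins, one possible variation is to pass explicit edges to `pd.cut`. The sketch below is an illustration, not part of the original answer: the data, the `width` value of 0.01 and the helper names `lo`, `n_bins` and `edges` are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data roughly spanning the range quoted in the question.
rng = np.random.default_rng(1)
df = pd.DataFrame({"A": rng.uniform(-0.048, 0.054, size=500),
                   "B": rng.normal(size=500)})

width = 0.01  # assumed bin width; choose whatever suits the data

# Left edge of the first bin: the multiple of `width` at or below the minimum.
lo = np.floor(df["A"].min() / width) * width

# Enough bins that the last (open) right edge lies above the maximum.
n_bins = int(np.floor((df["A"].max() - lo) / width)) + 1
edges = lo + width * np.arange(n_bins + 1)

# Left-closed intervals [a, b): each row is assigned to exactly one bin.
df["bin"] = pd.cut(df["A"], bins=edges, right=False)

# One DataFrame per non-empty bin; row counts may differ between bins.
dfs = [g for _, g in df.groupby("bin", observed=True)]
print([len(g) for g in dfs])
```

Values that sit exactly on a bin boundary can be affected by floating-point rounding of the edges, so it may be worth checking `df["bin"].isna().sum()` on real data.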

[            A         B         C         D         E               bin
 218 -2.716093  0.833726 -0.771400  0.691251  0.162448  (-2.723, -2.413]
 207 -2.581388 -2.318333 -0.001467  0.035277  1.219666  (-2.723, -2.413]
 380 -2.499710  1.946709 -0.519070  1.653383  0.309689  (-2.723, -2.413]
 866 -2.492050  0.246500 -0.596392  0.872888  2.371652  (-2.723, -2.413]
 876 -2.469238 -0.156470 -0.841065 -1.248793 -0.489665  (-2.723, -2.413]
 314 -2.456308  0.630691 -0.072146  1.139697  0.663674  (-2.723, -2.413]
 310 -2.455353  0.075842  0.589515 -0.427233  1.207979  (-2.723, -2.413]
 660 -2.427255  0.890125 -0.042716 -1.038401  0.651324  (-2.723, -2.413],
             A         B         C         D         E              bin
 571 -2.355430  0.383794 -1.266575 -1.214833 -0.862611  (-2.413, -2.11]
 977 -2.354416 -1.964189  0.440376  0.028032 -0.181360  (-2.413, -2.11]
 83  -2.276908  0.288462  0.370555 -0.546359 -2.033892  (-2.413, -2.11]
 196 -2.213729 -1.087783 -0.592884  1.233886  1.051164  (-2.413, -2.11]
 227 -2.146631  0.365183 -0.095293 -0.882414  0.385117  (-2.413, -2.11]
 39  -2.136800 -1.150065  0.180182 -0.424071  0.040370  (-2.413, -2.11],
             A         B         C         D         E              bin
 104 -2.108961 -0.396602 -1.014224 -1.277124  0.001030  (-2.11, -1.806]
 360 -2.098928  1.093483  1.438421 -0.980215  0.010359  (-2.11, -1.806]
 530 -2.088592  1.043201 -0.522468  0.482176 -0.680166  (-2.11, -1.806]
 158 -2.062759  2.070387  2.124621 -2.751532  0.674055  (-2.11, -1.806]
 971 -2.053039  0.347577 -0.498513  1.917305 -1.746493  (-2.11, -1.806]
 658 -2.002482 -1.222292 -0.398816  0.279228 -1.485782  (-2.11, -1.806]
 90  -1.985261  3.499251 -2.089028  1.238524 -1.781089  (-2.11, -1.806]
 466 -1.973640 -1.609920 -1.029454  0.809143 -0.228893  (-2.11, -1.806]
 40  -1.966016 -1.479240 -1.564966 -0.310133  1.338023  (-2.11, -1.806]
 279 -1.943666  0.762493  0.060038  0.449159  0.244411  (-2.11, -1.806]
 204 -1.940045  0.844901 -0.343691 -1.144836  1.385915  (-2.11, -1.806]
 780 -1.918548  0.212452  0.225789  0.216110  1.710532  (-2.11, -1.806]
 289 -1.897438  0.847664  0.689778 -0.454152 -0.747836  (-2.11, -1.806]
 159 -1.848425  0.477726  0.391384 -0.477804  0.168160  (-2.11, -1.806],
. . .
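
To check what `pd.cut` produced, the bin edges and per-bin row counts can be inspected. A rough sketch, assuming random data like the example above (the names `edges`, `counts` and `widths` are illustrative); `retbins=True` asks `pd.cut` to also return the edges it computed:

```python
import numpy as np
import pandas as pd

# Random data standing in for the real dataframe (an assumption).
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.standard_normal((1000, 5)), columns=list("ABCDE"))

# retbins=True returns the computed bin edges alongside the labels.
df["bin"], edges = pd.cut(df["A"], 20, include_lowest=True, retbins=True)

# Rows per bin, in bin order; empty bins appear with a count of 0.
counts = df["bin"].value_counts(sort=False)

# Width of each bin, computed from the exact edges rather than the rounded labels.
widths = np.diff(edges)

print(counts)
print(widths)
```

The widths should be essentially equal. pandas stretches the range slightly so that the minimum value is included, so the outermost bin may be marginally wider than the rest, and the interval labels are rounded for display.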

edited Nov 23 '18 at 19:34

answered Nov 23 '18 at 6:30

Chris


This approach is not helping me. It divides the dataframe into bins, but each bin has the same number of points. I need bins of equal width, where the number of points can differ. Instead of array_split there should be some other method for this. I have tried this approach before.

– Upriser
Nov 23 '18 at 7:03

@Upriser do something like np.split(df[df.columns[:5]], [1,3,6]) and append the remaining columns

– Chris
Nov 23 '18 at 7:19

@Upriser I think I understand what you are looking for now. I was confused by your attempt using array_split. See update

– Chris
Nov 23 '18 at 19:27

It worked. Thanks a lot for helping.

– Upriser
Nov 23 '18 at 20:43
