python pandas: mean scores per hour per workday
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I have a database that has votes in it. These vote records consist of:
"Timestamp ; score"
The score is an integer.
I want to create a heatmap, so i want to have a dataframe with values for each hour in every workday with a mean score of all scores in that timeframe.
If there are no values in the hour of that workday, set the mean to 0.
Thusfar i've come to this:
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
if len(gdf) == 0:
return None
gdf['weekday'] = gdf.index.weekday
# gdf['hour'] = gdf.index.hour
gdf = gdf.groupby(by=[gdf['weekday'], pd.Grouper(freq='H')]).agg(['mean']).fillna(0)
The result of this is:
score weekday hour
mean mean mean
weekday time_stamp
0 2018-10-22 17:00:00 1.600000 0 17
1 2018-10-23 09:00:00 2.666667 1 9
2 2018-10-31 14:00:00 3.000000 2 14
2018-10-31 19:00:00 4.000000 2 19
This misses all the other hours of the week with the 0 value as mean.
Any suggestions to what i'm doing wrong?
Thanks !! :)
python pandas aggregate
add a comment |
I have a database that has votes in it. These vote records consist of:
"Timestamp ; score"
The score is an integer.
I want to create a heatmap, so i want to have a dataframe with values for each hour in every workday with a mean score of all scores in that timeframe.
If there are no values in the hour of that workday, set the mean to 0.
Thusfar i've come to this:
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
if len(gdf) == 0:
return None
gdf['weekday'] = gdf.index.weekday
# gdf['hour'] = gdf.index.hour
gdf = gdf.groupby(by=[gdf['weekday'], pd.Grouper(freq='H')]).agg(['mean']).fillna(0)
The result of this is:
score weekday hour
mean mean mean
weekday time_stamp
0 2018-10-22 17:00:00 1.600000 0 17
1 2018-10-23 09:00:00 2.666667 1 9
2 2018-10-31 14:00:00 3.000000 2 14
2018-10-31 19:00:00 4.000000 2 19
This misses all the other hours of the week with the 0 value as mean.
Any suggestions to what i'm doing wrong?
Thanks !! :)
python pandas aggregate
1
I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with.fillna(0).
– CJR
Nov 26 '18 at 13:59
add a comment |
I have a database that has votes in it. These vote records consist of:
"Timestamp ; score"
The score is an integer.
I want to create a heatmap, so i want to have a dataframe with values for each hour in every workday with a mean score of all scores in that timeframe.
If there are no values in the hour of that workday, set the mean to 0.
Thusfar i've come to this:
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
if len(gdf) == 0:
return None
gdf['weekday'] = gdf.index.weekday
# gdf['hour'] = gdf.index.hour
gdf = gdf.groupby(by=[gdf['weekday'], pd.Grouper(freq='H')]).agg(['mean']).fillna(0)
The result of this is:
score weekday hour
mean mean mean
weekday time_stamp
0 2018-10-22 17:00:00 1.600000 0 17
1 2018-10-23 09:00:00 2.666667 1 9
2 2018-10-31 14:00:00 3.000000 2 14
2018-10-31 19:00:00 4.000000 2 19
This misses all the other hours of the week with the 0 value as mean.
Any suggestions to what i'm doing wrong?
Thanks !! :)
python pandas aggregate
I have a database that has votes in it. These vote records consist of:
"Timestamp ; score"
The score is an integer.
I want to create a heatmap, so i want to have a dataframe with values for each hour in every workday with a mean score of all scores in that timeframe.
If there are no values in the hour of that workday, set the mean to 0.
Thusfar i've come to this:
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
if len(gdf) == 0:
return None
gdf['weekday'] = gdf.index.weekday
# gdf['hour'] = gdf.index.hour
gdf = gdf.groupby(by=[gdf['weekday'], pd.Grouper(freq='H')]).agg(['mean']).fillna(0)
The result of this is:
score weekday hour
mean mean mean
weekday time_stamp
0 2018-10-22 17:00:00 1.600000 0 17
1 2018-10-23 09:00:00 2.666667 1 9
2 2018-10-31 14:00:00 3.000000 2 14
2018-10-31 19:00:00 4.000000 2 19
This misses all the other hours of the week with the 0 value as mean.
Any suggestions to what i'm doing wrong?
Thanks !! :)
python pandas aggregate
python pandas aggregate
asked Nov 26 '18 at 13:44
socialbsocialb
233
233
1
I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with.fillna(0).
– CJR
Nov 26 '18 at 13:59
add a comment |
1
I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with.fillna(0).
– CJR
Nov 26 '18 at 13:59
1
1
I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with
.fillna(0).– CJR
Nov 26 '18 at 13:59
I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with
.fillna(0).– CJR
Nov 26 '18 at 13:59
add a comment |
1 Answer
1
active
oldest
votes
i got it:
This worked, don't know if it could have been shorter but this did the job:
- make new dataframe with 0 values for each hour of each workday.
- appended the values from the database
<
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
columns = ['user', 'survey_id', 'score']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional
if len(gdf) == 0:
return None
df_['weekday'] = df_.index.weekday
df_['hour'] = df_.index.hour
df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53482450%2fpython-pandas-mean-scores-per-hour-per-workday%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
i got it:
This worked, don't know if it could have been shorter but this did the job:
- make new dataframe with 0 values for each hour of each workday.
- appended the values from the database
<
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
columns = ['user', 'survey_id', 'score']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional
if len(gdf) == 0:
return None
df_['weekday'] = df_.index.weekday
df_['hour'] = df_.index.hour
df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)
add a comment |
i got it:
This worked, don't know if it could have been shorter but this did the job:
- make new dataframe with 0 values for each hour of each workday.
- appended the values from the database
<
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
columns = ['user', 'survey_id', 'score']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional
if len(gdf) == 0:
return None
df_['weekday'] = df_.index.weekday
df_['hour'] = df_.index.hour
df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)
add a comment |
i got it:
This worked, don't know if it could have been shorter but this did the job:
- make new dataframe with 0 values for each hour of each workday.
- appended the values from the database
<
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
columns = ['user', 'survey_id', 'score']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional
if len(gdf) == 0:
return None
df_['weekday'] = df_.index.weekday
df_['hour'] = df_.index.hour
df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)
i got it:
This worked, don't know if it could have been shorter but this did the job:
- make new dataframe with 0 values for each hour of each workday.
- appended the values from the database
<
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
columns = ['user', 'survey_id', 'score']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional
if len(gdf) == 0:
return None
df_['weekday'] = df_.index.weekday
df_['hour'] = df_.index.hour
df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)
answered Nov 27 '18 at 12:30
socialbsocialb
233
233
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53482450%2fpython-pandas-mean-scores-per-hour-per-workday%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with
.fillna(0).– CJR
Nov 26 '18 at 13:59