python pandas: mean scores per hour per workday





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







1















I have a database that has votes in it. These vote records consist of:
"Timestamp ; score"



The score is an integer.



I want to create a heatmap, so i want to have a dataframe with values for each hour in every workday with a mean score of all scores in that timeframe.



If there are no values in the hour of that workday, set the mean to 0.



Thusfar i've come to this:



gdf = pd.read_sql("select * from scores where survey_id='{}';    ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
if len(gdf) == 0:
return None
gdf['weekday'] = gdf.index.weekday
# gdf['hour'] = gdf.index.hour
gdf = gdf.groupby(by=[gdf['weekday'], pd.Grouper(freq='H')]).agg(['mean']).fillna(0)


The result of this is:



                                score weekday hour
mean mean mean
weekday time_stamp
0 2018-10-22 17:00:00 1.600000 0 17
1 2018-10-23 09:00:00 2.666667 1 9
2 2018-10-31 14:00:00 3.000000 2 14
2018-10-31 19:00:00 4.000000 2 19


This misses all the other hours of the week with the 0 value as mean.



Any suggestions to what i'm doing wrong?



Thanks !! :)










share|improve this question


















  • 1





    I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with .fillna(0).

    – CJR
    Nov 26 '18 at 13:59




















1















I have a database that has votes in it. These vote records consist of:
"Timestamp ; score"



The score is an integer.



I want to create a heatmap, so i want to have a dataframe with values for each hour in every workday with a mean score of all scores in that timeframe.



If there are no values in the hour of that workday, set the mean to 0.



Thusfar i've come to this:



gdf = pd.read_sql("select * from scores where survey_id='{}';    ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
if len(gdf) == 0:
return None
gdf['weekday'] = gdf.index.weekday
# gdf['hour'] = gdf.index.hour
gdf = gdf.groupby(by=[gdf['weekday'], pd.Grouper(freq='H')]).agg(['mean']).fillna(0)


The result of this is:



                                score weekday hour
mean mean mean
weekday time_stamp
0 2018-10-22 17:00:00 1.600000 0 17
1 2018-10-23 09:00:00 2.666667 1 9
2 2018-10-31 14:00:00 3.000000 2 14
2018-10-31 19:00:00 4.000000 2 19


This misses all the other hours of the week with the 0 value as mean.



Any suggestions to what i'm doing wrong?



Thanks !! :)










share|improve this question


















  • 1





    I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with .fillna(0).

    – CJR
    Nov 26 '18 at 13:59
















1












1








1








I have a database that has votes in it. These vote records consist of:
"Timestamp ; score"



The score is an integer.



I want to create a heatmap, so i want to have a dataframe with values for each hour in every workday with a mean score of all scores in that timeframe.



If there are no values in the hour of that workday, set the mean to 0.



Thusfar i've come to this:



gdf = pd.read_sql("select * from scores where survey_id='{}';    ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
if len(gdf) == 0:
return None
gdf['weekday'] = gdf.index.weekday
# gdf['hour'] = gdf.index.hour
gdf = gdf.groupby(by=[gdf['weekday'], pd.Grouper(freq='H')]).agg(['mean']).fillna(0)


The result of this is:



                                score weekday hour
mean mean mean
weekday time_stamp
0 2018-10-22 17:00:00 1.600000 0 17
1 2018-10-23 09:00:00 2.666667 1 9
2 2018-10-31 14:00:00 3.000000 2 14
2018-10-31 19:00:00 4.000000 2 19


This misses all the other hours of the week with the 0 value as mean.



Any suggestions to what i'm doing wrong?



Thanks !! :)










share|improve this question














I have a database that has votes in it. These vote records consist of:
"Timestamp ; score"



The score is an integer.



I want to create a heatmap, so i want to have a dataframe with values for each hour in every workday with a mean score of all scores in that timeframe.



If there are no values in the hour of that workday, set the mean to 0.



Thusfar i've come to this:



gdf = pd.read_sql("select * from scores where survey_id='{}';    ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
if len(gdf) == 0:
return None
gdf['weekday'] = gdf.index.weekday
# gdf['hour'] = gdf.index.hour
gdf = gdf.groupby(by=[gdf['weekday'], pd.Grouper(freq='H')]).agg(['mean']).fillna(0)


The result of this is:



                                score weekday hour
mean mean mean
weekday time_stamp
0 2018-10-22 17:00:00 1.600000 0 17
1 2018-10-23 09:00:00 2.666667 1 9
2 2018-10-31 14:00:00 3.000000 2 14
2018-10-31 19:00:00 4.000000 2 19


This misses all the other hours of the week with the 0 value as mean.



Any suggestions to what i'm doing wrong?



Thanks !! :)







python pandas aggregate






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 26 '18 at 13:44









socialbsocialb

233




233








  • 1





    I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with .fillna(0).

    – CJR
    Nov 26 '18 at 13:59
















  • 1





    I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with .fillna(0).

    – CJR
    Nov 26 '18 at 13:59










1




1





I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with .fillna(0).

– CJR
Nov 26 '18 at 13:59







I think I'd do mostly what you've done (breaking the time_stamp out to days/hours, then grouping and getting the mean). Then I'd generate a dataframe with every combination of days/hours and left join my mean data onto it, so that all the missed rows are NA and I can fill them with .fillna(0).

– CJR
Nov 26 '18 at 13:59














1 Answer
1






active

oldest

votes


















0














i got it:



This worked, don't know if it could have been shorter but this did the job:




  • make new dataframe with 0 values for each hour of each workday.

  • appended the values from the database


<



    todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
columns = ['user', 'survey_id', 'score']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
gdf = gdf.set_index(['time_stamp'])
gdf.index = pd.to_datetime(gdf.index, unit='s')
df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional

if len(gdf) == 0:
return None
df_['weekday'] = df_.index.weekday
df_['hour'] = df_.index.hour
df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)





share|improve this answer
























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53482450%2fpython-pandas-mean-scores-per-hour-per-workday%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    i got it:



    This worked, don't know if it could have been shorter but this did the job:




    • make new dataframe with 0 values for each hour of each workday.

    • appended the values from the database


    <



        todays_date = datetime.datetime.now().date()
    index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
    columns = ['user', 'survey_id', 'score']
    df_ = pd.DataFrame(index=index, columns=columns)
    df_ = df_.fillna(0) # with 0s rather than NaNs
    gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
    gdf = gdf.set_index(['time_stamp'])
    gdf.index = pd.to_datetime(gdf.index, unit='s')
    df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional

    if len(gdf) == 0:
    return None
    df_['weekday'] = df_.index.weekday
    df_['hour'] = df_.index.hour
    df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)





    share|improve this answer




























      0














      i got it:



      This worked, don't know if it could have been shorter but this did the job:




      • make new dataframe with 0 values for each hour of each workday.

      • appended the values from the database


      <



          todays_date = datetime.datetime.now().date()
      index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
      columns = ['user', 'survey_id', 'score']
      df_ = pd.DataFrame(index=index, columns=columns)
      df_ = df_.fillna(0) # with 0s rather than NaNs
      gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
      gdf = gdf.set_index(['time_stamp'])
      gdf.index = pd.to_datetime(gdf.index, unit='s')
      df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional

      if len(gdf) == 0:
      return None
      df_['weekday'] = df_.index.weekday
      df_['hour'] = df_.index.hour
      df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)





      share|improve this answer


























        0












        0








        0







        i got it:



        This worked, don't know if it could have been shorter but this did the job:




        • make new dataframe with 0 values for each hour of each workday.

        • appended the values from the database


        <



            todays_date = datetime.datetime.now().date()
        index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
        columns = ['user', 'survey_id', 'score']
        df_ = pd.DataFrame(index=index, columns=columns)
        df_ = df_.fillna(0) # with 0s rather than NaNs
        gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
        gdf = gdf.set_index(['time_stamp'])
        gdf.index = pd.to_datetime(gdf.index, unit='s')
        df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional

        if len(gdf) == 0:
        return None
        df_['weekday'] = df_.index.weekday
        df_['hour'] = df_.index.hour
        df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)





        share|improve this answer













        i got it:



        This worked, don't know if it could have been shorter but this did the job:




        • make new dataframe with 0 values for each hour of each workday.

        • appended the values from the database


        <



            todays_date = datetime.datetime.now().date()
        index = pd.date_range(todays_date - datetime.timedelta(7), periods=7*24, freq='H')
        columns = ['user', 'survey_id', 'score']
        df_ = pd.DataFrame(index=index, columns=columns)
        df_ = df_.fillna(0) # with 0s rather than NaNs
        gdf = pd.read_sql("select * from scores where survey_id='{}'; ".format(survey_id), self.db_conn)
        gdf = gdf.set_index(['time_stamp'])
        gdf.index = pd.to_datetime(gdf.index, unit='s')
        df_ = df_.append(gdf, ignore_index=False) # ignoring index is optional

        if len(gdf) == 0:
        return None
        df_['weekday'] = df_.index.weekday
        df_['hour'] = df_.index.hour
        df_ = df_.groupby(by=[df_['weekday'], df_['hour']]).agg(['mean']).fillna(0)






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 27 '18 at 12:30









        socialbsocialb

        233




        233
































            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53482450%2fpython-pandas-mean-scores-per-hour-per-workday%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Tonle Sap (See)

            I get strange results when I access the Sqlitedatabase with Unity C# via XAMPP

            Guatemaltekische Davis-Cup-Mannschaft