Creating a new row whenever a comma appears in the column











I'm trying to write a small program that finds the closest open restaurant to my location. I have a dataset that includes restaurant names, locations, stars, and hours. However, there is a problem: sometimes a restaurant has multiple open/close times in a day.

For example:

Name, location, type, and hours

    Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM

I'm trying to get the data into a CSV, but for restaurants with multiple sets of hours (like in the example), the file can't be parsed properly, because the commas between the hours are read as column separators.

The easiest solution (I think) would be to create another line with the same information but the next set of hours. The example would then read:

    Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM
    Blue Duck Tavern, 1201 24th St NW, American Restaurant, 11:30AM-2PM
    Blue Duck Tavern, 1201 24th St NW, American Restaurant, 5:30-10:30PM

That way the program wouldn't show the restaurant if it wasn't open.

So I have three general questions.

1) Is there a better way to go about this than the solution I mentioned above (creating a new row for each set of open/close hours)?

2) I'm having trouble with the following implementation:



    import pandas as pd
    import numpy as np

    data = pd.read_csv('data.csv')
    for row in data:
        if data['hours'].str.contains(',') == 'True':
            count = data['hours'].str.count(',')
            data.append..
            <create new row with Name[row], location[row], type[row], and hours[row] for the # of count>


I've tried googling around, but I keep getting this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So I tried to switch it up to:

    if data['Monday'].any('Monday').str.contains(',') == 'True':

which results in: ValueError: No axis named Monday for object type

I'm a bit unclear on the next steps here, or on what I'm doing wrong, because if I just do:

    print data[data['Monday'].astype(str).str.contains(',')]

it works and returns the result. But I can't write any kind of conditional without it throwing an error.

3) I'm also a bit confused about what to do if there is more than one comma in the row. I have a vague idea, but if you have any hints, I'd love to hear them :)

Thanks for reading!







python pandas numpy














edited Nov 19 at 19:14 by sacul

asked Nov 19 at 19:13 by Sonicarrow












  • data already exists in a dataframe? or in a json object?
    – Harikrishna
    Nov 19 at 19:21












  • Yep! It exists already in the dataframe called data (from the csv)
    – Sonicarrow
    Nov 19 at 19:34
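
On the ValueError quoted in question 2: a minimal sketch of why the `if` fails while the boolean-mask indexing works. The DataFrame below is made up for illustration ("Some Diner" and its hours are hypothetical), and only the `hours` column name is taken from the question.

```python
import pandas as pd

# Made-up sample data; "Some Diner" is a hypothetical second restaurant.
data = pd.DataFrame({
    "Name": ["Blue Duck Tavern", "Some Diner"],
    "hours": ["6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM", "9AM-5PM"],
})

# str.contains returns one boolean per row, not a single True/False.
has_comma = data["hours"].str.contains(",")

# Using the whole Series in an `if` raises the "truth value of a Series is
# ambiguous" error, because several booleans cannot be reduced to one implicitly.
error_message = ""
try:
    if has_comma:
        pass
except ValueError as err:
    error_message = str(err)

# Reduce explicitly when a single answer is wanted...
any_multi = has_comma.any()

# ...or use the Series as a row mask, which is what the working print line does.
multi_rows = data[has_comma]
```

Comparing the Series to the string 'True', as in the question, still yields a Series, so the same error appears there.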






























2 Answers
If I understand correctly, you can load the data using a regular expression as the separator, one that only matches a comma when it is not preceded by AM or PM (a negative lookbehind). You can then use str.split and stack, after moving all the columns that you don't want to modify into the index. For example:

    # A regex separator needs the python parsing engine
    data = pd.read_csv('data.csv', sep=r'(?<!AM|PM),', engine='python')
    # Get rid of spaces in your column names
    data.columns = data.columns.str.strip(' ')

    >>> data
                   Name          location                  type                                     hours
    0  Blue Duck Tavern   1201 24th St NW   American Restaurant   6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM

    new_data = (data.set_index(['Name', 'location', 'type'])
                .hours.str.split(',', expand=True)
                .stack()
                .reset_index(level=['Name', 'location', 'type']))

    >>> new_data
                   Name          location                  type              0
    0  Blue Duck Tavern   1201 24th St NW   American Restaurant   6:30-10:30AM
    1  Blue Duck Tavern   1201 24th St NW   American Restaurant    11:30AM-2PM
    2  Blue Duck Tavern   1201 24th St NW   American Restaurant   5:30-10:30PM














answered Nov 19 at 19:19 by sacul, edited Nov 19 at 19:29
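
The split-into-rows idea from the answer above can also be sketched with `DataFrame.explode`, which is not what the answer itself uses and assumes pandas 0.25 or later. The CSV text is made up to mirror the question's example, and the header names (`Name`, `location`, `type`, `hours`) are an assumption.

```python
import io
import pandas as pd

# Made-up CSV text mirroring the question's example; header names are assumed.
csv_text = (
    "Name, location, type, hours\n"
    "Blue Duck Tavern, 1201 24th St NW, American Restaurant, "
    "6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM\n"
)

# Split only on commas that do not follow AM or PM (regex needs the python engine).
data = pd.read_csv(io.StringIO(csv_text), sep=r"(?<!AM|PM),", engine="python")
data.columns = data.columns.str.strip()

# Turn each hours string into a list, then give every list item its own row.
data["hours"] = data["hours"].str.split(",")
new_data = data.explode("hours").reset_index(drop=True)

# Strip the spaces left over around the original commas.
for col in ["Name", "location", "type", "hours"]:
    new_data[col] = new_data[col].str.strip()
```

Compared with set_index/stack, this keeps the `hours` column name and handles any number of commas in the field, which also covers question 3.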
























Try joining the multiple hours with '_' (or any other delimiter that is not a comma), as shown below, so that the hours are read as a single field:

    6:30-10:30AM_11:30AM-2PM_5:30-10:30PM

    Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM

















answered Nov 19 at 19:26 by Sudheer Kumar R
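
A minimal sketch of how the underscore-delimited layout from this answer might be read back, assuming the file has already been rewritten with '_' between the hour ranges. The CSV text, the header names, and the `hour_ranges` and `n_ranges` column names are all made up for illustration.

```python
import io
import pandas as pd

# Made-up CSV text using "_" between the hour ranges, as the answer suggests.
csv_text = (
    "Name,location,type,hours\n"
    "Blue Duck Tavern,1201 24th St NW,American Restaurant,"
    "6:30-10:30AM_11:30AM-2PM_5:30-10:30PM\n"
)

# With no commas inside the hours field, the default parser reads four columns.
data = pd.read_csv(io.StringIO(csv_text))

# Keep one row per restaurant and store the ranges as a list of strings.
data["hour_ranges"] = data["hours"].str.split("_")

# Count the ranges per restaurant without creating extra rows.
data["n_ranges"] = data["hour_ranges"].map(len)
```

This keeps one row per restaurant, at the cost of having to loop over the list when checking whether a place is open.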





























