pandas read_html ValueError: No tables found

I am trying to scrape historical weather data from the Weather Underground page "https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html". I have the following code:



import pandas as pd 

page_link = 'https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html'
df = pd.read_html(page_link)
print(df)


I get the following traceback:



Traceback (most recent call last):
  File "weather_station_scrapping.py", line 11, in <module>
    result = pd.read_html(page_link)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/html.py", line 987, in read_html
    displayed_only=displayed_only)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/html.py", line 815, in _parse
    raise_with_traceback(retained)
  File "/anaconda3/lib/python3.6/site-packages/pandas/compat/__init__.py", line 403, in raise_with_traceback
    raise exc.with_traceback(traceback)
ValueError: No tables found


This page clearly has a table, but it is not being picked up by read_html. I have tried using Selenium so that the page can load before I read it:



from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html")
elem = driver.find_element_by_id("history_table")

head = elem.find_element_by_tag_name('thead')
body = elem.find_element_by_tag_name('tbody')

list_rows = []

for items in body.find_element_by_tag_name('tr'):
    list_cells = []
    for item in items.find_elements_by_tag_name('td'):
        list_cells.append(item.text)
    list_rows.append(list_cells)
driver.close()


Now the problem is that it cannot find "tr". I would appreciate any suggestions.

python html pandas parsing web-scraping

asked Nov 20 at 17:53 by Noman Bashir, edited Nov 20 at 20:10

  • The table doesn't exist in the page HTML; it loads asynchronously after the rest of the page. Pandas doesn't wait for the page to load JavaScript content. You may need some sort of automation like Selenium to load the page before trying to parse it.
    – G. Anderson, Nov 20 at 18:11

  • Hi, I have tried using Selenium but I am still facing issues. Would you mind taking a look at my edit and offering suggestions if possible?
    – Noman Bashir, Nov 20 at 20:11

  • Different selector: df = pd.read_html(driver.find_element_by_id("history_table").get_attribute('outerHTML'))[0]. See my answer posted below.
    – G. Anderson, Nov 20 at 20:28

2 Answers

You can use requests and avoid opening a browser.



You can get current conditions by using:



https://stationdata.wunderground.com/cgi-bin/stationlookup?station=KMAHADLE7&units=both&v=2.0&format=json&callback=jQuery1720724027235122559_1542743885014&_=15



and strip 'jQuery1720724027235122559_1542743885014(' from the left and ')' from the right, then handle the JSON string.



You can get the summary and history by calling the API with the following:



https://api-ak.wunderground.com/api/606f3f6977348613/history_20170201null/units:both/v:2.0/q/pws:KMAHADLE7.json?callback=jQuery1720724027235122559_1542743885015&_=1542743886276



You then need to strip 'jQuery1720724027235122559_1542743885015(' from the front and ');' from the end. You then have a JSON string you can parse.
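For instance, here's a minimal sketch of fetching and unwrapping that history response. The API key, callback name and timestamp parameters are just the values captured from one page load, so treat them as placeholders, and the final key inspection is only illustrative:

import json
import requests

# Placeholder values captured from one page load; the API key, callback name
# and timestamps may differ for you.
url = ('https://api-ak.wunderground.com/api/606f3f6977348613/history_20170201null/'
       'units:both/v:2.0/q/pws:KMAHADLE7.json'
       '?callback=jQuery1720724027235122559_1542743885015&_=1542743886276')
res = requests.get(url)
text = res.text.strip()
# Cut off the JSONP wrapper: 'jQuery...(' at the front and ');' at the end
payload = text[text.index('(') + 1 : text.rindex(')')]
data = json.loads(payload)
print(list(data.keys()))  # inspect the top-level structure of the response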



Sample of JSON: [screenshot of the response omitted]

You can find these URLs by using the F12 dev tools in the browser and inspecting the network tab for the traffic created during page load.



An example for the current conditions, noting there seems to be a problem with nulls in the JSON, so I am replacing them with "placeholder":



import requests
import pandas as pd
import json
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup

url = 'https://stationdata.wunderground.com/cgi-bin/stationlookup?station=KMAHADLE7&units=both&v=2.0&format=json&callback=jQuery1720724027235122559_1542743885014&_=15'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
# Strip the JSONP callback wrapper to leave bare JSON
s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
# Work around nulls in the response
s = s.replace('null', '"placeholder"')
data = json.loads(s)
data = json_normalize(data)
df = pd.DataFrame(data)
print(df)

answered Nov 20 at 20:19 by QHarr, edited Nov 20 at 20:36

Here's a solution using selenium for browser automation:

from selenium import webdriver
import pandas as pd

# chromedriver is assumed to be the path to your ChromeDriver executable
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(30)

driver.get('https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html')
df = pd.read_html(driver.find_element_by_id("history_table").get_attribute('outerHTML'))[0]

  Time Temperature Dew Point Humidity Wind Speed Gust Pressure Precip. Rate. Precip. Accum. UV Solar
0 12:02 AM 25.5 °C 18.7 °C 75 % East 0 kph 0 kph 29.3 hPa 0 mm 0 mm 0 0 w/m²
1 12:07 AM 25.5 °C 19 °C 76 % East 0 kph 0 kph 29.31 hPa 0 mm 0 mm 0 0 w/m²
2 12:12 AM 25.5 °C 19 °C 76 % East 0 kph 0 kph 29.31 hPa 0 mm 0 mm 0 0 w/m²
3 12:17 AM 25.5 °C 18.7 °C 75 % East 0 kph 0 kph 29.3 hPa 0 mm 0 mm 0 0 w/m²
4 12:22 AM 25.5 °C 18.7 °C 75 % East 0 kph 0 kph 29.3 hPa 0 mm 0 mm 0 0 w/m²

Editing with a breakdown of exactly what's happening, since the above one-liner is actually not very good self-documenting code:

After setting up the driver, we select the table with its ID value (thankfully, this site actually uses reasonable and descriptive IDs):

tab = driver.find_element_by_id("history_table")

Then, from that element, we get the HTML instead of the web driver element object:

tab_html = tab.get_attribute('outerHTML')

We use pandas to parse the HTML:

tab_dfs = pd.read_html(tab_html)

From the docs:

    "read_html returns a list of DataFrame objects, even if there is only a single table contained in the HTML content"

So we index into that list with the only table we have, at index zero:

df = tab_dfs[0]
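
As an aside, if the implicit wait ever proves flaky, an explicit wait on the table element is a common alternative. Here's a minimal sketch under that assumption (same page, same table ID, with chromedriver assumed to be on your PATH):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
driver.get('https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html')
# Block for up to 30 seconds until the asynchronously loaded table is present
tab = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.ID, 'history_table')))
df = pd.read_html(tab.get_attribute('outerHTML'))[0]
driver.quit()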

answered Nov 20 at 20:31 by G. Anderson, edited Nov 20 at 21:10

  • Hi, thanks a lot. This works wonders, but I would highly appreciate it if you would shed a little light on why we selected an attribute and picked the value at index 0.
    – Noman Bashir, Nov 20 at 20:42

  • Edited with breakdown.
    – G. Anderson, Nov 20 at 21:10

  • Thanks a lot. It was really helpful.
    – Noman Bashir, Nov 20 at 21:32