pandas read_html ValueError: No tables found
I am trying to scrape historical weather data from the Weather Underground page "https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html". I have the following code:
import pandas as pd
page_link = 'https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html'
df = pd.read_html(page_link)
print(df)
I get the following traceback:
Traceback (most recent call last):
File "weather_station_scrapping.py", line 11, in <module>
result = pd.read_html(page_link)
File "/anaconda3/lib/python3.6/site-packages/pandas/io/html.py", line 987, in read_html
displayed_only=displayed_only)
File "/anaconda3/lib/python3.6/site-packages/pandas/io/html.py", line 815, in _parse
raise_with_traceback(retained)
File "/anaconda3/lib/python3.6/site-packages/pandas/compat/__init__.py", line 403, in raise_with_traceback
raise exc.with_traceback(traceback)
ValueError: No tables found
Although this page clearly has a table, it is not being picked up by read_html. I have tried using Selenium so that the page can load before I read it:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html")
elem = driver.find_element_by_id("history_table")
head = elem.find_element_by_tag_name('thead')
body = elem.find_element_by_tag_name('tbody')
list_rows = []
for items in body.find_element_by_tag_name('tr'):
    list_cells = []
    for item in items.find_elements_by_tag_name('td'):
        list_cells.append(item.text)
    list_rows.append(list_cells)
driver.close()
Now, the problem is that it cannot find "tr". I would appreciate any suggestions.
python html pandas parsing web-scraping
The table doesn't exist in the page HTML; it loads asynchronously after the rest of the page, and pandas doesn't wait for the page to load JavaScript content. You may need some sort of automation like Selenium to load the page before trying to parse it.
– G. Anderson
Nov 20 at 18:11
Hi, I have tried using Selenium but I am still facing issues. Would you mind taking a look at my edit and offering suggestions if possible?
– Noman Bashir
Nov 20 at 20:11
Different selector: df = pd.read_html(driver.find_element_by_id("history_table").get_attribute('outerHTML'))[0]
See my answer posted below
– G. Anderson
Nov 20 at 20:28
asked Nov 20 at 17:53 by Noman Bashir, edited Nov 20 at 20:10
2 Answers
You can use requests and avoid opening a browser.

You can get current conditions by using:

https://stationdata.wunderground.com/cgi-bin/stationlookup?station=KMAHADLE7&units=both&v=2.0&format=json&callback=jQuery1720724027235122559_1542743885014&_=15

and stripping off 'jQuery1720724027235122559_1542743885014(' from the left and ')' from the right, then handling the JSON string.

You can get the summary and history by calling the API with:

https://api-ak.wunderground.com/api/606f3f6977348613/history_20170201null/units:both/v:2.0/q/pws:KMAHADLE7.json?callback=jQuery1720724027235122559_1542743885015&_=1542743886276

You then need to strip 'jQuery1720724027235122559_1542743885015(' from the front and ');' from the end. You then have a JSON string you can parse.
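The stripping described above can be sketched with a small regex helper. This is a minimal sketch: the helper name strip_jsonp is mine, not from the answer, and the callback strings below are just the ones from the example URLs; a real response's padding may differ slightly.

```python
import json
import re

def strip_jsonp(text):
    """Strip a JSONP wrapper like 'callbackName({...});' and return the inner JSON string."""
    # Match: an identifier, an opening paren, the payload,
    # a closing paren, and an optional trailing semicolon.
    match = re.match(r'\s*[\w$.]+\((.*)\)\s*;?\s*$', text, re.DOTALL)
    if not match:
        raise ValueError('response does not look like JSONP')
    return match.group(1)

wrapped = 'jQuery1720724027235122559_1542743885014({"stations": {}});'
print(json.loads(strip_jsonp(wrapped)))  # {'stations': {}}
```

This avoids hard-coding the callback name, which changes between requests since it encodes a timestamp.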
Sample of JSON: (screenshot omitted)
You can find these URLs by using the F12 dev tools in your browser and inspecting the network tab for the traffic created during page load.

Here is an example for the current conditions. Note there seems to be a problem with nulls in the JSON, so I am replacing them with "placeholder":
import requests
import pandas as pd
import json
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup

url = 'https://stationdata.wunderground.com/cgi-bin/stationlookup?station=KMAHADLE7&units=both&v=2.0&format=json&callback=jQuery1720724027235122559_1542743885014&_=15'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
# Strip the JSONP callback wrapper to leave a bare JSON string
s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
# Replace bare nulls so the string parses cleanly
s = s.replace('null', '"placeholder"')
data = json.loads(s)
# Flatten the nested JSON into a tabular structure
data = json_normalize(data)
df = pd.DataFrame(data)
print(df)
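As a side note on what json_normalize does to the parsed dict: nested objects are flattened into dot-separated column names. A toy sketch, where the keys are made up for illustration and are not the actual Weather Underground schema (newer pandas exposes this function as pd.json_normalize):

```python
import pandas as pd

# Hypothetical nested payload, for illustration only
payload = {
    "stations": {
        "KMAHADLE7": {
            "temperature": 25.5,
            "wind": {"dir": "East", "speed_kph": 0},
        }
    }
}

# Older pandas: from pandas.io.json import json_normalize
flat = pd.json_normalize(payload)
print(sorted(flat.columns))
# ['stations.KMAHADLE7.temperature',
#  'stations.KMAHADLE7.wind.dir',
#  'stations.KMAHADLE7.wind.speed_kph']
```

The result is a one-row DataFrame whose columns are the flattened leaf values.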
Here's a solution using Selenium for browser automation:
from selenium import webdriver
import pandas as pd

# 'chromedriver' is the path to your chromedriver executable
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(30)
driver.get('https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html')
df = pd.read_html(driver.find_element_by_id("history_table").get_attribute('outerHTML'))[0]
Time Temperature Dew Point Humidity Wind Speed Gust Pressure Precip. Rate. Precip. Accum. UV Solar
0 12:02 AM 25.5 °C 18.7 °C 75 % East 0 kph 0 kph 29.3 hPa 0 mm 0 mm 0 0 w/m²
1 12:07 AM 25.5 °C 19 °C 76 % East 0 kph 0 kph 29.31 hPa 0 mm 0 mm 0 0 w/m²
2 12:12 AM 25.5 °C 19 °C 76 % East 0 kph 0 kph 29.31 hPa 0 mm 0 mm 0 0 w/m²
3 12:17 AM 25.5 °C 18.7 °C 75 % East 0 kph 0 kph 29.3 hPa 0 mm 0 mm 0 0 w/m²
4 12:22 AM 25.5 °C 18.7 °C 75 % East 0 kph 0 kph 29.3 hPa 0 mm 0 mm 0 0 w/m²
Edit, with a breakdown of exactly what's happening, since the one-liner above is not very good self-documenting code:

After setting up the driver, we select the table by its ID value (thankfully this site actually uses reasonable, descriptive IDs):

tab = driver.find_element_by_id("history_table")

Then, from that element, we get the HTML instead of the web driver element object:

tab_html = tab.get_attribute('outerHTML')

We use pandas to parse the HTML:

tab_dfs = pd.read_html(tab_html)

From the docs: "read_html returns a list of DataFrame objects, even if there is only a single table contained in the HTML content."

So we index into that list with the only table we have, at index zero:

df = tab_dfs[0]
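Note that the table comes back with unit suffixes (°C, %, hPa) baked into the values, so every column is a string. If you need numbers, the suffix can be split off; a sketch on a couple of made-up rows shaped like the output above:

```python
import pandas as pd

# Two rows shaped like the scraped output above (values copied from it)
df = pd.DataFrame({
    'Time': ['12:02 AM', '12:07 AM'],
    'Temperature': ['25.5 °C', '25.5 °C'],
    'Humidity': ['75 %', '76 %'],
    'Pressure': ['29.3 hPa', '29.31 hPa'],
})

# Split on whitespace, keep the leading number, and convert to float
for col in ['Temperature', 'Humidity', 'Pressure']:
    df[col] = df[col].str.split().str[0].astype(float)

print(df.dtypes)
```

This is just one way to do it; pd.to_numeric with a stripped Series works equally well.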
Hi, thanks a lot. This works wonders, but I would highly appreciate it if you would shed a little light on why we selected an attribute and picked the value at index 0.
– Noman Bashir
Nov 20 at 20:42
Edited with breakdown
– G. Anderson
Nov 20 at 21:10
Thanks a lot. It was really helpful.
– Noman Bashir
Nov 20 at 21:32
requests answer: answered Nov 20 at 20:19 by QHarr, edited Nov 20 at 20:36
Selenium answer: answered Nov 20 at 20:31 by G. Anderson, edited Nov 20 at 21:10