How to get <p> that contains text which matches regex
I am trying to scrape this website using scrapy, xpath and regex.
I have checked and tried the answers to this question:
xpath+ regex: matches text
I want to create a 'scrapy.selector.unified.SelectorList' of <p> elements
that contain text such as "11 (sun)" or "9 (fri)", and loop through the list.
event = response.xpath('//p[matches(text(), "\d+\s\(\w{3}\)")]').extract()
does not work.
FYI, the following does work:
event = response.xpath('//p[contains(text(), "11 (sun)")]').extract()
What am I missing here?
python regex xpath scrapy
edited Nov 23 '18 at 8:15
asked Nov 21 '18 at 9:37
deekay
Try matches(text(), ".*[0-9] \([a-zA-Z]{3}\).*")
– Wiktor Stribiżew
Nov 21 '18 at 9:39
Thanks, but it is not working for me; I get the error below. ValueError: XPath error: Unregistered function in //p[matches(text(), ".*\d+\s\([a-zA-Z]{3}\).*")]
– deekay
Nov 21 '18 at 9:42
See this thread, it may help.
– Wiktor Stribiżew
Nov 21 '18 at 9:47
2 Answers
If you're only after text, Karan Verma's answer is sufficient.
If you're after the elements themselves, keep reading.
matches is only available in XPath 2.0 and higher (as are the other regex functions), and is not available in Scrapy.
Scrapy uses parsel for parsing, which in turn uses lxml, which only supports XPath 1.0.
It does, however, support regular expressions in the EXSLT namespace.
Since the regex namespace is enabled by default in Scrapy, you can do this:
event = response.xpath('//p[re:match(text(), "\d+\s\(\w{3}\)")]')
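To see the difference outside of a spider, here is a minimal sketch using lxml directly. The sample markup and the names SAMPLE, NS and PATTERN are made up for illustration, and the "re" prefix is bound by hand because plain lxml does not preconfigure it. The sketch uses re:test, the EXSLT function that returns a boolean, in place of re:match simply because a boolean reads naturally inside a predicate.

```python
from lxml import etree, html

# Hypothetical sample markup standing in for the page being scraped.
SAMPLE = """
<div>
  <p>11 (sun)</p>
  <p>No date here</p>
  <p>9 (fri)</p>
</div>
"""

# EXSLT regular-expressions namespace; the "re" prefix is bound explicitly here.
NS = {"re": "http://exslt.org/regular-expressions"}

# One or more digits, one whitespace character, then three word characters
# wrapped in literal parentheses (hence the escaped "\(" and "\)").
PATTERN = r"\d+\s\(\w{3}\)"

doc = html.fromstring(SAMPLE)

# XPath 2.0's matches() is not registered in lxml, so evaluating it fails.
try:
    doc.xpath(f'//p[matches(text(), "{PATTERN}")]')
    matches_error = None
except etree.XPathEvalError as exc:
    matches_error = str(exc)

# EXSLT's re:test() returns a boolean, so it can filter <p> elements directly.
hits = doc.xpath(f'//p[re:test(text(), "{PATTERN}")]', namespaces=NS)

print(matches_error)
print([p.text for p in hits])
```

The raw string keeps the backslashes intact; XPath string literals have no escape syntax of their own, so the pattern reaches the regex engine unchanged.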
answered Nov 21 '18 at 16:35
stranac
Thank you stranac for the answer. This seems to be the answer I was looking for, but it returns an empty list. The regex does not seem to match the text I'm targeting. If I use ".*" it returns all potential <p>. Any advice on the regex to grab 11 (sun), 12 (mon), 13 (tue) and such? Thanks in advance.
– deekay
Nov 23 '18 at 8:24
Apologies, I had the wrong URL in scrapy shell: I forgot to include $(date +%Y%m) to get the YYYYMM string in the path. It worked just fine. Thanks for the great answer.
– deekay
Dec 4 '18 at 9:05
You can use re() instead of extract().
It calls the .re() method for each element in the list and returns the results flattened, as a list of unicode strings.
Because .re() returns a list of unicode strings, you can't construct nested .re() calls.
event = response.xpath('//p/text()').re("\d+\s\(\w{3}\)")
Note: re() decodes HTML entities (except &lt; and &amp;).
For more information, please refer to the documentation: https://doc.scrapy.org/en/latest/topics/selectors.html#scrapy.selector.SelectorList.re
answered Nov 21 '18 at 10:18
Karan Verma
Thanks for the input, but like stranac mentioned, I am after the elements 'scrapy.selector.unified.SelectorList'. I've modified my question.
– deekay
Nov 23 '18 at 8:21