Parse HTML String
-
I'm currently trying to create my first little module and could use a hint from someone experienced.
In my node_helper.js
I request the HTML code from a simple website by using:
var options = {url: URL};
request(options, (error, response, body) => {
    if (response.statusCode === 200) {
        this.sendSocketNotification("DATA", this.parseData(body));
    }
});
So in the variable body there is something like
<head> ... </head>
<body><a name="top"></a>
<div id="data_1">
    <p>The relevant data 1</p>
</div>
<div id="data_2">
    <p>The relevant data 2</p>
</div>
...
My question is: What would you recommend to get the relevant data from this body into a local variable?
I first thought of using JavaScript and regex, but I guess that is not a good way to solve the issue, is it?
Best regards,
alihallo
-
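As a rough illustration of the regex idea from the question, here is a minimal sketch that works on the sample markup only. The helper name extractWithRegex and the variables sampleBody and changedBody are hypothetical and not part of any module. It assumes the page has exactly the shape shown in the question.

```javascript
// Hypothetical helper (not from the thread): collect the <p> text of every
// <div id="data_N"> block using a regular expression only.
function extractWithRegex(body) {
  // Assumes the exact shape of the sample: <div id="data_N"> <p>text</p> </div>
  const pattern = /<div id="data_\d+">\s*<p>(.*?)<\/p>\s*<\/div>/g;
  const dataArray = [];
  for (const match of body.matchAll(pattern)) {
    dataArray.push(match[1]);
  }
  return dataArray;
}

// Sample markup taken from the question.
const sampleBody =
  '<body><a name="top"></a> <div id="data_1"> <p>The relevant data 1</p> </div> ' +
  '<div id="data_2"> <p>The relevant data 2</p> </div></body>';

// The same data, but with an extra attribute on the first <div>.
const changedBody = sampleBody.replace('<div id="data_1">', '<div class="box" id="data_1">');

console.log(extractWithRegex(sampleBody));  // both entries are found
console.log(extractWithRegex(changedBody)); // the first entry is silently missed
```

The second call shows why a regex is fragile here: one extra attribute is enough for the pattern to stop matching, which is a good reason to prefer a real HTML parser for anything beyond trivial pages.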
I could be wrong but I believe you could request it using NPM request…
-
Thank you for your answer, but I couldn’t find out how to use the NPM request module to parse the HTML code.
But I found another way to solve the issue: https://github.com/cheeriojs/cheerio
This way I could get the data out of the html code like this:
var options = {url: URL};
request(options, (error, response, body) => {
    if (response.statusCode === 200) {
        this.sendSocketNotification("DATA", this.parseData(body));
...
var $ = cheerio.load(body, { normalizeWhitespace: true, xmlMode: false });
$('div[class=data_1]').find('p').each(function (index, element) {
    data_array.push($(element).text());
});
This way I could solve my problem.
Best regards,
alihallo
-
I think it’s weird that this works, because you’re looking for the attribute class = data_1, but it’s an id.
The CSS selector for an id is #, and when you put p behind it, it will look for paragraphs in the element with the id data_1.
When you replace
$('div[class=data_1]').find('p').each(function (index, element) { data_array.push($(element).text()); });
with
data_array.push($('#data_1 p').text());
does it still work? I’m not sure whether it will return the element if just one occurrence is found or will return an array anyway.
-
If your input html file is fairly simple, you may be able to avoid the use of the cheerio library entirely
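// an array to hold the data from the file
var data_array = [];
// body is a string, so parse it into a document first
// (DOMParser is a browser API, so this runs in the module itself, not in node_helper.js)
var doc = new DOMParser().parseFromString(body, 'text/html');
// Get all p tag elements inside div tag elements with an id that starts with 'data_'
var data_tags = doc.querySelectorAll('div[id^=data_] p');
// Loop through data tags and add content to data array
for (var i = 0; i < data_tags.length; i++) {
    data_array.push(data_tags[i].innerHTML);
}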
Of course the more complex your input file is the more you might benefit from the use of cheerio.
-
I want to create a module that gets data from a website by element id.
Example:
Website code:
<b class="b2 nieb" title="Kurs EUR na żywo" id="EURPLN">4.33320</b>
And I want to display the value of the element with
id="EURPLN"
In the config, I want to set which site to collect the data from and which ID to look for.
example:
defaults: {
    url: "http://domain.com/",
    findID: "EURPLN"
}
How can I do this?
Note from admin: Please use Markdown on code snippets!
-
@strawberry-3-141 You are absolutely right, I made a bad example. The real HTML code is a little bit more complicated, so I mixed the real code with this example.
@ianperrin My input HTML is more complicated, but thanks for your answer, good to know!
@Plati It helped me a lot to look at other modules. You should use a node_helper.js; there you can create a function that gets the HTML code of the website.
// function which gets the data from the given URL
getTheData: function(theURLtoCatch) {
    var options = {url: theURLtoCatch};
    request(options, (error, response, body) => {
        if (error || !response) {
            // network failure: there is no response to read a status code from
            console.log("Error getting Data " + error);
            this.sendSocketNotification("ERROR", String(error));
        } else if (response.statusCode === 200) {
            this.sendSocketNotification("DATA", this.parseHTML(body));
        } else {
            console.log("Error getting Data " + response.statusCode);
            this.sendSocketNotification("ERROR", response.statusCode);
        }
    });
},
parseHTML: function(dataBody) {
    // use something like ianperrin and strawberry showed in their examples
}
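To make the parseHTML placeholder a bit more concrete for the id="EURPLN" example, here is a minimal, dependency-free sketch. It is only one possible approach: the extra findID parameter, the escapeRegExp helper and the config object are assumptions made for illustration, and the regex only handles a flat element with a double-quoted id. With cheerio loaded, a selector such as $('#' + findID).text() would likely be the more robust choice.

```javascript
// Hypothetical config, mirroring the defaults from the question.
const config = { url: "http://domain.com/", findID: "EURPLN" };

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical parseHTML variant: return the text of the first element whose
// id attribute equals findID, or null if no such element is found.
// Assumes a double-quoted id and an element that contains only text.
function parseHTML(dataBody, findID) {
  const pattern = new RegExp(
    '<([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*\\sid="' + escapeRegExp(findID) + '"[^>]*>([^<]*)</\\1>'
  );
  const match = pattern.exec(dataBody);
  return match ? match[2].trim() : null;
}

// Markup taken from the question.
const html = '<b class="b2 nieb" title="Kurs EUR na żywo" id="EURPLN">4.33320</b>';
console.log(parseHTML(html, config.findID));
```

In a real module, config.url and config.findID would be sent from the module to the node_helper via a socket notification, and the helper would call something like getTheData with that URL before parsing the body.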