Parse HTML String
-
I am currently trying to create my first little module and could use a hint from an experienced person.
In my node_helper.js I request the HTML code from a simple website by using:

    var options = {url: URL};
    request(options, (error, response, body) => {
        if (response.statusCode === 200) {
            this.sendSocketNotification("DATA", this.parseData(body));
        }
    });

So in the variable body there is something like

    <head> ... </head>
    <body><a name="top"></a>
    <div id="data_1"> <p>The relevant data 1</p> </div>
    <div id="data_2"> <p>The relevant data 2</p> </div>
    ...

My question is: What would you recommend to get the relevant data from this body into a local variable?
I first thought of using JavaScript and regex, but I guess that is not a good way to solve the issue, is it?
Best regards,
alihallo
-
I could be wrong but I believe you could request it using NPM request…
-
Thank you for your answer, but I couldn’t find out how to use the NPM request to parse the HTML code.
But I found another solution to solve the issue: https://github.com/cheeriojs/cheerio
This way I could get the data out of the html code like this:

    var options = {url: URL};
    request(options, (error, response, body) => {
        if (response.statusCode === 200) {
            this.sendSocketNotification("DATA", this.parseData(body));
    ...
    var $ = cheerio.load(body, { normalizeWhitespace: true, xmlMode: false });
    $('div[class=data_1]').find('p').each(function (index, element) {
        data_array.push($(element).text());
    });

This way I could solve my problem.
Best regards,
alihallo
-
I think it’s weird that this works, because you’re looking for the attribute class = data_1, but it’s an id.
The CSS selector for an id is #, and when you put p behind it, it will look for paragraphs in the element with the id data_1.
When you replace

    $('div[class=data_1]').find('p').each(function (index, element) {
        data_array.push($(element).text());
    });

with

    data_array.push($('#data_1 p').text());

does it still work? Not sure if it will return the element if just one occurrence is found or will return an array anyway.
-
If your input HTML file is fairly simple, you may be able to avoid the use of the cheerio library entirely:

    // an array to hold the data from the file
    var data_array = [];
    // body is a string, so parse it into a document first
    // (DOMParser is a browser API: available in the module's front-end code, not in node_helper.js)
    var doc = new DOMParser().parseFromString(body, 'text/html');
    // Get all p tag elements inside div tag elements with an id that starts with 'data_'
    var data_tags = doc.querySelectorAll('div[id^=data_] p');
    // Loop through data tags and add content to data array
    for (var i = 0; i < data_tags.length; i++) {
        data_array.push(data_tags[i].innerHTML);
    }

Of course, the more complex your input file is, the more you might benefit from the use of cheerio.
-
I want to create a module that gets data from a website by element id.
Example:
Website code:

    <b class="b2 nieb" title="Kurs EUR na żywo" id="EURPLN">4.33320</b>

And I want to display the value of the element with id="EURPLN". I want to set in the config which site to collect data from and which id to read.
Example:

    defaults: {
        url: "http://domain.com/",
        findID: "EURPLN"
    }

How can I do this?
Note from admin: Please use Markdown on code snippets!
-
@strawberry-3-141 You are absolutely right, I made a bad example. The real HTML code is a little bit more complicated, so I mixed the real code with this example.
@ianperrin My input HTML is more complicated, but thanks for your answer, good to know!
@Plati It helped me a lot to look at other modules. You should use a node_helper.js; there you can create a function which gets the HTML code of the website.

    // function which gets the data from the given URL
    // (assumes var request = require("request"); at the top of node_helper.js)
    getTheData: function(theURLtoCatch) {
        var options = {url: theURLtoCatch};
        request(options, (error, response, body) => {
            // on a network error, response is undefined, so check error first
            if (!error && response.statusCode === 200) {
                this.sendSocketNotification("DATA", this.parseHTML(body));
            } else {
                var reason = error ? error.message : response.statusCode;
                console.log("Error getting Data " + reason);
                this.sendSocketNotification("ERROR", reason);
            }
        });
    },
    parseHTML: function(dataBody) {
        // use something like ianperrin and strawberry showed in their examples
    }
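
For markup as simple as the single tag @Plati posted, the parseHTML placeholder could be filled in roughly like this. This is only a sketch under assumptions: extractTextById is a made-up helper name (it is not part of MagicMirror, request, or cheerio), it uses plain JavaScript and a regex instead of a real HTML parser, and it expects the id attribute to be quoted. It will likely break on nested tags of the same name, so for anything more complicated the cheerio approach from earlier in the thread is probably the safer choice.

```javascript
// Hypothetical helper, not part of MagicMirror or cheerio.
// Returns the text content of the first element whose id attribute equals
// `id`, or null if no such element is found.
function extractTextById(html, id) {
  // Escape regex metacharacters so the id is matched literally.
  var safeId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Matches <tag ... id="ID" ...>inner</tag>; \1 refers back to the tag name.
  var pattern = new RegExp(
    "<([a-zA-Z][\\w-]*)\\b[^>]*\\sid\\s*=\\s*[\"']" + safeId +
      "[\"'][^>]*>([\\s\\S]*?)</\\1\\s*>",
    "i"
  );
  var match = pattern.exec(html);
  if (!match) {
    return null;
  }
  // Drop any nested tags and the surrounding whitespace.
  return match[2].replace(/<[^>]*>/g, "").trim();
}

// Sample input taken from the post above.
var sample = '<b class="b2 nieb" title="Kurs EUR na żywo" id="EURPLN">4.33320</b>';
console.log(extractTextById(sample, "EURPLN"));
```

Inside node_helper.js, parseHTML could then return extractTextById(dataBody, findID), with findID passed over from the module config in a socket notification.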