Parse HTML String



  • I’m currently trying to create my first little module and could use a hint from an experienced person.
    In my node_helper.js I request the HTML code of a simple website using:

    var options = {url: URL};
    request(options, (error, response, body) => {
        // response is undefined when the request itself fails, so check error first
        if (!error && response.statusCode === 200) {
            this.sendSocketNotification("DATA", this.parseData(body));
        }
    });
    

    So in the variable body there is something like

    
    ...
    
    <a name="top"></a>
       <div id="data_1">
          <p>The relevant data 1</p>
       </div>
       <div id="data_2">
          <p>The relevant data 2</p>
       </div>
    ...
    

    My question is: what would you recommend for getting the relevant data out of this body and into a local variable?
    I first thought of using JavaScript and regex, but I guess that is not a good way to solve this, is it?

    Best regards,
    alihallo
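
On the regex question above: for markup as flat as the example, a regular expression can pull out the paragraph text, though it breaks easily once tags are nested or the markup changes, which is why a parser such as cheerio is usually the safer choice. A minimal sketch, assuming the paragraphs contain plain text only; extractParagraphs and sample are hypothetical names, not part of MagicMirror or any library:

```javascript
// Brittle sketch: collect the text of every <p>...</p> in an HTML string.
// Assumes flat paragraphs (no nested tags), as in the example in the post.
function extractParagraphs(html) {
    // (?:\s[^>]*)? allows attributes on <p> but avoids matching tags such as <pre>
    var pattern = /<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/gi;
    var results = [];
    var match;
    while ((match = pattern.exec(html)) !== null) {
        results.push(match[1].trim());
    }
    return results;
}

// Hypothetical sample input modelled on the snippet in the post
var sample = '<a name="top"></a><div><p>The relevant data 1</p></div>' +
             '<div><p>The relevant data 2</p></div>';
console.log(extractParagraphs(sample));
```

This only suits pages whose structure you control or know to be stable; anything more involved is better handled with a real HTML parser.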



  • I could be wrong but I believe you could request it using NPM request…



  • Thank you for your answer, but I couldn’t work out how to use the NPM request module to parse the HTML code.
    But I found another solution:

    https://github.com/cheeriojs/cheerio

    This way I could get the data out of the html code like this:

    var options = {url: URL};
    request(options, (error, response, body) => {
        // response is undefined when the request itself fails, so check error first
        if (!error && response.statusCode === 200) {
            this.sendSocketNotification("DATA", this.parseData(body));
        }
    });
    
    
    ...
    
    
    var cheerio = require("cheerio");
    var data_array = [];
    
    var $ = cheerio.load(body, {
       normalizeWhitespace: true,
       xmlMode: false
    });
    
    $('div[class=data_1]').find('p').each(function (index, element) {
        data_array.push($(element).text());
    });
    

    This way I could solve my problem.

    Best regards,
    alihallo



  • I think it’s weird that this works, because you’re looking for the attribute class = data_1, but it’s an id.

    The CSS selector for an id is #, and when you put p after it, it will look for paragraphs inside the element with the id data_1.

    When you replace

    $('div[class=data_1]').find('p').each(function (index, element) {
    	data_array.push($(element).text());
    });
    

    with

    data_array.push($('#data_1 p').text());
    

    does it still work? I’m not sure whether it returns a single element if only one occurrence is found, or an array either way.



  • @alihallo

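    If your input HTML is fairly simple, you may be able to avoid the cheerio library entirely. Note that querySelectorAll only exists on a parsed document, not on the raw string in body, so the string has to be parsed first. The snippet uses DOMParser, which is available in the module’s browser-side script but not in node_helper.js:

    // an array to hold the data from the file
    var data_array = [];
    // Parse the HTML string into a document (browser-side only)
    var doc = new DOMParser().parseFromString(body, "text/html");
    // Get all p tag elements inside div tag elements with an id that starts with 'data_'
    var data_tags = doc.querySelectorAll('div[id^=data_] p');
    // Loop through data tags and add content to data array
    for (var i = 0; i < data_tags.length; i++) {
        data_array.push(data_tags[i].innerHTML);
    }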
    

    Of course the more complex your input file is the more you might benefit from the use of cheerio.



  • I want to create a module that reads a value from a div with a given id on a website.

    Example:

    Website code:

    <b>4.33320</b>
    

    And I want to display the value of the element with id="EURPLN".

    I want to set in the config which site to collect the data from and which id to look for.

    example:

    defaults: {
        url: "http://domain.com/",
        findID: "EURPLN"
    }
    

    How can I do this?


    Note from admin: Please use Markdown on code snippets!



  • @strawberry-3-141 You are absolutely right, I made a bad example. The real HTML code is a bit more complicated, and I mixed the real code into this example.
    @ianperrin My input HTML is more complicated, but thanks for your answer, good to know!

    @Plati It helped me a lot to look at other modules. You should use a node_helper.js, there you can create a function which gets the html code of the website.

    // function which gets the data from the given URL
    getTheData: function(theURLtoCatch) {
       var options = {url: theURLtoCatch};
       request(options, (error, response, body) => {
          if (!error && response.statusCode === 200) {
             this.sendSocketNotification("DATA", this.parseHTML(body));
          } else {
             // response is undefined when the request itself fails
             var reason = error ? error.message : response.statusCode;
             console.log("Error getting Data " + reason);
             this.sendSocketNotification("ERROR", reason);
          }
       });
    },
    
    parseHTML: function(dataBody) {
       // use something like ianperrin and strawberry showed in their examples
    
    }
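
The parseHTML function is left empty in the post. For the EURPLN case, one possible way to fill it without extra dependencies is a small lookup by id, sketched here under loud assumptions: extractTextById is a hypothetical helper, the page markup is assumed to look like `<div id="EURPLN"><b>4.33320</b></div>` (the thread only shows the `<b>` part), and the target element must not contain nested elements of its own tag type. With cheerio loaded, `$("#" + id).text()` would be the more robust equivalent.

```javascript
// Hypothetical helper: return the text inside the first element whose id
// matches `id`, with inner tags such as <b> stripped. Regex-based and brittle:
// it stops at the first closing tag of the same name, so nested elements of
// the same tag type are not handled.
function extractTextById(html, id) {
    // Escape regex metacharacters in the id before building the pattern
    var safeId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    var pattern = new RegExp(
        "<(\\w+)[^>]*\\sid=[\"']" + safeId + "[\"'][^>]*>([\\s\\S]*?)</\\1>", "i");
    var match = pattern.exec(html);
    if (match === null) {
        return null; // no element with that id was found
    }
    // Drop any remaining inner tags and surrounding whitespace
    return match[2].replace(/<[^>]*>/g, "").trim();
}

// Assumed page markup; only the <b> element appears in the thread
var page = '<div id="EURPLN"><b>4.33320</b></div>';
console.log(extractTextById(page, "EURPLN"));
```

Inside node_helper.js, parseHTML could call such a helper with the findID value from the module config, provided the module sends that value along with the URL.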
    
