MagicMirror Forum

Parse HTML String

Development · 7 Posts · 5 Posters
  • alihallo
    Aug 21, 2016, 11:54 AM (last edited by paviro Sep 11, 2016, 11:10 AM)

    I am currently trying to create my first little module and could use a hint from someone more experienced.
    In my node_helper.js I request the HTML of a simple website using:

    var request = require("request");

    var options = {url: URL};
    request(options, (error, response, body) => {
        if (!error && response.statusCode === 200) {
            this.sendSocketNotification("DATA", this.parseData(body));
        }
    });

    So the variable body contains something like:

    <head>
    ...
    </head>
    <body><a name="top"></a>
       <div id="data_1">
          <p>The relevant data 1</p>
       </div>
       <div id="data_2">
          <p>The relevant data 2</p>
       </div>
    ...
    

    My question is: what would you recommend for getting the relevant data out of this body into a local variable?
    My first thought was to use JavaScript and regular expressions, but I guess that is not a good way to solve this, is it?

    Best regards,
    alihallo
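For markup exactly as regular as the sample above, a regular expression can pull the paragraphs out. This is only a sketch under that assumption (double-quoted `id`, no other attributes on the div, a single `<p>` directly inside it); `extractDataParagraphs` is a hypothetical helper name, not part of any library.

```javascript
// Sketch only: pull the <p> text out of each <div id="data_N"> with a regex.
// Assumes markup exactly like the sample: double-quoted id, no other
// attributes on the div, and one <p> directly inside each div.
function extractDataParagraphs(html) {
  const pattern = /<div id="(data_\d+)">\s*<p>([\s\S]*?)<\/p>/g;
  const result = [];
  for (const match of html.matchAll(pattern)) {
    // match[1] is the div id, match[2] the paragraph's inner HTML
    result.push({ id: match[1], text: match[2].trim() });
  }
  return result;
}

const sample = `
<body><a name="top"></a>
   <div id="data_1">
      <p>The relevant data 1</p>
   </div>
   <div id="data_2">
      <p>The relevant data 2</p>
   </div>
</body>`;

console.log(extractDataParagraphs(sample));
```

Adding a single attribute such as `class="x"` before the `id` is already enough to make it return nothing, which is why an HTML parser (such as cheerio, suggested later in this thread) is usually the safer choice.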

    • cowboysdude (Module Developer)
      Aug 21, 2016, 11:54 PM

      I could be wrong, but I believe you could request it using the npm request package…

      • alihallo
        Aug 29, 2016, 7:54 PM

        Thank you for your answer, but I couldn’t figure out how to use npm request to parse the HTML.
        However, I found another solution:

        https://github.com/cheeriojs/cheerio

        With it I can get the data out of the HTML like this:

        var cheerio = require("cheerio");
        var request = require("request");

        var options = {url: URL};
        request(options, (error, response, body) => {
            if (!error && response.statusCode === 200) {
                this.sendSocketNotification("DATA", this.parseData(body));
            }
        });
        
        ...
        
        // later, e.g. inside parseData(body):
        var data_array = [];
        var $ = cheerio.load(body, {
           normalizeWhitespace: true,
           xmlMode: false
        });

        $('div[class=data_1]').find('p').each(function (index, element) {
            data_array.push($(element).text());
        });
        

        This solved my problem.

        Best regards,
        alihallo

        • strawberry 3.141 (Project Sponsor, Module Developer)
          Aug 29, 2016, 8:18 PM (last edited Aug 29, 2016, 8:19 PM)

          I find it odd that this works, because you are looking for the attribute class=data_1, but data_1 is an id.

          The CSS selector for an id is #, and putting p after it (separated by a space) looks for paragraphs inside the element with the id data_1.

          When you replace

          $('div[class=data_1]').find('p').each(function (index, element) {
          	data_array.push($(element).text());
          });
          

          with

          data_array.push($('#data_1 p').text());
          

          Does it still work? I'm not sure whether it returns the single element when only one occurrence is found, or returns an array anyway.

          Please create a GitHub issue if you need help, so I can keep track.

          • ianperrin (replying to @alihallo)
            Aug 29, 2016, 9:26 PM (last edited Aug 29, 2016, 9:35 PM)

            @alihallo

            If your input HTML is fairly simple, you may be able to avoid the cheerio library entirely and use the browser's built-in DOM API instead.

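            // querySelectorAll needs a DOM: run this in the browser-side module file,
            // not in node_helper.js, where body is only a string
            var doc = new DOMParser().parseFromString(body, "text/html");
            // an array to hold the data from the file
            var data_array = [];
            // Get all p tag elements inside div tag elements with an id that starts with 'data_'
            var data_tags = doc.querySelectorAll('div[id^=data_] p');
            // Loop through data tags and add content to data array
            for (var i = 0; i < data_tags.length; i++) {
                data_array.push(data_tags[i].innerHTML);
            }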
            

            Of course, the more complex your input file is, the more you might benefit from using cheerio.

            "Live as if you were to die tomorrow. Learn as if you were to live forever." - Mahatma Gandhi

            • Plati
              Sep 11, 2016, 9:33 AM (last edited by paviro Sep 11, 2016, 11:11 AM)

              I want to create a module that gets data from an element with a specific id on a website.

              Example:

              Website code:

              <b class="b2 nieb" title="Kurs EUR na żywo" id="EURPLN">4.33320</b>
              

              And I want to display the value of the element with id="EURPLN".

              I want to set in the config which site to collect the data from and which id to read.

              Example:

              defaults: {
              		url: "http://domain.com/",
              		findID: "EURPLN"
              }
              

              How can I do this?


              Note from admin: Please use Markdown on code snippets!
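For a single flat element like the `EURPLN` example, a small regex lookup driven by the `findID` setting could be enough. This is a sketch under that assumption (plain text inside the element, attribute written as `id="..."`); `findValueById` is a hypothetical helper, and the `config` object just mirrors the defaults above. In a module, this would run on the HTML fetched in node_helper.js.

```javascript
// Sketch only: return the text inside the element whose id equals `id`.
// Hypothetical helper, not a library API. Assumes the element holds plain
// text (no nested tags of the same name), like <b ... id="EURPLN">4.33320</b>.
function findValueById(html, id) {
  // Escape regex metacharacters so the id is matched literally
  const safeId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Opening tag containing id="<id>", lazy content, then the matching close tag
  const pattern = new RegExp(
    `<([a-zA-Z][\\w-]*)[^>]*\\sid="${safeId}"[^>]*>([\\s\\S]*?)</\\1>`
  );
  const match = html.match(pattern);
  return match ? match[2].trim() : null;
}

const page = '<b class="b2 nieb" title="Kurs EUR na żywo" id="EURPLN">4.33320</b>';
const config = { url: "http://domain.com/", findID: "EURPLN" };

const raw = findValueById(page, config.findID);
console.log(raw, parseFloat(raw));
```

If the element can contain nested markup, cheerio's `$("#" + config.findID).text()` is the sturdier route.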

              • alihallo
                Sep 11, 2016, 12:29 PM

                @strawberry-3-141 You are absolutely right; I made a bad example. The real HTML is a bit more complicated, and I mixed the real code up with this example.
                @ianperrin My input HTML is more complicated, but thanks for your answer, good to know!

                @Plati Looking at other modules helped me a lot. You should use a node_helper.js; in it you can create a function that fetches the HTML of the website:

                const NodeHelper = require("node_helper");
                const request = require("request");

                module.exports = NodeHelper.create({
                   //function which gets the data from the given URL
                   getTheData: function(theURLtoCatch) {
                      var options = {url: theURLtoCatch};
                      request(options, (error, response, body) => {
                         if (!error && response.statusCode === 200) {
                            this.sendSocketNotification("DATA", this.parseHTML(body));
                         } else {
                            // response is undefined when the request itself failed
                            var status = error ? error.message : response.statusCode;
                            console.log("Error getting Data " + status);
                            this.sendSocketNotification("ERROR", status);
                         }
                      });
                   },

                   parseHTML: function(dataBody) {
                      //use something like ianperrin and strawberry showed in their examples
                   }
                });
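The `request` package used throughout this thread has since been deprecated (2020). Below is a minimal sketch of the fetch step in `getTheData` using only Node's built-in `http`/`https` modules; `fetchHtml` is a hypothetical helper name, and unlike `request` it does not follow redirects.

```javascript
// Sketch: fetch a page's HTML with Node's built-in modules only.
// Hypothetical helper; does not follow redirects (request did by default).
const http = require("http");
const https = require("https");

// Resolves with the body as a string; rejects on network errors
// and on any status code other than 200.
function fetchHtml(url) {
  const client = url.startsWith("https:") ? https : http;
  return new Promise((resolve, reject) => {
    client
      .get(url, (response) => {
        if (response.statusCode !== 200) {
          response.resume(); // drain the body so the socket is released
          reject(new Error("Error getting Data " + response.statusCode));
          return;
        }
        response.setEncoding("utf8");
        let body = "";
        response.on("data", (chunk) => {
          body += chunk;
        });
        response.on("end", () => resolve(body));
      })
      .on("error", reject);
  });
}
```

Inside the node_helper, `getTheData` could then call `fetchHtml(theURLtoCatch).then((body) => this.sendSocketNotification("DATA", this.parseHTML(body)))` and send `"ERROR"` from a `.catch` handler.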
                
                MagicMirror created by Michael Teeuw.
                Forum managed by Sam, technical setup by Karsten.
This forum is using NodeBB as its core.