MMM-Scrapey - I made something that could be useful?
-
I don’t think there’s any other module out there that will let you scrape content from a webpage for display on MagicMirror. Maybe there’s a good reason for that, but I went ahead and built one for myself, with quite a bit of help from ChatGPT (I wouldn’t consider myself a developer, just a tinkerer, so please excuse the code quality!).
I’ve made it quite flexible, so whilst my use case is to scrape live bus departure times from the stop outside my house, I’m sure there will be other use cases …
You specify a URL to scrape, a CSS selector to identify the table on the page, and then some options about what rows and columns you want to include.
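For anyone curious what "choosing rows and columns" amounts to once the table has been scraped, here is a minimal sketch. It is not taken from the module's source: the function name `selectCells`, its arguments, and the 1-based indexing are all assumptions made for illustration.

```javascript
// Hypothetical helper, NOT MMM-Scrapey's actual code.
// `table` is an array of rows; each row is an array of cell strings.
// `rows` and `columns` are lists of 1-based indices (an assumption);
// an empty list means "keep everything".
function selectCells(table, rows = [], columns = []) {
  const keptRows = rows.length
    ? rows.map((r) => table[r - 1]).filter((row) => row !== undefined)
    : table;
  return keptRows.map((row) =>
    columns.length
      ? columns.map((c) => row[c - 1]).filter((cell) => cell !== undefined)
      : row.slice()
  );
}

// Made-up bus departure table, used only to show the call.
const departures = [
  ["Route", "Destination", "Due"],
  ["1", "Town Centre", "5 min"],
  ["2", "Park Road", "9 min"],
];

// Keep rows 2 and 3 (skipping the header) and columns 1 and 3.
console.log(selectCells(departures, [2, 3], [1, 3]));
```

Out-of-range indices are simply dropped here rather than raising an error, which seemed the friendlier choice for a page whose table can change size between scrapes.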
I’m not sure if this is ready for public consumption, but if any devs would like to take a look and give me any advice, please do.
-
@AndyHazz Cool! Don’t forget to add it to the module list: https://github.com/MagicMirrorOrg/MagicMirror/wiki/3rd-party-modules 🙂
-
@KristjanESPERANTO No problem, done!
-
@AndyHazz Thanks for this awesome module. I’m also using it to pull bus information and it’s working well. I wondered if there was a way of stripping formatting off the content it scrapes? The table I’m pulling from (https://bustimes.org/stops/250012526) includes hyperlinks in several of the columns, and I’d like to format them as plain text.
Any ideas how that could be achieved?
Thanks again for this, it’s totally made things amazing :-)
-
I think I figured it out by adjusting the .js code to return .text rather than .innerHTML on line 88.
This could possibly be a module option. If I get some time I might consider taking this and doing some work on it, but like you I’m a bit of a ChatGPT coder :-), so I’m not even sure how to copy it! It’s working great for me now though, so thank you again!
-
@chris_d glad you’ve found it useful, and that’s a good idea for the module option - I actually had the same issue with the data I was scraping and was using some custom CSS to clean up the formatting, but a plain text option in the module is a better approach.
So, I’ve added a new ‘plainText’ option to the config. The default is still to use HTML in case anyone was making use of that (I doubt it, but you never know), so you’ll need to add ‘plainText: true’ to the module config to strip the HTML formatting.
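As a rough illustration of the idea, the sketch below shows a config entry with the option switched on and the kind of branch it implies. Only `plainText: true` and the module name come from this thread; the `position` value, the helper name `cellContent`, and the sample cell are assumptions, and the module's real code may look quite different.

```javascript
// Config entry sketch: `module`, `position` and `config` are the usual
// MagicMirror module keys. The position value is a placeholder, and the
// module's other options (URL, selector, rows/columns) are omitted.
const scrapeyEntry = {
  module: "MMM-Scrapey",
  position: "top_left", // placeholder, pick whatever region you use
  config: {
    plainText: true, // strip HTML formatting; the default keeps HTML
  },
};

// Hypothetical helper, NOT the module's actual code.
// `cell` is anything exposing innerHTML and textContent, e.g. a DOM <td>.
function cellContent(cell, plainText = false) {
  return plainText ? cell.textContent.trim() : cell.innerHTML;
}

// Plain object standing in for a scraped cell that contains a hyperlink.
const linkedCell = {
  innerHTML: '<a href="/services/1">1</a>',
  textContent: "1",
};

console.log(cellContent(linkedCell, scrapeyEntry.config.plainText));
console.log(cellContent(linkedCell)); // default: markup is kept
```

Keeping HTML as the default means existing setups behave exactly as before, and only configs that opt in get the stripped text.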
-
Hello.
Thanks for a great module.
I wonder how to decrease the distance between the lines.
I’ve tried adding table attributes to the table (as I can control the table myself), but it doesn’t seem to work.
I also tried adding the style attributes in custom.css.
However, it appears that the module strips all styling from the presentation.
This is the code that I have used in custom.css, but the rules don’t seem to take effect.
.MMM-Scrapey { height: 2px; }
.MMM-Scrapey .td { padding: 2px; }
.MMM-Scrapey .table { border-spacing: 2px; }
.table { border-spacing: 2px; }
.td { padding: 2px; }
-
@fredric.nil I don’t know the answer, but try the Elements tab in the developer window.
See the second link in my signature below to learn how to use it.
-
Thanks for the tip, but that’s where I was looking.
It seems that no matter what I put in for attributes, they don’t show up in the module.
I’m going crazy soon
-
@fredric.nil where are you putting them?