NYT feed contains some recurring garbage
aaronaxvig last edited by
The default NYT feed includes some recurring headlines that are useless. I have some ideas for filtering the list before it is cycled through for display, in order from most straightforward to craziest.
- Remove titles equal to “Your Daily Mini Crossword”
- Remove titles equal to “California Today”
- Remove titles that start with “Here Are” (example that I don’t see value in is “Here Are Our Stories That Won the Biggest Awards in the Magazine World”)
- Don’t show titles that don’t contain a verb. An alternative may be to show the description as subtext under the title, as that often does contain something useful. (Examples that I don’t see value in: “Guardians of a Vast Lake, and a Refuge for Humanity”, “Before the Wall: A Borderlands Journey”, “Feature: The Preacher and the Sheriff”.) Could use something like this as a parser. It may have to be provided as a service, as apparently it is 261MB zipped and requires 100-500MB of RAM.
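To make the first three ideas concrete, here is a minimal sketch of the filtering step, written in Python only for illustration. The names `EXCLUDED_TITLES`, `EXCLUDED_PREFIXES`, `is_wanted`, and `filter_headlines` are hypothetical and not taken from any existing module. The verb check is left out, since it would need the external parser mentioned above.

```python
# Hypothetical names throughout; this is a sketch, not an existing API.

# Titles dropped only on an exact match.
EXCLUDED_TITLES = {"Your Daily Mini Crossword", "California Today"}

# Titles dropped when they start with any of these prefixes.
EXCLUDED_PREFIXES = ("Here Are",)


def is_wanted(title: str) -> bool:
    """Return True if the headline passes both filters."""
    cleaned = title.strip()
    if cleaned in EXCLUDED_TITLES:
        return False
    # str.startswith accepts a tuple and checks each prefix in turn.
    return not cleaned.startswith(EXCLUDED_PREFIXES)


def filter_headlines(titles: list[str]) -> list[str]:
    """Keep only the wanted headlines, preserving feed order."""
    return [title for title in titles if is_wanted(title)]
```

Keeping the exact matches and the prefixes in separate collections means a title such as “California Today: Wildfire Season Begins” would still be shown, which matches the "equal to" wording in the list above. Both collections could be exposed as user configuration rather than hard-coded.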
I am interested in working on this feature myself but thought I would float it for discussion first.
@aaronaxvig there is already a solution for it, check out option