How to process a raw HTML page with pup and jq to get ratings
A friend of mine wrote a Bash script that parses a raw HTML page using grep and loops to find images with a rating higher than a given number.
Here is what I consider a simplified version of that script, using jq and pup.
It turns out that the pup binary was already installed on my Mac, and I use jq pretty much every day. pup converts raw HTML to JSON, which makes it relatively easy to extract the “likes” and their respective URLs.
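To make the JSON shape more concrete, here is a minimal sketch that needs only jq. The `sample` variable is a hand-written, hypothetical stand-in for what `pup '... json{}'` prints: each HTML element becomes an object with a `tag`, its attributes (such as `href`), its `text`, and a `children` array. The element layout and the example.com URL are assumptions for illustration, not the real page structure.

```shell
#!/bin/bash
# Hypothetical sample shaped like pup's json{} output (not the real page):
# every element is an object with "tag", attributes, "text" and "children".
sample='[
  {"tag": "div", "class": "wrapper", "children": [
    {"tag": "span", "text": "42"},
    {"tag": "span", "text": "7"},
    {"tag": "a", "href": "https://example.com/image-1", "text": "image 1"}
  ]}
]'

# Walk the tree by child position and convert the text fields to numbers.
result=$(printf '%s' "$sample" | jq -c '.[] | {
  likes: (.children[0].text | tonumber),
  dislikes: (.children[1].text | tonumber),
  url: .children[2].href
}')
echo "$result"
```

Addressing children by position is short to write, but it depends on the page layout staying the same, so the indices need adjusting whenever the markup changes.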
#!/bin/bash
# Minimum rating (likes minus dislikes) an image needs to be listed.
RATING=30

# Fetch the page, convert the div.wrapper subtree to JSON with pup,
# then pick likes, dislikes and URL for each entry by child position.
wget -q https://www.rouming.cz -O - | \
pup 'div.wrapper json{}' | \
jq -r --argjson rating "$RATING" '
  [ .[] | .children[0].children[0].children[0].children
    | .[] | {
        likes: (.children[3].children[0].text | tonumber),
        dislikes: (.children[5].children[0].text | tonumber),
        url: .children[6].children[0].href
      }
    | .rating = (.likes - .dislikes)
    | select(.rating >= $rating) ]
  # Sort ascending by rating and print only the URLs.
  | sort_by(.rating) | .[] | .url'
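The filtering and sorting step can be tried in isolation, without fetching anything. The sketch below runs on hypothetical entries (the `entries` data and example.com URLs are made up) and passes the threshold to jq through `--argjson`, which binds a shell value to a jq variable as a number. The variable name `$min` is my own choice for this example.

```shell
#!/bin/bash
# Hypothetical threshold and sample entries; not taken from the real page.
RATING=30
entries='[
  {"likes": 50, "dislikes": 5,  "url": "https://example.com/a"},
  {"likes": 40, "dislikes": 20, "url": "https://example.com/b"},
  {"likes": 90, "dislikes": 55, "url": "https://example.com/c"}
]'

# Compute rating, keep entries at or above the threshold, sort ascending.
urls=$(printf '%s' "$entries" | jq -r --argjson min "$RATING" '
  map(.rating = (.likes - .dislikes))
  | map(select(.rating >= $min))
  | sort_by(.rating)
  | .[].url')
echo "$urls"
```

With these sample values, entry b (rating 20) is dropped, and the remaining URLs are printed from lowest to highest rating, so the best-rated image comes last.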
Links:
202407181207
This post is licensed under CC BY 4.0 by the author.
