I have been working in DevOps-related jobs for the past 6 years, dealing mainly with Kubernetes both in AWS and on-premise. I spent quite a lot …
How to process a raw HTML page with pup and jq to get ratings
A friend of mine wrote a Bash script that parses a raw HTML page using grep and loops to find images with a rating higher than a given threshold.
Here is what I consider a simplified version of that script, using jq and pup.
It turns out that the pup binary is already installed on my Mac, and I use jq pretty much every day.
pup converts raw HTML to JSON, which can then be parsed relatively easily to extract the “likes” and the corresponding URLs.
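#!/bin/bash
set -euo pipefail

# Minimum rating (likes minus dislikes) a post needs to be listed
RATING=30

# Download the page quietly to stdout
wget -q https://www.rouming.cz -O - |
  # Turn every div.wrapper element into a JSON tree
  pup 'div.wrapper json{}' |
  # Walk down to the list of posts and pick likes, dislikes and URL of each
  jq -c --argjson min "$RATING" '.[] | .children[0].children[0].children[0].children
    | .[] | {
        likes: (.children[3].children[0].text | tonumber),
        dislikes: (.children[5].children[0].text | tonumber),
        url: .children[6].children[0].href
      }
    | .rating = (.likes - .dislikes)
    | select(.rating >= $min)' |
  # Slurp the matching posts into one array, sort by rating, print the URLs
  jq -sr 'sort_by(.rating) | .[].url'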
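Because the page layout can change at any time, it can help to check the jq logic offline. The sketch below wraps the same filtering logic in a hypothetical helper, `filter_posts`, and runs it on a made-up document built with `jq -n` in the nested shape the script assumes (likes at `children[3]`, dislikes at `children[5]`, link at `children[6]` of each post). The counts and example.com URLs are invented for illustration.

```shell
#!/bin/bash
set -euo pipefail

RATING=30

# Hypothetical helper: reads pup-style JSON on stdin and prints the URLs of
# posts whose likes minus dislikes reach $RATING, lowest rating first
filter_posts() {
  jq -c --argjson min "$RATING" '.[] | .children[0].children[0].children[0].children
    | .[] | {
        likes: (.children[3].children[0].text | tonumber),
        dislikes: (.children[5].children[0].text | tonumber),
        url: .children[6].children[0].href
      }
    | .rating = (.likes - .dislikes)
    | select(.rating >= $min)' |
    jq -sr 'sort_by(.rating) | .[].url'
}

# Made-up sample: three wrapper levels above the object holding the posts,
# each post having seven children as the filter expects
sample=$(jq -n '
  def wrap(x): {children: [x]};
  def post($l; $d; $u):
    {children: [{}, {}, {}, {children: [{text: $l}]}, {},
                {children: [{text: $d}]}, {children: [{href: $u}]}]};
  [wrap(wrap(wrap({children: [
    post("50"; "10"; "https://example.com/a"),
    post("12"; "3"; "https://example.com/b"),
    post("90"; "5"; "https://example.com/c")
  ]})))]')

printf '%s\n' "$sample" | filter_posts
```

With these made-up counts the ratings are 40, 9 and 85, so only `https://example.com/a` and `https://example.com/c` are printed, in that order. Passing the threshold with `--argjson` keeps it a number inside jq and avoids splicing a shell variable into the quoted filter.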
Links:
202407181207