How to sum numbers from PDF files
It is possible to extract PDF text and get some valuable information out of it.
This pipeline extracts USD amounts from multiple PDF files, sums them, and prints the total. It uses pdftotext to convert each PDF to text, grep to keep the lines matching a USD pattern and then to pull out the amounts, sed to turn the decimal comma into a dot, and awk to compute the sum. You need pdftotext installed; it ships with Poppler, which is packaged as poppler or poppler-utils depending on your system.
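# Sum the USD amounts found in every PDF in the current directory.
# The glob plus quoting handles file names that contain spaces.
for f in *.pdf; do
  pdftotext "$f" - | grep -E '^\+.*(USD)$'
done \
| grep -Eo '[0-9]+,[0-9]+' \
| sed 's/,/./g' \
| awk '{s+=$1} END {print s}'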
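To check the text-processing stages without any PDFs at hand, the filter part of the pipeline can be wrapped in a function and fed sample text. This is only a sketch: the function name `sum_usd` and the sample lines are made up for illustration, and it assumes amounts use a decimal comma with no thousands separator. `LC_ALL=C` and the two-decimal `printf` are additions here, not part of the original pipeline.

```shell
# Hypothetical helper: read text on stdin, keep lines that start with "+"
# and end with "USD", and print the sum of the comma-decimal amounts.
sum_usd() {
  grep -E '^\+.*(USD)$' \
    | grep -Eo '[0-9]+,[0-9]+' \
    | sed 's/,/./g' \
    | LC_ALL=C awk '{s += $1} END {printf "%.2f\n", s}'
}

# Made-up sample input standing in for pdftotext output.
printf '%s\n' \
  '+ Invoice A 12,50 USD' \
  '- Refund 3,00 USD' \
  '+ Invoice B 7,25 USD' \
  'Total 19,75 EUR' \
  | sum_usd
# prints 19.75
```

With real files this would be used as `pdftotext file.pdf - | sum_usd`. Note that an amount written with a thousands separator, such as 1.234,56, would only be matched as 234,56 by this pattern, so the expression needs adjusting if your documents contain such values.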
This post is licensed under CC BY 4.0 by the author.