Diamonds might be forever, but cron jobs are no slouches.
While adding a new cron task to a development machine, I noticed an existing cron job that had been running on the hour, every hour, since sometime in February. I had completely forgotten that it was there.
This cron job has been calling a Ruby script that has been loyally scraping the Google “Hot Search” page since February and faithfully tallying the results in a MySQL table.
Doing something useful with this data is going on the ever-growing “list of things to do when there is time”. Maybe something JavaScripty like making a huge tag cloud out of them. Must… stop… starting… new projects….
Here is the mysqldump if anyone is curious – I know there are sites online that let you search through old Trends – but I don’t recall seeing dumps of them.
- trends_db09212010.sql.gz (640 KB)
The schema is indicative of the quick hack that the script was. The text of each trend is stored in “trends”; if a trend appeared more than once, each occurrence gets its own row in “timestamps” along with the time of that occurrence. The “count” column in “trends” is incremented to hold the total number of occurrences.
mysql> describe timestamps;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| id         | int(11)  | NO   | PRI | NULL    | auto_increment |
| trend_id   | int(11)  | NO   |     | NULL    |                |
| created_at | datetime | YES  |     | NULL    |                |
+------------+----------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> describe trends;
+-------+------------------+------+-----+---------+----------------+
| Field | Type             | Null | Key | Default | Extra          |
+-------+------------------+------+-----+---------+----------------+
| id    | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| name  | varchar(256)     | YES  |     | NULL    |                |
| count | int(11)          | NO   |     | 0       |                |
+-------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
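For anyone who wants to poke at the structure without loading the dump, the tallying described above can be sketched roughly like this. This is not the original Ruby script: it is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for MySQL, the `record_trend` helper and the sample dates are made up for illustration, and it assumes every sighting (including the first) gets a row in “timestamps”.

```python
import sqlite3
from datetime import datetime

# SQLite stands in for MySQL; table and column names follow the
# `describe` output, the types are only approximations of it.
SCHEMA = '''
CREATE TABLE trends (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    name    VARCHAR(256),
    "count" INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE timestamps (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    trend_id   INTEGER NOT NULL,
    created_at DATETIME
);
'''

def record_trend(conn, name, seen_at):
    """Hypothetical helper: log one sighting of a trend and bump its total."""
    row = conn.execute("SELECT id FROM trends WHERE name = ?", (name,)).fetchone()
    if row is None:
        # First time this trend text has been seen: create its row.
        cur = conn.execute("INSERT INTO trends (name) VALUES (?)", (name,))
        trend_id = cur.lastrowid
    else:
        trend_id = row[0]
    # One row per sighting, stamped with the time it was seen.
    conn.execute(
        "INSERT INTO timestamps (trend_id, created_at) VALUES (?, ?)",
        (trend_id, seen_at.isoformat(sep=" ")),
    )
    # Keep the running total on the trend itself.
    conn.execute('UPDATE trends SET "count" = "count" + 1 WHERE id = ?', (trend_id,))
    conn.commit()
    return trend_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample sightings, one hour apart like the cron schedule.
record_trend(conn, "tiger woods divorce", datetime(2010, 8, 23, 14, 0))
record_trend(conn, "tiger woods divorce", datetime(2010, 8, 23, 15, 0))
print(conn.execute('SELECT name, "count" FROM trends').fetchall())
```

Storing the total in “count” is redundant with counting rows in “timestamps”, but it makes the quick lookups cheap, which fits the quick-hack spirit of the schema.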
It’s somewhat depressing looking through them all – most are whatever celebrity gossip was all the rage that day:
mysql> select * from trends where name LIKE '%tiger%';
+-------+-------------------------------------------+-------+
| id    | name                                      | count |
+-------+-------------------------------------------+-------+
|   776 | joslyn james tiger woods                  |     1 |
|   983 | how old is tiger woods                    |     1 |
|   991 | tiger grant awards                        |     4 |
|  1002 | tiger woods press conference              |    10 |
|  1202 | what time is tiger woods press conference |     2 |
.........
|  5223 | tiger woods divorce                       |    23 |
|  8986 | tiger tiger burning bright poem           |     8 |
| 12882 | tiger woods divorce settlement            |    10 |
| 14269 | goliath tigerfish                         |     3 |
| 18877 | rudy ruettiger                            |     8 |
+-------+-------------------------------------------+-------+
38 rows in set (0.00 sec)
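If you would rather see the busiest matches first than scroll through everything, the same kind of lookup can be ranked by “count”. Here is a small sketch, again using in-memory SQLite in place of MySQL and seeded with a handful of rows copied from the output above; the `top_matches` function is a hypothetical helper, not something from the original script.

```python
import sqlite3

# A few (id, name, count) rows copied from the query output above.
SAMPLE_ROWS = [
    (776, "joslyn james tiger woods", 1),
    (1002, "tiger woods press conference", 10),
    (5223, "tiger woods divorce", 23),
    (8986, "tiger tiger burning bright poem", 8),
    (14269, "goliath tigerfish", 3),
]

def top_matches(conn, fragment, limit=3):
    """Return (name, count) pairs whose name contains `fragment`, busiest first."""
    return conn.execute(
        'SELECT name, "count" FROM trends WHERE name LIKE ? '
        'ORDER BY "count" DESC LIMIT ?',
        (f"%{fragment}%", limit),  # bound parameters, so the fragment is never spliced into the SQL
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE trends (id INTEGER PRIMARY KEY, name VARCHAR(256), '
    '"count" INTEGER NOT NULL DEFAULT 0)'
)
conn.executemany("INSERT INTO trends VALUES (?, ?, ?)", SAMPLE_ROWS)

for name, count in top_matches(conn, "tiger"):
    print(f"{count:>3}  {name}")
```

Against the real dump the equivalent MySQL query would just add an ORDER BY and LIMIT to the select shown above.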
As a PS – I don’t think this is against Google’s ToS or anything – it goes out in a public feed as well as being on that page. If I’m wrong I’ll take it down so please don’t smite me Mr. Google.