High(er) Signal to Noise HN

@highsnhn on Twitter

What: A Twitter bot that tweets HN submissions that have more than 100 points and a high point-to-comment ratio.

Why: Passively monitor the most important items that come across HN while attempting to weed out some of the more pop-culture-y or flamebait-y submissions.

Update 02/25/2012: To keep up with vote inflation the minimum score has been increased to 125 upvotes.

Code is on GitHub here: https://github.com/mschwar99/high_sn_hn

Details:

I find a lot of value in HN both as a way to keep up to date in my field and as a source of inspiration. However, browsing through HN too often is a time sink that I try to avoid.

On the occasions that I do wind up reading through HN, I have noticed that the lower the ratio of an item’s upvotes to its comment count, the less likely I am to get value out of either the story or the discussion. Maybe this is evidence of a flamebait-y article. Maybe a serious bikeshedding conversation is going on in the comments. Whatever it is, I noticed it often enough that I thought I should test whether there was a real metric there or whether I was just experiencing some bias.

I set up a script to scrape the front page (politely – it downloads the front page, and only the front page, once an hour) and started keeping track of front page submissions, scores, and comments.

I’m not a huge Twitter-er, but I get a lot of value from using it as a news aggregator. @newsyc100 is an account that tweets HN stories that reach 100 points, and it was one of the first feeds that I followed.

@highsnhn is a combination of those two ideas – submissions with more than 100 points and a score-to-comment ratio of more than 2. I am finding it useful and thought I would share.
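The rule above is simple enough to sketch in a few lines of Python. This is only an illustration of the two thresholds, not the bot's actual code: the `Story` class, the `passes_filter` name, and the decision to let a story with zero comments through are all my assumptions here.

```python
from dataclasses import dataclass


@dataclass
class Story:
    # Hypothetical record type; the field names are illustrative only.
    title: str
    score: int
    comments: int


def passes_filter(story, min_score=100, min_ratio=2.0):
    """Return True if the story clears both the score and the ratio threshold."""
    if story.score <= min_score:
        return False
    if story.comments == 0:
        # Assumption: with no comments the ratio is treated as unbounded,
        # so the story passes on score alone.
        return True
    return story.score / story.comments > min_ratio


if __name__ == "__main__":
    print(passes_filter(Story("quiet but well liked", 150, 30)))   # ratio 5.0
    print(passes_filter(Story("heated comment thread", 150, 100)))  # ratio 1.5
```

Raising the minimum score later only means passing a different `min_score`, for example `passes_filter(story, min_score=125)`.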

Results after a week

100 point submissions = 129

100 points and a S/N (score to comment ratio) > 2.0 = 86
filtered = 43

A reduction of a third is pretty significant and makes “keeping up” more manageable.

Here are the 100 point submissions that got filtered:

mysql> select title from hn_stories where score > 100 and ( (score / comments) < 2);
+---------------------------------------------------------------------------------+
| title                                                                           |
+---------------------------------------------------------------------------------+
| This Guy Has My MacBook                                                         |
| The Revolutionary Birth Control Method for Men                                  |
| Apple to unveil iCloud Monday, June 6                                           |
| Resources are being utterly and completely wasted on mining bitcoins            |
| We need an AirBNB for Mentorship--not $35k a year wasted on college             |
| How much GNU is there in GNU/Linux?                                             |
| GitHub: Block the Bullies                                                       |
| Ask HN: Who is Hiring? (June 2011)                                              |
| Battleships: a ridiculous but awesome idea                                      |
| Blunt and necessary review of programming language books.                       |
| How to Lose $81,000 in Bitcoins                                                 |
| Intrade CEO dies climbing Mt. Everest                                           |
| Google +1 button for websites                                                   |
| Previewing Windows 8 (Video)                                                    |
| Airbnb admits rogue sales team used Craigslist for stealthy property drive      |
| Why Windows 8 Is Fundamentally Flawed as a Response to the iPad                 |
| The Quora post that killed Bitcoins. Please discuss if his arguments are valid. |
| EFF no longer accepts donations in Bitcoins                                     |
| Groupon files for IPO                                                           |
| SonyPictures.com hacked, personal information and passwords compromised         |
| Google paper comparing performance of C++, Java, Scala, and Go [PDF]            |
| Microsoft refuses to comment as .NET developers fret about Windows 8            |
| Groupon is Effectively Insolvent                                                |
| How a 3 week business trip to the US got reduced to 3 hours                     |
| Suspension, Ban or Hellban?                                                     |
| I moved to Singapore                                                            |
| Ask HN: My Startup is Going to Die Because I Messed Up                          |
| Piracy: are we being conned?                                                    |
| LulzSec EXPOSED                                                                 |
| Mac OS X Lion: Coming In July For $29                                           |
| Apple's iCloud will automatically store, sync data for free                     |
| Competing with Apple - or "Never mess with your Landlord"                       |
| Demoted                                                                         |
| iOS 5 has garbage collection. Here comes MacRuby/iOS?                           |
| Financial Times Won't Give Apple A Cut, Drops iOS for Web App                   |
| Anonymous message to NATO                                                       |
| Is there a new geek anti-intellectualism?                                       |
| Steve Jobs Presents His Ideas For A New Apple Campus                            |
| Third richest man in China lives on $20 a day, eats same meals as workers       |
| The Dangerous Mr. Khan (and the Khan Academy)                                   |
| Failed entrepreneur, broke, unemployed, now taking care of aging parents. Help. |
| My life in Accenture before startups                                            |
| Apple Reverses Course On In-App Subscriptions                                   |
+---------------------------------------------------------------------------------+
43 rows in set (0.00 sec)

 

Stories I am personally glad to see in that filtered set:

  • The Bitcoin stories – no offence to people working on it or excited about it, but I find a lot of the dialogue to be relatively noisy.
  • LulzSec and Sony stories – As above, these are legit news to an extent, but I find the more recent pile-on stories to be of lower value.

 

A result I was happy to see:

Filtered stories where title LIKE '%apple%':

+---------------------------------------------------------------+
| title                                                         |
+---------------------------------------------------------------+
| Apple to unveil iCloud Monday, June 6                         |
| Apple's iCloud will automatically store, sync data for free   |
| Competing with Apple - or "Never mess with your Landlord"     |
| Financial Times Won't Give Apple A Cut, Drops iOS for Web App |
| Steve Jobs Presents His Ideas For A New Apple Campus          |
| Apple Reverses Course On In-App Subscriptions                 |
+---------------------------------------------------------------+
6 rows in set (0.00 sec)

100 point Apple stories that made it through:

+-------------------------------------------+
| title                                     |
+-------------------------------------------+
| Apple is professional, the web is amateur |
| Apple iCloud                              |
| Apple copies rejected app                 |
+-------------------------------------------+
3 rows in set (0.00 sec)

 

Apple is definitely a news maker and worth keeping up with, but lots of Apple stories are packed with noise. The official iCloud story from apple.com made it through, and the other stories were worthwhile if not earth-shattering.

I was thinking of adding in better filters, such as granting leeway to an Apple story from apple.com or a Google story from a Google domain.  We will see if that is necessary.
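If I do go that way, the leeway could be sketched roughly like this. None of this exists in the bot today: the `OFFICIAL_DOMAINS` mapping, the function names, and the relaxed ratio of 1.0 are all placeholder assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical mapping of a title keyword to that company's own domains.
OFFICIAL_DOMAINS = {
    "apple": ("apple.com",),
    "google": ("google.com",),
}


def is_official_source(title, url):
    """True if the title mentions a company and the link is on its own domain."""
    host = (urlparse(url).hostname or "").lower()
    title_lower = title.lower()
    for keyword, domains in OFFICIAL_DOMAINS.items():
        if keyword not in title_lower:
            continue
        # Match the bare domain or any subdomain, but not look-alike hosts.
        if any(host == d or host.endswith("." + d) for d in domains):
            return True
    return False


def required_ratio(title, url, base_ratio=2.0, relaxed_ratio=1.0):
    """Pick the score-to-comment ratio a story has to beat.

    relaxed_ratio is an arbitrary placeholder, not a tuned value.
    """
    return relaxed_ratio if is_official_source(title, url) else base_ratio
```

A story would then need `score / comments` above `required_ratio(title, url)` rather than a fixed 2.0.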
