Article Popularity Trends on Hacker News

We extracted n-gram features (e.g., “Harry Potter”, “Google”, “Silicon Valley”) and skip features (e.g., “a ____ for”, “| ____ acquires”) from each title, including start- and end-of-sentence markers and, optionally, punctuation. For learning, we used boosted stochastic gradient descent with logistic loss [2] to predict whether an article reached the top 20 at any point during its observed lifetime. We applied strong regularization to eliminate spurious features and used twenty bootstrap replicates to measure the significance of the coefficients and the classification accuracy.
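
To make the feature extraction concrete, the following is a minimal sketch of how n-gram and skip features of this kind might be computed from a title. It is not the original pipeline: the marker tokens `<s>` and `</s>`, the regex tokenizer, the bigram cutoff, and the function names are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical marker tokens; the text only says that start- and
# end-of-sentence markers were included, not how they were written.
START, END = "<s>", "</s>"
GAP = "____"


def tokenize(title: str, keep_punct: bool = False) -> List[str]:
    """Split a title into word tokens, optionally keeping punctuation,
    and wrap the result in start/end markers (assumed tokenizer)."""
    pattern = r"\w+|[^\w\s]" if keep_punct else r"\w+"
    return [START] + re.findall(pattern, title) + [END]


def ngram_features(tokens: List[str], max_n: int = 2) -> List[str]:
    """Return all contiguous n-grams up to length max_n (assumed cutoff),
    dropping n-grams that consist only of marker tokens."""
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if all(t in (START, END) for t in gram):
                continue
            feats.append(" ".join(gram))
    return feats


def skip_features(tokens: List[str]) -> List[str]:
    """Return 'left ____ right' features: each trigram with its middle
    token replaced by a gap."""
    return [
        f"{tokens[i]} {GAP} {tokens[i + 2]}"
        for i in range(len(tokens) - 2)
    ]


tokens = tokenize("Google acquires a startup for search")
features = ngram_features(tokens) + skip_features(tokens)
```

In this sketch, a skip feature such as `<s> ____ acquires` plays the role of the “| ____ acquires” example: it fires for any title whose second word is “acquires”, regardless of which company is named first.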