Words appearing most frequently in the corpus are not, as might be expected, the most important words; they are common words like stop-words. Rare words are actually the best indicators of importance, especially if they appear multiple times in a given document.
tf-idf can be represented by the following equation (thanks Online LaTeX Equation Editor):

$$\text{tf-idf}_{i,j} = \frac{f_{i,j}}{\max_k f_{k,j}} \times \log\frac{N}{n_i}$$
Term frequency is the frequency of word i in document j, divided (normalized) by the maximum frequency of any word k in document j. Inverse document frequency, which discounts words that are simply common across the corpus, is the number of documents N divided by the number of documents n in which word i appears, scaled by taking the logarithm (the base of the log function does not matter, since changing it only rescales every score by a constant).
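Here is a minimal sketch of that formula in plain Python (the function name and the toy documents are my own, just for illustration). Notice that a word appearing in every document gets an idf of log(1) = 0, which is exactly the stop-word behavior described above.

```python
from collections import Counter
from math import log

def tf_idf(docs):
    """Compute tf-idf scores per document, following the formula above:
    tf is normalized by the most frequent word in the document,
    idf is log(N / n_i) where n_i is the document frequency of word i."""
    N = len(docs)
    # n_i: number of documents each word appears in
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        max_freq = max(counts.values())
        scores.append({
            word: (freq / max_freq) * log(N / doc_freq[word])
            for word, freq in counts.items()
        })
    return scores

docs = [
    "the quick brown fox".split(),
    "the lazy dog".split(),
    "the quick dog jumps".split(),
]
for doc_scores in tf_idf(docs):
    print(doc_scores)  # "the" scores 0 because it appears in every document
```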
This topic has come up in a couple of Coursera classes I have looked at--Web Intelligence and Big Data and Mining Massive Datasets--in the context of a search engine. Basically, you treat each document and each query (a short document) as a vector of tf-idf scores; you can then rank search results by finding the documents most similar to the query using cosine similarity. Inverted indexes allow much of the tf-idf score to be pre-computed.
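To make the ranking idea concrete, here is a small sketch using scikit-learn's TfidfVectorizer and cosine_similarity (the documents and query are made up, and note that scikit-learn's default tf-idf weighting differs slightly from the formula above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "the cat sat on the mat",
    "dogs and cats living together",
    "the dog chased the cat",
]
query = "cat and dog"

# Fit tf-idf on the documents, then project the query into the same space.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

# Cosine similarity between the query and every document gives the ranking.
similarities = cosine_similarity(query_vector, doc_vectors).ravel()
for idx in similarities.argsort()[::-1]:
    print(f"{similarities[idx]:.3f}  {documents[idx]}")
```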
UPDATE:
scikit-learn has a tf-idf usage example at Clustering text documents using k-means.