Publications
General audience
- "I'm sorry Dave, I'm afraid I can't do that": Linguistics, statistics, and natural language processing circa 2001.
Lillian Lee.
Computer Science: Reflections on the Field, Reflections from the Field, National Academies Press, pp. 111–118, 2004.
- A matter of opinion: Sentiment analysis and business intelligence (position paper).
Lillian Lee.
Written for the IBM Faculty Summit on the Architecture of On-Demand Business, 2004.
Sentiment analysis
- Opinion mining and sentiment analysis.
Bo Pang and Lillian Lee.
Foundations and Trends in Information Retrieval 2(1-2), pp. 1–135, July 2008.
A monograph surveying the field (note the singular).
- The power of negative thinking: Exploiting label disagreement in the min-cut classification framework.
Mohit Bansal, Claire Cardie, and Lillian Lee.
Proceedings of COLING, 2008. Poster paper.
- Using very simple statistics for review search: An exploration.
Bo Pang and Lillian Lee.
Proceedings of COLING, 2008. Poster paper.
- Get out the vote: Determining support or opposition from Congressional floor-debate transcripts.
Matt Thomas, Bo Pang, and Lillian Lee.
Proceedings of EMNLP, pp. 327–335, 2006.
- Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.
Bo Pang and Lillian Lee.
Proceedings of the ACL, pp. 115–124, 2005.
- A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts.
Bo Pang and Lillian Lee.
Proceedings of the 42nd ACL, pp. 271–278, 2004.
- Thumbs up? Sentiment classification using machine learning techniques.
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan.
Proceedings of EMNLP 2002, pp. 79–86.
Information retrieval
- IDF revisited: A simple new derivation within the Robertson-Spärck Jones probabilistic model.
Lillian Lee.
Proceedings of SIGIR, pp. 751–752, 2007. Poster paper.
- Respect my authority! HITS without hyperlinks, utilizing cluster-based language models.
Oren Kurland and Lillian Lee.
Proceedings of SIGIR, pp. 83–90, 2006.
- PageRank without hyperlinks: Structural re-ranking using links induced by language models.
Oren Kurland and Lillian Lee.
Proceedings of SIGIR, pp. 306–313, 2005.
- Better than the real thing? Iterative pseudo-query processing using cluster-based language models.
Oren Kurland, Lillian Lee, and Carmel Domshlak.
Proceedings of SIGIR, pp. 19–26, 2005.
- Corpus structure, language models, and ad hoc information retrieval.
Oren Kurland and Lillian Lee.
Proceedings of SIGIR, pp. 194–201, 2004.
- Iterative residual rescaling: An analysis and generalization of LSI.
Rie Kubota Ando and Lillian Lee.
Proceedings of the 24th SIGIR, pp. 154–162, 2001.
Generation
- Catching the drift: Probabilistic content models, with applications to generation and summarization.
Regina Barzilay and Lillian Lee.
Proceedings of HLT-NAACL, pp. 113–120, 2004.
- Learning to paraphrase: An unsupervised approach using multiple-sequence alignment.
Regina Barzilay and Lillian Lee.
Proceedings of HLT/NAACL 2003, pp. 16–23.
- Bootstrapping lexical choice via multiple-sequence alignment.
Regina Barzilay and Lillian Lee.
Proceedings of EMNLP, pp. 164–171, 2002.
Distributional similarity
- On the effectiveness of the skew divergence for statistical language analysis.
Lillian Lee.
Artificial Intelligence and Statistics 2001, pp. 65–72, 2001.
- Measures of Distributional Similarity.
Lillian Lee.
Proceedings of the 37th ACL, pp. 25–32, 1999.
- Distributional similarity models: Clustering vs. nearest neighbors.
Lillian Lee and Fernando Pereira.
Proceedings of the 37th ACL, pp. 33–40, 1999.
- Similarity-based models of word cooccurrence probabilities (pre-publication version).
Ido Dagan, Lillian Lee, and Fernando Pereira.
Machine Learning 34(1-3), special issue on natural language learning, pp. 43–69, 1999.
- Similarity-Based Methods for Word Sense Disambiguation.
Ido Dagan, Lillian Lee, and Fernando Pereira.
Proceedings of the 35th ACL/8th EACL, pp. 56–63, 1997.
- Similarity-Based Approaches to Natural Language Processing.
Lillian Lee.
Ph.D. thesis.
Harvard University Technical Report TR-11-97.
- Similarity-Based Estimation of Word Cooccurrence Probabilities.
Ido Dagan, Fernando Pereira, and Lillian Lee.
Proceedings of the 32nd ACL, pp. 272–278, 1994.
- Distributional clustering of English words.
Fernando Pereira, Naftali Tishby, and Lillian Lee.
Proceedings of the 31st ACL, pp. 183–190, 1993.
Of related interest: Baker and McCallum's SIGIR '98 paper, Distributional Clustering of Words for Text Classification, favorably compares [PTL 93] to LSI and other algorithms.
Segmentation
- Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences (pre-publication version).
Rie Kubota Ando and Lillian Lee.
Natural Language Engineering 9(2), pp. 127–149, 2003.
- Mostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji.
Rie Kubota Ando and Lillian Lee.
First Conference of the NAACL, pp. 241–248, 2000.
- Unsupervised Statistical Segmentation of Japanese Kanji Strings.
Rie Ando and Lillian Lee.
Cornell University CS Technical Report TR99-1756, 1999.
Context-free languages
- Fast Context-Free Grammar Parsing Requires Fast Boolean Matrix Multiplication.
Lillian Lee.
Journal of the ACM 49(1), pp. 1–15, 2002.
Conference version appeared in the Proceedings of the 35th ACL/8th EACL, pp. 9–15, 1997.
- Learning of Context-Free Languages: A Survey of the Literature.
Lillian Lee.
Harvard University Technical Report TR-12-96 (written in 1994).
Reviews and pedagogy
- A new start: Innovative introductory AI-centered courses at Cornell.
Eric Breck, David Easley, K-Y Daisy Fan, Jon Kleinberg, Lillian Lee, Jennifer Wofford, and Ramin Zabih.
AAAI Spring Symposium on Using AI to Motivate Greater Participation in Computer Science, AAAI Technical Report SS-08-08, pp. 8–13, 2008.
- A non-programming introduction to computer science via NLP, IR, and AI.
Lillian Lee.
ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, pp. 32–37, 2002.
- Foundations of Statistical Natural Language Processing by Christopher D. Manning and Hinrich Schütze [review] (prepublication version).
Lillian Lee.
Computational Linguistics 26(2), pp. 277–279, 2000.