T.J. Trimble

Natural Language Processing
& Machine Learning Engineer

trimblet at me dot com


FirstRain, Inc.

Natural Language Processing Engineer

Information Extraction

Owned and improved an Information Extraction engine built on Stanford's TokensRegex system, processing thousands of documents each week.

  • Wrote new rule-based logic for named entity linking, sentence selection, and duplicate detection.
  • Cut processing time by 75% through algorithmic improvements and optimized REST requests.
  • Improved type safety and organization of the code.
  • Maintained production code, fixing precision and recall bugs as they were reported, and worked directly with dev-ops, the front-end team, and customers to debug, fix, and deploy enhancements to production.
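
To give a flavor of the rule-based logic involved, the sketch below shows a toy TokensRegex pattern run through Stanford CoreNLP's Python client (stanza); the pattern, entity types, and example sentence are illustrative assumptions, not the production rules.

    # Illustrative only: a toy TokensRegex pattern run via stanza's CoreNLP client.
    # The pattern and example sentence are assumptions, not production rules.
    from stanza.server import CoreNLPClient

    # Match an ORGANIZATION mention, the literal token "acquired", then another ORGANIZATION.
    PATTERN = "([{ner:ORGANIZATION}]+) /acquired/ ([{ner:ORGANIZATION}]+)"

    with CoreNLPClient(annotators=["tokenize", "ssplit", "pos", "lemma", "ner"],
                       timeout=30000, memory="4G") as client:
        matches = client.tokensregex("Acme Corp acquired Globex Inc last week.", PATTERN)
        print(matches)  # matched spans and capture groups, grouped by sentence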

Topic-Entity Linking

Designed, developed, and deployed a system matching named entities with topics.

  • Utilized the Stanford Dependency Parser to efficiently extract robust semantic relationships between topic substrings and company substrings.
  • Developed content analytics, such as textual proximity and rule-based filtering, to compute relationship scores between entities over millions of sentences.
  • Extracted textual references to companies and organizations in documents using linguistic and world knowledge.
  • Optimized code within an Apache Storm-like real-time document processing engine analyzing ~100,000 documents per day.
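
As a rough illustration of the dependency-based relation extraction described above, the sketch below uses the stanza parser as a stand-in for the Stanford Dependency Parser; the function name and example sentence are hypothetical.

    # Illustrative sketch: inspect how a company mention relates to the rest of a
    # sentence via its dependency arcs. stanza stands in for the Stanford parser.
    import stanza

    nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

    def company_relations(sentence, company_term):
        """Return (head lemma, dependency relation) pairs for tokens matching company_term."""
        sent = nlp(sentence).sentences[0]
        words = {w.id: w for w in sent.words}
        return [(words[w.head].lemma if w.head in words else "ROOT", w.deprel)
                for w in sent.words if w.text.lower() == company_term.lower()]

    print(company_relations("Acme expanded its cloud computing business.", "Acme"))
    # e.g. [('expand', 'nsubj')] -- exact output depends on the parser model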

Other Projects

Adjectives in the LinGO Grammar Matrix

Master's Thesis & Project

Supervisor: Emily Bender

I extended the Grammar Matrix, an open-source grammar engineering project, to enable the morphological, syntactic, and semantic analysis of adjectives cross-linguistically.

  • I developed a core HPSG linguistic analysis of adjectives accounting for data from dozens of languages.
  • I extended and added new features to an online grammar customization system, written in Python and JavaScript, that produces starter HPSG-style grammars for natural languages.
  • I extended a Python server-side grammar customization library to produce machine-readable grammatical descriptions of adjectival lexemes, inflection, and agreement.
  • I developed a test suite of both constructed and natural languages to support system development and provide regression tests.

Question Answering with Off-the-shelf Deep Processing Systems

with Woodley Packard & Melanie Bolla

We designed a TREC-style Question Answering system utilizing open-source deep processing tools such as the Stanford CoreNLP dcoref coreference resolver, WordNet, DELPH-IN syntax/semantics processing, and NLTK.

  • I designed and implemented a distributed coreference resolution system for TREC-style questions using Stanford CoreNLP dcoref and HTCondor.
  • Our system was best in our class after 9 weeks of development, achieving a TREC strict score of 21.76 and a lenient score of 33.89 on unseen data.

Towards Augmenting Coreference Resolution with a Broad Coverage Precision Grammar

with Ryan Aldrich

We used the English Resource Grammar, a broad-coverage open-source HPSG grammar, to extend the Stanford CoreNLP dcoref coreference resolution system, augmenting its existing functionality with semantic representations.

  • I designed several new rules for coreference resolution utilizing semantic representations in Minimal Recursion Semantics.
  • Our system increased the best-in-class dcoref CoNLL recall score by 1.63 on unseen data after 8 weeks of development.

Training Joint Models to Discover Topic Sentiment in Review Mining

with Yi-Shu Wei

We designed and implemented an end-to-end Sentiment Analyzer using Machine Learning Classifiers in MALLET. We implemented a feature selection algorithm using Latent Dirichlet Allocation to divide the data by topic in an attempt to improve training. We showed that LDA topic modeling did not improve classifier performance.

  • I implemented an end-to-end sentiment analyzer using shallow features and MALLET Machine Learning classifiers.
  • My system achieved 69.7% accuracy on unseen data on a three-way classification task.
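
A condensed sketch of the topic-splitting idea described above, using scikit-learn in place of MALLET; the texts, labels, and hyperparameters are placeholders, not the original implementation.

    # Illustrative sketch: assign each review to its most probable LDA topic, then
    # train one classifier per topic. scikit-learn stands in for MALLET here.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression

    texts = ["great battery life", "terrible screen", "average camera, decent price"]
    labels = [1, 0, 1]  # placeholder sentiment labels

    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topics = lda.fit_transform(counts).argmax(axis=1)  # hard topic assignment per document

    # Train a separate classifier on the subset of documents assigned to each topic.
    per_topic = {}
    for t in set(topics):
        idx = [i for i, topic in enumerate(topics) if topic == t]
        if len({labels[i] for i in idx}) > 1:  # need at least two classes to fit
            per_topic[t] = LogisticRegression().fit(counts[idx], [labels[i] for i in idx])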


Professional Master of Science, Computational Linguistics


Coursework & Projects in Natural Language Processing, Machine Learning, Statistics, Systems Engineering, and Linguistics.

Bachelor of Arts, Linguistics


Coursework in Syntax, Semantics, Morphology, Phonology, Phonetics, Psycholinguistics, and Neurolinguistics.

Key Coursework

Advanced Statistical Methods in Natural Language Processing

I worked in a team to implement and test several Machine Learning algorithms, including decision trees, k-nearest neighbors (KNN), Naive Bayes, and Support Vector Machines. I also developed techniques for improving classifier performance, such as chi-squared feature selection and boosting-style methods (e.g., Transformation-Based Learning).
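
For example, chi-squared feature selection scores each feature by its dependence on the class label before training; a minimal illustration using scikit-learn with placeholder data:

    # Minimal illustration: chi-squared feature selection feeding a Naive Bayes
    # classifier. Data and parameters are placeholders.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["good movie", "bad plot", "good acting, bad script"]
    labels = [1, 0, 0]

    model = make_pipeline(CountVectorizer(),
                          SelectKBest(chi2, k=3),  # keep the 3 most class-dependent features
                          MultinomialNB())
    model.fit(texts, labels)
    print(model.predict(["good script"]))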

Deep Processing Techniques for NLP

I worked in a team to implement several deep processing methods, including parsing, word sense disambiguation, and coreference resolution, using techniques such as CKY, (P)CFGs, and Hobbs' algorithm.
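
As a small illustration of the CKY technique, the recognizer below runs over a toy grammar in Chomsky normal form; the grammar and sentence are my own examples, not the course implementation.

    # Toy CKY recognizer over a CNF grammar; grammar and sentence are illustrative.
    from itertools import product

    lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}   # A -> word
    binary = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}        # A -> B C

    def cky_recognize(words):
        n = len(words)
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            table[i][i + 1] = set(lexical.get(w, ()))
        for span in range(2, n + 1):            # span length
            for i in range(n - span + 1):       # span start
                j = i + span
                for k in range(i + 1, j):       # split point
                    for b, c in product(table[i][k], table[k][j]):
                        table[i][j] |= binary.get((b, c), set())
        return "S" in table[0][n]

    print(cky_recognize("she eats fish".split()))  # True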

Linguistic Expressions of Sentiment, Subjectivity, and Stance

This course included presentations and discussion of cutting-edge sentiment analysis research, including review mining, aspect extraction, spam detection, and summarization. I developed an end-to-end sentiment analysis system using MALLET (see above).

MRS in Applications

This course consisted of presentations and discussion of several NLP applications, including sentiment analysis, summarization, and coreference resolution, with a focus on how deep processing techniques, especially graph-based sentential semantic models (Minimal Recursion Semantics), can improve existing cutting-edge systems.
