Transforming NLP for Defense Personnel Analytics: ADVANA Cloud-Based Platform
The Defense Personnel Analytics Center (DPAC) is enhancing its NLP capabilities by implementing a transformer-based platform on ADVANA, the Department of Defense's cloud system. The platform focuses on topic modeling and sentiment analysis of open-ended survey responses from various DoD populations. By leveraging state-of-the-art models such as BERT, DPAC aims to improve text embeddings and thematic clustering, moving beyond traditional methods like LDA. ADVANA's advanced analytical tools enable DPAC to conduct sophisticated data analysis and machine learning operations, yielding insights that support decision-making within the DoD.
Presentation Transcript
More than Meets the Eye: A Transformer-Based NLP Platform for Analyzing Open-Ended Survey Responses
WHO ARE WE We are the Defense Personnel Analytics Center (DPAC), formerly known as the Office of People Analytics (OPA), a personnel research organization under the Defense Human Resources Activity. DPAC employs a wide variety of personnel researchers from fields including psychology, sociology, survey and mathematical statistics, and data science. We develop and administer a diverse portfolio of surveys covering many DoD populations, including active duty and reserve service members, service member spouses, recruiters, and civilians, in order to provide rich personnel data and analytics to our DoD policy office stakeholders.
OVERVIEW Objective: Create a Natural Language Processing platform on ADVANA, the DoD's cloud-based system. Areas of NLP addressed: topic modeling (what are service members discussing?) and sentiment analysis (how are they discussing these topics?). Approach: Expand DPAC's topic modeling capabilities using state-of-the-art transformer sentence embedding models, and train and test sentiment models on DPAC's domain-specific data using fine-tuned BERT models.
ADVANA What is ADVANA: Derived from "Advancing Analytics," ADVANA is the DoD's cloud-based business analytics platform, hosting a wide variety of analytical tools ranging from SQL querying to advanced machine learning capabilities using open-source software. ADVANA is centered around both cloud-based data storage and computational resources. The tools used in this presentation include: Databricks Runtime 10.4 LTS ML, notebook-based coding (Databricks), MLflow for machine learning operations (MLOps), and Python 3.9 with key libraries including transformers, PyTorch, sentence-transformers, BERTopic, and scikit-learn.
TOPIC MODELING DPAC has previously leveraged traditional topic modeling using LDA to develop thematic clusters in our survey data. Limitations of LDA have included non-standardized preprocessing and lower coherence scores as measured by normalized pointwise mutual information (NPMI) (Grootendorst, 2022). With the advent of sentence transformers, an extension of the BERT model framework, vast improvements in text embeddings have been made over GloVe (Reimers and Gurevych, 2019). The BERTopic library (created by Maarten Grootendorst) leverages sentence transformers and can take a wide variety of pretrained embedding models. We used all-MiniLM-L6-v2 for testing on the results shown here and all-mpnet-base-v2 for production topic modeling, both from the Hugging Face repository. (Image source: BERTopic documentation)
TOPIC MODELING Two of the most useful variations of BERTopic are hierarchical topic modeling and dynamic topic modeling. Part of traditional topic modeling has been selecting the correct number of topics, usually done using one of many metrics from information retrieval theory (NPMI, holdout log-likelihood, etc.). With BERTopic we can set the number of topics quite high and fit them in a hierarchical fashion to see how these granular topics best relate to one another. Once a hierarchical model has been fit, we can go back and merge topics that are duplicative in nature, allowing us to settle on the number of topics heuristically. This is possible using the BERTopic API:
TOPIC MODELING Dynamic topic modeling is done by fitting a global model, then splitting the documents by timestamp and refitting the topic representations. The diagram to the right shows this process, which comes in two flavors, "global tuning" and "evolution tuning"; we used global tuning in the image below. (Image source: BERTopic documentation) Dynamic topic modeling allows us to see both how topics evolve over time and how they trend temporally.
SENTIMENT MODELING To develop a sentiment model on our domain-specific data, we hand labeled 30,000 comments from across 8 different DPAC surveys using a 5-level Likert scale. We found that the 5-level scale was not informative, as most of our comments were either neutral or negative. To address this, we collapsed our sentiment scale to a binary one: [1,2] "Negative", [3,4,5] "Non-Negative". Next, we ran progressively more complex base models: a linear classifier using a lemmatized TF-IDF model, then a naive Bayes classifier on the same lemmatized TF-IDF model. Given the results (discussed on the next slide), we decided to extend our modeling to include BERT-based models. Since our dataset does not include a relatively large number of training examples compared with what BERT is trained on, we went with a fine-tuned, lightweight version of BERT: distilbert-base-uncased-finetuned-sst-2-english. This is a BERT-based model trained on the Stanford Sentiment Treebank for binary sentiment (the 2 in SST-2).
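The two TF-IDF baselines can be sketched as scikit-learn pipelines. This is a toy illustration only: the four example comments and their labels are invented, and the real pipeline works on 30,000 lemmatized, hand-labeled comments.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled comments: 0 = Negative, 1 = Non-Negative.
X = [
    "pay is too low and morale is terrible",
    "housing is in awful condition",
    "leadership has been supportive and fair",
    "the new benefits program works well",
]
y = [0, 0, 1, 1]

# Baseline 1: linear classifier on TF-IDF features.
linear_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
# Baseline 2: naive Bayes on the same TF-IDF features.
nb_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())

for clf in (linear_clf, nb_clf):
    clf.fit(X, y)

print(linear_clf.predict(["morale is terrible lately"]))
```

Lemmatization would slot in as a custom tokenizer or preprocessing step on TfidfVectorizer before fitting.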
SENTIMENT MODELING Below we see the comparison of models using F1 score. F1 was chosen because we want to balance precision AND recall while also dealing with class imbalance. We have included accuracy as well for added comparative insight.

Model           | F1     | Accuracy
Linear          | 0.8183 | 0.7337
Naïve Bayes     | 0.8262 | 0.7067
DistilBERT-sst2 | 0.8706 | 0.8027

As we can see, DistilBERT-sst2 outperformed the simpler models on both evaluation metrics. It is worth noting that BERT-based models are developed using millions of data points, and so, as with any model, we would expect better performance given more training examples. NOTE: Naïve Bayes and various gradient boosted models were run previously on the same data set using NLP libraries from R (tm, LDA, STM, etc.), and those results were much poorer than what we were able to obtain using Python. See Appendix 1 for results from that study (Siebel et al., 2021).
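Why F1 rather than accuracy alone: under class imbalance, a model that ignores the minority class can still post a high accuracy. A small illustration with invented labels (not our survey data), treating the minority "Negative" class as the positive label:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 1 = Non-Negative (majority), 0 = Negative (minority).
y_true     = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_majority = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # always predicts majority class
y_model    = [1, 1, 1, 1, 1, 1, 1, 0, 0, 1]  # catches one Negative comment

for name, y_pred in [("majority", y_majority), ("model", y_model)]:
    acc = accuracy_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred, pos_label=0, zero_division=0)
    print(f"{name}: accuracy={acc:.2f}  f1(Negative)={f1:.2f}")
# majority: accuracy=0.80  f1(Negative)=0.00
# model:    accuracy=0.80  f1(Negative)=0.50
```

Both predictors score 0.80 accuracy, but only F1 on the minority class separates them, which is exactly the failure mode F1 guards against in the table above.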
FUTURE WORK Future work using transformer technologies includes the following: Domain-tuned GPT-NeoX for more cohesive topic representations. Historically, we have used human-in-the-loop coding of topics, hand reading clustered documents to develop topic representations (i.e., topic labels). Testing using OpenAI's GPT model Davinci has shown great promise for more cohesive topic representations. Our platform is not connected to the external internet, so we cannot yet leverage this API; a fine-tuned, smaller model such as GPT-NeoX could provide this functionality. Transformer-based NER methods for automatic PII redaction. Redacting PII requires copious labor hours, so to reduce these labor costs, an NER model for entity extraction using a transformer-based spaCy model could be trained to do this automatically.
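The redaction step itself is simple once entities are extracted. A minimal sketch of that step, with hardcoded (start, end, label) spans standing in for what a transformer-based spaCy NER pipeline would supply via doc.ents (the comment text, spans, and helper name are all hypothetical):

```python
# In production, spans would come from a trained spaCy NER model;
# here they are hardcoded so the redaction logic is self-contained.

def redact(text: str, spans: list) -> str:
    """Replace each (start, end, label) entity span with a [LABEL]
    placeholder, working right to left so earlier offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

comment = "SSgt Jane Doe at Fort Example said the gym closes too early."
spans = [(5, 13, "PERSON"), (17, 29, "LOCATION")]  # offsets into `comment`

print(redact(comment, spans))
# SSgt [PERSON] at [LOCATION] said the gym closes too early.
```

The hard (and labor-saving) part is training the NER model to emit those spans reliably on military-domain text; the replacement itself is mechanical.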
SUMMARY The Defense Personnel Analytics Center has been working to extend its NLP platform to incorporate cutting-edge transformer technologies and apply them to historical and future free-text fields captured from a variety of DoD surveys. We have found that these transformer-based methods provide the following benefits over previous methods: better sentiment predictions for future inference; standardized preprocessing using transformer tokenization (sentiment analysis) and sentence-transformer embeddings (topic modeling); richer sub-modeling techniques for topic modeling (hierarchical topic modeling, dynamic topic modeling); and the potential to further utilize LLMs for model development.
SOURCES Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982-3992, Hong Kong, China. Association for Computational Linguistics. Maarten R. Grootendorst. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794.
APPENDIX

Algorithm                         | Text Preprocessing       | Sampling Procedures              | Accuracy (%) | F1 (%)
Naïve Bayes                       | Bigram, Term Frequency   | Data Augmentation                | 62.6         | 63.1
Random Forest                     | Unigram, Term Frequency  | Downsampling                     | 71.2         | 63
Random Forest                     | Unigram, TF-IDF          | Data Augmentation, Downsampling  | 71.3         | 65
Random Forest                     | Word Embeddings          | Downsampling                     | 69           | 63.4
Generalized Boosted Decision Tree | Unigram, Term Frequency  | Data Augmentation                | 70.5         | 73.7
Generalized Boosted Decision Tree | Word Embeddings          | None                             | 70.6         | 77.1
Sentiment Dictionary              | VADER Attributes         | None                             | 53.2         | 48.4