Programming

Published Stata Commands

ldagibbs: A command for Topic Modeling in Stata using Latent Dirichlet Allocation
[Stata Journal 18(1), pp. 101–117, 2018.] [Code Files]

This paper introduces the ldagibbs command, which implements Latent Dirichlet Allocation in Stata. Latent Dirichlet Allocation is the most popular machine learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet Allocation represents each document as a probability distribution over topics, and each topic as a probability distribution over words. In this way, Latent Dirichlet Allocation makes it possible to analyze the content of large, unclassified text collections and offers an alternative to predefined document classifications.
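To make the two distributions concrete, here is a minimal Python/NumPy sketch of collapsed Gibbs sampling for Latent Dirichlet Allocation. It is an illustration only and not the ldagibbs implementation. It assumes the text has already been converted to integer word ids, and the function name lda_gibbs and its defaults (alpha, beta, n_iter, seed) are hypothetical choices. theta holds each document's distribution over topics and phi holds each topic's distribution over words.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical collapsed Gibbs sampler for LDA (illustration only).

    docs: list of lists of integer word ids in [0, vocab_size).
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # topic counts per document
    n_kw = np.zeros((n_topics, vocab_size))  # word counts per topic
    n_k = np.zeros(n_topics)                 # total tokens per topic

    # Assign every token a random initial topic and record the counts
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional of this token's topic given all others
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Posterior mean estimates; each row sums to 1
    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)
    return theta, phi


# Toy corpus: documents 0-1 use words 0-2, documents 2-3 use words 3-5
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=6)
print(theta.round(2))  # one row per document: its topic shares
```

The number of topics is passed in by the user, matching the "user-chosen number of topics" described above. Because the sampler is stochastic, the exact topic shares depend on the seed and the number of iterations.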


lsemantica: A Stata Command for Text Similarity Based on Latent Semantic Analysis
[Stata Journal 19(1), pp. 129–142, 2019.] [Code Files]

The lsemantica command, presented in this paper, implements Latent Semantic Analysis in Stata. Latent Semantic Analysis is a machine learning algorithm for word and text similarity comparison. Latent Semantic Analysis uses Truncated Singular Value Decomposition to derive the hidden semantic relationships between words and texts. lsemantica provides a simple command for Latent Semantic Analysis as well as complementary commands for text similarity comparison.
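As an illustration of the idea, and not of lsemantica's own code or options, the following Python/NumPy sketch applies truncated singular value decomposition to a small document-term matrix and compares documents with cosine similarity. The function names lsa and cosine_similarity, the toy matrix, and the choice of two components are hypothetical.

```python
import numpy as np


def lsa(doc_term, n_components):
    """Hypothetical LSA sketch: truncated SVD of a document-term matrix.

    doc_term: (n_docs, n_terms) array of word counts (or tf-idf weights).
    Returns an (n_docs, n_components) array of document vectors.
    """
    # numpy returns singular values in descending order
    u, s, vt = np.linalg.svd(doc_term, full_matrices=False)
    # Truncation: keep only the n_components largest singular values
    return u[:, :n_components] * s[:n_components]


def cosine_similarity(a, b):
    """Cosine of the angle between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Rows are documents; columns are the terms
# "cat", "dog", "pet", "stock", "market", "price"
X = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
], dtype=float)

docs_k = lsa(X, n_components=2)
sim_same = cosine_similarity(docs_k[0], docs_k[1])  # both about pets
sim_diff = cosine_similarity(docs_k[0], docs_k[2])  # pets vs. markets
print(sim_same, sim_diff)
```

In this toy example the two pet documents end up with a similarity close to 1 and the pet and market documents with a similarity close to 0, even though documents 0 and 1 use their shared words with different frequencies.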


Stata Utility Functions

The following Stata commands were written to avoid having to import Excel or CSV files and save them as Stata datasets just to merge or append them. The commands are provided without a help file, but their syntax combines that of import delimited or import excel with that of merge or append. The only major difference is that the merge type (e.g., 1:1 or m:1) has to be specified in the how() option. The examples below show the syntax of all four commands.

Code Examples:

* Merge a CSV file one-to-one on id
merge_csv id using "file_path", how("1:1") bindquote("strict") varnames(1) case("preserve") encoding("utf8") keepusing(varlist) nogenerate

* Merge an Excel sheet many-to-one on id
merge_excel id using "file_path", how("m:1") sheet("sheetname") firstrow

* Append a CSV file
append_csv using "file_path", bindquote("strict") varnames(1) case("preserve") encoding("utf8") force

* Append an Excel sheet
append_excel using "file_path", sheet("sheetname") firstrow