This paper was prepared after exploratory telephony research on open-sourced feedback and complaints from victims of robocalls and predatory phishing scams.
In this paper, we are the first to apply an unsupervised retrieval technique to crowd-sourced data (COC) in order to describe the authenticity of phone numbers. We considered over 100,000 complaints collected via crowd-sourced efforts (e.g., 800notes.com). Overall, our results show that the proposed generative probabilistic model produces topics that are quantitatively and qualitatively better than those produced by matrix decomposition techniques, such as Latent Semantic Analysis (LSA). We propose this model as a more efficacious mechanism for authenticating phone numbers. The proposed model identifies malicious phone numbers adaptively. Additionally, it labels and ranks numbers by their re-use across different scam campaigns. This last insight, in particular, has implications for research combating phishing attacks, account fraud, and identity theft.
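As a rough illustration of the ranking idea described above, the following minimal sketch orders phone numbers by how many distinct scam campaigns they appear in. It is not the paper's implementation. It assumes each complaint has already been assigned a campaign label (for example, by a topic model), and the phone numbers, the label names, and the function `rank_by_campaign_reuse` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: one (phone_number, campaign_label) pair per complaint.
# The labels stand in for topics assigned by some topic model; they are
# illustrative only and do not come from the paper's data.
complaints = [
    ("555-0100", "irs_scam"),
    ("555-0100", "tech_support"),
    ("555-0100", "irs_scam"),
    ("555-0101", "irs_scam"),
    ("555-0102", "tech_support"),
    ("555-0102", "debt_collection"),
    ("555-0102", "irs_scam"),
]


def rank_by_campaign_reuse(pairs):
    """Rank numbers by the count of distinct campaigns they appear in."""
    campaigns = defaultdict(set)
    for number, label in pairs:
        campaigns[number].add(label)
    # Sort by distinct-campaign count (descending); break ties by number.
    return sorted(
        ((number, len(labels)) for number, labels in campaigns.items()),
        key=lambda item: (-item[1], item[0]),
    )


ranking = rank_by_campaign_reuse(complaints)
for number, reuse in ranking:
    print(number, reuse)
```

Counting distinct labels rather than raw complaints keeps a number that is reported often for a single campaign from outranking one that is reused across several campaigns.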
Recommended citation: Ahluwalia, S. & Reaves, B. (2018, April). Repucall. Paper submission at the North Carolina State University OUR Symposium, Raleigh, NC.