All Posts

Data mining PII via optical character recognition on publicly hosted image sites pt. 4

Results: The hypothesis that personally identifiable text data can be extracted from publicly hosted images was confirmed. This research focused narrowly on usernames and passwords because that information can be exploited to gain more data from other accounts. From a data set of 1.18 million records, a focused subset of 6081 images was identified as having high potential for containing compromising information. From those 6081 images, 1044 usernames and passwords, as well as 18 social security numbers, were extracted.

Performance Monitoring with Lighthouse and Gitlab Part 1

This is part one of a series of articles on monitoring website performance with Lighthouse. The goal of this first post is just to get the bare minimum up and running; it’s aimed at those who don’t have much experience with Lighthouse, Gitlab, etc. The Lighthouse tool offers a lot of functionality; for more in-depth details, please check out their docs here and here. Also worth mentioning: Gitlab does provide a built-in solution for this, but you have to be on a paid tier of their service to use it.

Increasing News Literacy pt 1

This post is the first in a planned series on how to consume news stories for understanding and how to extract facts. Next will be part 2, Who owns the news outlets. Social media is not a news source: It is increasingly difficult to trust information that we read and hear from others. The internet has evolved in such a way that allows us to be flooded by articles, opinions, posts, and other forms of media.

Data mining PII via optical character recognition on publicly hosted image sites pt. 3

Data Collection & Methodologies: The data for this project was collected from https://prnt.sc using a Node.js script and the OCR package TextBoxes. Data Collection: All screenshots taken using the Lightshot application are hosted at a public URL of the form https://prnt.sc/****** where the trailing six characters are the unique identifier of the uploaded image. The string is alphanumeric, so there are 36^6 (about 2.18 billion) possible combinations. On the homepage, the site reports that there are just over 1.

Using Netlify Forms with Nuxt.js

On a recent side project, I decided to try out Netlify for hosting. It seemed pretty straightforward, plus they had some cool features like Forms that I wanted to take advantage of, so I figured why not. Well, it turns out that implementing said Forms feature was not as easy as I had hoped. After doing a lot of digging around on the internet, it seems others have struggled with this as well, so I decided to make a post showing what I did to get it all working.

Data mining PII via optical character recognition on publicly hosted image sites pt. 2

Introduction: On June 14, 2016, Ellen Nakashima of The Washington Post published a story reporting that the Democratic National Committee (DNC) had been infiltrated by two teams of state-sponsored Russian hackers (Greenberg, 2019). According to Nakashima, one of the groups, named Cozy Bear*, gained access to the DNC’s email and chat communications and had been monitoring those channels for over a year. The other group, Fancy Bear, gained access to DNC servers in April 2016 and exfiltrated opposition research documents.