The Map Is Not The Territory

A blog by Christian Willmes.

The daily kindergardening of OSGeo wiki spammers

Categories: osgeo, semantic mediawiki

I regret to bother you with this topic, but I need to write something about my frustration with the increasing spam activity in the OSGeo wiki. It is really unbelievable how much human time these spammers invest just to place a few links and upload a few documents to the wiki.

For some time now I have been volunteering to help maintain the OSGeo wiki. I do this because I have some MediaWiki and Semantic MediaWiki knowledge from my other research and work projects, which I am happy to share with the OSGeo community.

Originally the OSGeo wiki was linked to the central OSGeo LDAP directory for identity and account management, so at that time user accounts were managed not through the wiki but through the LDAP directory. For about two years now, the LDAP integration with the wiki has been broken, because the extension we used was not updated to work with newer versions of MediaWiki.

Meanwhile, because I did not feel knowledgeable enough about LDAP myself, and Martin Spott tried but did not manage to get another LDAP extension working, we had to manage user accounts from within the wiki. The standard MediaWiki account registration procedure is not well protected against abuse; it is actually really simple to let bots create huge numbers of spam accounts. So we first disabled account registration and had new users request accounts via email to the OSGeo SAC mailing list. When this proved impractical, we decided to install the ConfirmAccount extension to handle account requests.

This extension requires new users, in addition to a valid and confirmed email address, to provide a short biography about themselves. This biography is then reviewed by SAC volunteers, who check whether the requester is a spammer. The reviewing volunteer can Accept, Reject, Hold, or mark the request as Spam. On Reject, the requester is informed with a standard note that the request was denied. On Hold, the volunteer can ask the requester for additional information in order to decide whether the request is valid. On Spam, the request is denied without informing the requester, and the requester's email address is blocked from further requests. On Accept, the user account is created with a random password, and the user is notified about this by email.
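To make the four review outcomes concrete, here is a minimal Python sketch that models them as state changes on a review queue. It is purely illustrative and assumes a simplified world: the class, method, and message names are hypothetical and do not reflect how the ConfirmAccount extension (a PHP MediaWiki extension) is actually implemented; the sketch only mirrors the behaviour described above.

```python
import secrets
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    """The four review outcomes described above."""
    ACCEPT = "accept"
    REJECT = "reject"
    HOLD = "hold"
    SPAM = "spam"


@dataclass
class ReviewQueue:
    """Hypothetical model of the account-request review step (not the extension's code)."""
    blocked_emails: set[str] = field(default_factory=set)
    accounts: dict[str, str] = field(default_factory=dict)  # username -> email
    outbox: list[tuple[str, str]] = field(default_factory=list)  # (recipient, message)

    def can_request(self, email: str) -> bool:
        # Addresses flagged as spam may not submit further requests.
        return email not in self.blocked_emails

    def decide(self, username: str, email: str, decision: Decision) -> None:
        if decision is Decision.ACCEPT:
            # Create the account with a random password and notify the user by email.
            password = secrets.token_urlsafe(12)
            self.accounts[username] = email
            self.outbox.append((email, f"Account {username} created, password: {password}"))
        elif decision is Decision.REJECT:
            # Standard note telling the requester the request was denied.
            self.outbox.append((email, "Your account request was denied."))
        elif decision is Decision.HOLD:
            # Ask the requester for more information before deciding.
            self.outbox.append(
                (email, "Can you please elaborate about your relation/interest in OSGeo?")
            )
        elif decision is Decision.SPAM:
            # Deny silently: no message is sent, and the address is blocked.
            self.blocked_emails.add(email)
```

In this sketch, Spam differs from Reject only in that no message is sent and the email address is added to a block list, so `can_request` returns `False` for it afterwards.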

So far so good, but from here it gets messy, because we receive about 10 account requests a day, of which about 99% are fraudulent or spam requests. And the spammers are actual humans, from SEO companies, I guess. They make up all kinds of things, which makes me certain that human agents are pasting this into the requests. Here are some nice example biographies I got to read:

User:Maleshwar: Born as a princess into a royal family of Kingdom of Dagbon, in the Northern Region of Ghana, Gunu has been interested in dancing and music since she was young. She competed in regional and national dance competitions, winning the dance championship for the northern Region and second place in the 1998 National Dance Championship. She took second place in the Hiplife dance championship in 2003, where she met King Ayisoba and Terry Bonchaka, who subsequently become collaborators.

Or:

User:Marshrobin088: Hi my name is Robin Marsh and I've been in the digital design industry for 3 years. As a kid, art and technology always interested me. I could lose track of time doing art or messing around with computers.The way I approach web development is keeping in mind scalability, organisation, and clean syntax. As for the message or purpose is the nucleus,Self learner,highly interested in Geospatial development activities using open source tools. Having knowledge of GIS,vector graphics programming and data bases. Involved in teaching Geology, web and geospatial development. I am proficient in HTML/HTML5, CSS/CSS3, LESS, SASS, XML, JavaScript, jQuery, AJAX, and SQL/MySQL/PostgreSQL, to name a few. I am also proficient in many non-web-based languages, including but not limited to Java, Scheme/Racket, C, ACL2 (LISP), and MIPS Assembly. I have also worked on some smaller Python projects, and have used the language to create one-time use tools for data processing and similar purposes.

For the two requests above, for example, I replied with a standard phrase like “Can you please elaborate about your relation/interest in OSGeo?” and never heard back. Some requests are easy to identify as spam, like the following:

User:Baarishi: baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi is a good boy baarishi

Or:

User:Mekee4444: im a person that need this web id to produce my business in whole world

And here are two example bios from spammers who got through, because I thought these were valid requests:

User:Ehalu2016: Hi my name is Eahul and I've been in the digital design industry for 5 years. As a kid, art and technology always interested me. I could lose track of time doing art or messing around with computers.The way I approach web development is keeping in mind scalability, organisation, and clean syntax. As for the message or purpose is the nucleus,Self learner,highly interested in Geospatial development activities using open source tools. Having knowledge of GIS,vector graphics programming and data bases. Involved in teaching Geology, web and geospatial development

Or:

User:Mayerjohntec: A web developer and software engineer by profession, An open source enthusiast and a maker by heart. Honored to be sharing space among the Leaders we look up to and admire. I love contributing my best to take the Open Source Mission and OPEN WEB forward. I hold a Masters in Computers degree and have been working and contributing towards the open source community in all ways I can. Love Code, Privacy and Advocacy, learning, teaching and Community Building. I support Open data and Open Knowledge. As am a social person, and love interacting with new people,traveling, reading books, history, museums and listening to all kinds of music. Thanks John Mayer

As you can see from the examples above, I have to read a lot of BS on a daily basis, fighting spam requests and cleaning up after spammers who got through. In some cases it is really not easy to decide whether a request is spam or not. Right now I tend to accept requests where I am not sure, because it is really easy to block a user and delete or revert all of his or her edits to the wiki as soon as I see them spamming.

But in the end, it is already more than half an hour of work per day, and it seemingly will not get any less...
