Comments on natural language processing blog: De-Authorship attribution
Blog author: hal (http://www.blogger.com/profile/02162908373916390369)
Feed updated: 2024-03-18T01:45:45.724-06:00
7 comments

Anonymous, 2012-03-09T07:53:01.411-07:00:
Have you seen this one?

Michael Brennan and Rachel Greenstadt. Practical Attacks Against Authorship Recognition Techniques (pre-print). In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, California, July 2009.

Tim, 2012-02-14T17:26:44.100-07:00:
From the world of more "popular culture": Cory Doctorow recently had a post on Boing Boing related to this (http://boingboing.net/2011/12/29/state-of-adversarial-stylometr.html). Sadia Afroz and Michael Brennan, both CS PhD students of Rachel Greenstadt at Drexel, gave a talk about beating stylometry at the Chaos Communication Congress in Berlin, presenting an alpha-stage tool aimed at this end.

steve steinberg, 2012-02-13T13:23:02.217-07:00:
There has been some very relevant work done, although the results are still pretty unimpressive.

The two groups that come to mind first are:

Reiter, Ehud and Williams, Sandra (2008). Three Approaches to Generating Texts in Different Styles. In: Proceedings of the Symposium on Style in text: creative generation and identification of authorship, Volume 7, The Society for the Study of Artificial Intelligence and the Simulation of Behaviour (AISB 2008), 1-4 April 2008, University of Aberdeen, UK.

Mairesse, F., & Walker, M. A. (2010). Towards personality-based user adaptation: psychologically informed stylistic language generation. User Modeling and User-Adapted Interaction, 20(3), 227-278. doi:10.1007/s11257-010-9076-2

Those may not be their best papers, but they should get you started. I know that they have done work on first extracting a person's stylistic characteristics, and then remapping them onto another text stream...

s.

Kevin Duh (https://www.blogger.com/profile/07407894290644783502), 2012-02-13T12:27:33.342-07:00:
Maybe this is a topic worth investigating seriously. There may be scenarios where authorship attribution can harm free speech (e.g. oppressive governments linking anonymous blog posts to real identities).

Here's one related reference:
Kacmarcik and Gamon, "Obfuscating Document Stylometry to Preserve Author Anonymity" (ACL06): http://research.microsoft.com/pubs/69343/aclcoling06-kacmarcik-gamon.pdf

We may want to start with some formal definition, such as k-anonymity. I'm guessing it's relatively easy to come up with some paraphrase/NLG method that can fool current systems, but the challenging part is to mathematically prove that the method really satisfies your definition of anonymity.

hal (https://www.blogger.com/profile/02162908373916390369), 2012-02-12T07:47:44.717-07:00:
@Rachel: I agree -- in fact, I basically think of paraphrasing as a form of NLG, in the spirit of text-to-text generation.

Rachel Cotterill (https://www.blogger.com/profile/08969719330048416996), 2012-02-12T02:10:43.359-07:00:
What about looking at language generation for this? You could map what you want to say using a semantic model, and then apply basic NLG to generate sentences encapsulating that. It might be easily identifiable as NLG text, but it ought to be hard to figure out who sat behind the keyboard.

Ben (https://www.blogger.com/profile/14899730898792268821), 2012-02-12T01:27:16.239-07:00:
You could hire turkers/students/colleagues to rewrite your posts ;)