19 October 2007

Gender and text, gender and speech

For some crazy reason I decided a while ago that I wanted to learn Japanese. Essentially, I wanted to learn a language as unlike English as I could find. So I did some summer intensive thing before college (that amounted to a year of class) and then continued taking class for all three years of undergrad. At the end, I could get by passably for most conversation topics (business, politics, current events, etc.) other than research stuff (at some point I learned how to say NLP, but I don't remember anymore...I wonder if en-eru-pi would be understood...). During the whole time we were required to meet weekly with conversation partners so as to practice our speaking skills.

For the first "semester" during the summer, I had a male professor. For all remaining seven semesters, my profs were female. With the exception of one conversation partner (who was from Hokkaido, spoke quickly with a strong accent, and was soon replaced by someone I could understand a bit better), all of my conversation partners were female.

At the end of my four years, I was speaking to a friend (who was neither a conversation partner nor a prof) in Japanese and after about three turns of conversation, he said to me (roughly): "you talk like a girl."

Based on the setup of this post, you may have seen that coming. But the thing that is most interesting is that Japanese is not one of those languages where the speaker's gender is encoded in (e.g.) verb morphology. In fact, as best I could tell at that point, the only effeminate thing I did was to use too many sentence-ending particles that are more commonly used by women (-ka-na, I think, was one, but it's been too long now to really remember). The guy who said this to me was a close enough friend that I tried to figure out what it was about my speech that made him assess that I talk like a girl. The sentence particle thing was part of it, but he said that there was also something else that he couldn't really figure out; he hypothesized that it had something to do with emphasis patterns.

Given that the majority of native speakers I talked to were female, it's not at all surprising that I would have picked up any underlying bias that was subtle enough that the profs weren't able to intentionally avoid it.

Now, getting back to en-eru-pi. There's been a reasonable amount of work in the past few years on identifying the gender of the authors of texts. I know both Moshe Koppel and Shlomo Argamon, to name two, have worked on this problem. I also remember seeing a web site a year or so ago where you could enter a few sentences that you wrote and it would guess your gender. I don't remember what it cued off of---I think the distribution of types of verbs and adjectives, mostly, but I do remember that given a short paragraph, it was shockingly accurate.

What I don't know is (a) whether anyone has done this for something other than English and (b) whether anyone has done it for speech. Of course, if you have speech, you have extra information (e.g., pitch) which might be useful. But given my Japanese friend's reaction to my speech pattern (my voice is rather low), there has to be more going on. And I'm not convinced that what is going on will be the same between written text and (say) transcribed speech. If someone wanted to try such an experiment for non-English text, they could probably just mine non-English text from some social networking site (like myspace or facebook), where people tend to list their genders. I'm not sure how to do it for speech. Maybe there's some speech transcription corpus out there that's annotated with gender, but I don't know what it is. Although I don't see a huge financial market out there for an answer, I'm personally curious what it is about my English writing patterns that made the web site I referred to earlier strongly convinced that I'm male, and what it is about my Japanese speech patterns that makes it clear that I'm not.
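
For what it's worth, here is a minimal sketch of how the text side of such an experiment might start, assuming you already have posts split into two groups by self-reported gender. It ranks words by a smoothed log-ratio of their relative frequencies in the two groups. The function name, the whitespace tokenization, the add-one smoothing, and the toy sentences are all my own illustrative assumptions; this is not how the web site or any of the work mentioned above actually operates.

```python
import math
from collections import Counter


def word_log_ratios(docs_a, docs_b, smoothing=1.0):
    """Map each word to log(P(word | group A) / P(word | group B)).

    Probabilities are relative word frequencies with additive smoothing,
    so words seen in only one group still get a finite score.
    """
    counts_a = Counter(w for doc in docs_a for w in doc.lower().split())
    counts_b = Counter(w for doc in docs_b for w in doc.lower().split())
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_a = (counts_a[word] + smoothing) / total_a
        p_b = (counts_b[word] + smoothing) / total_b
        scores[word] = math.log(p_a / p_b)
    return scores


# Made-up placeholder text, only to show the mechanics (not real data).
group_a = ["the cat sat", "the cat ran"]
group_b = ["a dog sat", "a dog ran"]
scores = word_log_ratios(group_a, group_b)

# Words with the largest positive score lean toward group A,
# the most negative toward group B, and scores near zero are shared.
ranked = sorted(scores, key=scores.get, reverse=True)
```

On real data you would want proper tokenization (especially for Japanese, where splitting on whitespace does nothing useful) and a held-out set to check whether the top-ranked words actually predict anything.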


  1. It sounds like the website you're referring to is The Gender Genie, at http://bookblog.net/gender/genie.php .

  2. As a linguist, I've heard this story a dozen times: a male learns Japanese from women, and then is told he speaks like a female.

    It's remarkably common. Common enough, in fact, that good old Wikipedia has a page called "Gender differences in spoken Japanese" http://en.wikipedia.org/wiki/Gender_differences_in_spoken_Japanese

    The most relevant quote is this: “Feminine speech includes the use of specific personal pronouns... omission of the copula da, use of feminine sentence finals such as wa, and the more frequent use of the honorific prefixes o and go.”

    You might also want to look at this page: Manifestations of Gender Distinction in the Japanese Language http://www.coolest.com/jpfm.htm

  3. Having had to stare at some speech transcripts recently, I think you could try identifying speaker gender by looking for cue phrases like "my wife" or "my husband" that could be strongly indicative of gender. Might not get much, but a Google search for "gender labeled speech transcripts" doesn't look promising.

    I heard Dan Jurafsky recently collected a set of speed dating conversations: I don't know if they're transcribed, or how easy it'd be to gain access to it (privacy issues and all), but it might be worth asking if it's something you plan on checking out further.

  4. I've experienced this first hand as a high school exchange student. I think that it wasn't that women taught me, it was that women talked a lot more and I wasn't used to consciously listening for speech differences between genders. There definitely are differences, especially in the choice of personal pronoun -- it's not a hard and fast rule, men just have a larger set of personal pronouns to choose from, but if you always use the safe and gender neutral "watashi" that they teach in textbooks you will sound female.

    I had another problem where I found it really hard to learn formal Japanese. As a high school student I was at the bottom of the hierarchy and so it was rare that anyone would speak to me using formal language. At the same time, I was expected to speak formally to teachers and other adults.


  5. There's lots of stuff out there on acoustic classification into all kinds of categories, including gender. Most use some mix of pitch, rate, segmental and lexical information. Gender's a pretty easy task with speech given baseline pitch differences.

    Almost all of the speech corpora come with not only gender, but lots more demographic information. Accent's pretty important in speech.

    Hint: the speech folks tend to use "identification" rather than classification in titles if you do a search.

  6. At ACL 2005 I saw this talk:

    Constantinos Boulis; Mari Ostendorf
    A Quantitative Analysis of Lexical Differences Between Genders in Telephone Conversations. ACL 2005

    They were working with transcribed spoken English. What's interesting about it is that they claim to be able to determine not only the gender of the speaker but also, based on changes in his/her lexical patterns, the gender of the conversation partner.

  7. "You talk like a girl" is a polite way of saying "you sound gay". That's how gay males talk in Japan. There's an additional level of indirection to this - folks who know about this peculiarity of the Japanese language and who picked up their vocabulary and intonation by talking with women try to "overcorrect" and as a result sound "uncouth".

  8. Gender Genie gets my gender right about 2/3 of the time. I'd like to see a proper evaluation.

    I'd be surprised if there were any system right now that got it right >90% of the time for, say, a college-educated population.

  9. Interesting topic.
    "en-eru-pi" is perfectly legitimate Japanese among researchers, but the traditional term for NLP is "shizen-gengo-shori" (natural-language-processing).
    BTW, I recently found that in conversations at the famous social networking site (Japanese equivalent of myspace.com?), women frequently use some postpositions (like "yo") that are not usually regarded as feminine, but that men actually won't use in written text. In other words, there are definitely some kinds of female Japanese that aren't widely recognized as marking gender.

  10. One of China's biggest web companies, Netease (Nasdaq: NTES), has released a blog gender detector based on your blog posts. (It's for Chinese only.)

    On a far-from-comprehensive sample of my friends, who are mostly PhD students in Computer Science or software engineers, its precision is pretty high. (Almost 100%)

  11. The address of the detector service is here:

    Just input the address of your blog and you will get the result. You can ask your Chinese students/colleagues/friends to try.

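
The cue-phrase idea from comment 3 could be sketched roughly as below. This is only a minimal illustration, assuming transcripts are available as plain strings: the phrase list contains just the two examples given in that comment, the function name is hypothetical, and how a phrase count maps onto a gender guess is deliberately left open since that mapping is itself an assumption.

```python
import re

# Only the two cue phrases named in the comment; a real list would be longer.
CUE_PATTERNS = {
    "my wife": re.compile(r"\bmy wife\b", re.IGNORECASE),
    "my husband": re.compile(r"\bmy husband\b", re.IGNORECASE),
}


def count_cue_phrases(transcript):
    """Return how many times each cue phrase occurs in one transcript string."""
    return {
        phrase: len(pattern.findall(transcript))
        for phrase, pattern in CUE_PATTERNS.items()
    }


# Invented example utterance, just to exercise the function.
example = "My wife and I went out, and my wife said hello."
counts = count_cue_phrases(example)
```

As the comment says, this might not get much: most speakers never utter either phrase, so it would label only a small slice of a corpus, and those labels would be noisy.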