O Corpus do Português

We hope you'll want to help improve the new billion-word corpus simply because it provides a great resource for others. But we're probably all at least a little bit selfish, and it's nice to have some "personal" rewards as well. :-)

As you make corrections to the lemmatization and part-of-speech tagging, you can also "earn credit" toward resources related to the corpora:

  • A contribution to the corpora. This gives you at least 200 queries per day (students) or 400 per day (professors), and you won't see the donation requests that appear every 10-15 searches.

  • An academic license. A professor might have 5-10 students help with the corrections; within just a couple of weeks, the group might earn enough credit for an academic license for the entire university.

  • Corpus data. We will also be creating word frequency and full-text data from this billion-word corpus, similar to the data already available for English (word frequency, collocates, n-grams, and full-text data).

Here's how it works:

  • Reviewing each word takes about 20-30 seconds on average, so if you worked without stopping for an hour, you would review about 150 words.

  • As you make corrections, they are automatically logged under your name in the "corrections database".

  • We'll "pay" (in licenses and data) US$15 per hour, which at ~150 entries per hour works out to about 10 cents per entry.

  • Let us know when and how you'd like to "redeem" your credits (as the equivalent of a contribution to the corpora, an academic license, corpus data, etc.).
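The figures above can be checked with a quick back-of-the-envelope calculation. This is purely an illustrative sketch: the function names are hypothetical, and it only reproduces the arithmetic stated on this page (20-30 seconds per word, about 150 entries per hour, US$15 per hour). It is not how the project actually records or tallies credit.

```python
# Hypothetical helpers that only reproduce the arithmetic on this page.
# They are NOT the project's actual credit-tracking system.

RATE_PER_HOUR_USD = 15.0   # "pay" per hour, in licenses and data
ENTRIES_PER_HOUR = 150     # approximate entries reviewed per hour


def entries_per_hour(seconds_per_entry: float) -> float:
    """Words reviewed in one hour of uninterrupted work."""
    return 3600 / seconds_per_entry


def credit_per_entry() -> float:
    """Credit (in US$ equivalent) earned per reviewed entry."""
    return RATE_PER_HOUR_USD / ENTRIES_PER_HOUR


def estimated_credit(entries: int) -> float:
    """Approximate credit (in US$ equivalent) for a number of entries."""
    # Multiply before dividing to keep the result exact for round numbers.
    return entries * RATE_PER_HOUR_USD / ENTRIES_PER_HOUR


if __name__ == "__main__":
    # 20-30 seconds per word gives 120-180 words per hour (about 150).
    for secs in (20, 25, 30):
        print(f"{secs} s per word -> {entries_per_hour(secs):.0f} words/hour")
    print(f"Credit per entry: ${credit_per_entry():.2f}")
    print(f"Credit for 1000 entries: ${estimated_credit(1000):.2f}")
```

The 20-30 second range corresponds to 120-180 words per hour, so "about 150" is the midpoint estimate, and US$15 divided by 150 entries gives the 10 cents per entry quoted above.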

If you have any questions, please let us know.

Thanks again for your help!