o corpus do português






There are several resources that are based on the older version of the Corpus do Português (which was released in 2006), such as:

The older Corpus do Português was quite small, however (only 20 million words for the 1900s). As a result, many types of resources that we've created for English couldn't be created for Portuguese until a much larger corpus was available. With the new one-billion-word corpus, we can create many of these resources. They will include:

  • Full-text data, which means that you'd have nearly the entire one billion words of data on your machine

  • Updated data similar to the word frequency, collocates, and n-grams data (including the top 40,000 lemmas of Portuguese)

  • WordAndPhrase for Portuguese, which will allow you to browse through the top 40,000 lemmas to see frequency information, definitions, collocates, concordances, and synonyms -- all on one page. In addition, you'll be able to input your own texts and analyze them with the corpus data (available Summer 2017).