O Corpus do Português



Created by Mark Davies, BYU. Funded by the US National Endowment for the Humanities (2004, 2015). Part of the BYU collection of corpora.

    Corpus               Size               Created
1   Genre / Historical   45 million words   2004-06
2   Web / Dialects       1 billion words    2015-16

The new addition to the Corpus do Português (2016) contains about one billion words of data from web pages in four Portuguese-speaking countries (Brazil, Portugal, Angola, and Mozambique). This corpus allows you to look at very recent Portuguese (the texts were collected in 2013-14) and to compare the different dialects.

The new corpus is also much larger than the previous one: about 50 times as large for Modern Portuguese (one billion words, compared to just 20 million words from the 1900s in the original corpus). So where you might find 20-25 tokens in the original corpus, you might find 1,000 or more in the new corpus.
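
The size comparison above is simple arithmetic, and it can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the sizes are the rounded figures quoted in the text, and the projection assumes that a word or phrase occurs at the same rate per million words in both corpora, which will not hold exactly for web text.

```python
# Approximate sizes quoted in the text above (word counts).
ORIGINAL_MODERN_WORDS = 20_000_000   # 1900s portion of the original corpus
WEB_DIALECTS_WORDS = 1_000_000_000   # new Web / Dialects corpus


def scale_factor(old_size: int = ORIGINAL_MODERN_WORDS,
                 new_size: int = WEB_DIALECTS_WORDS) -> float:
    """Return how many times larger the new corpus is than the old one."""
    return new_size / old_size


def expected_tokens(old_count: int) -> float:
    """Project a token count from the original corpus onto the new one.

    Assumption (not from the source): the item occurs at the same rate
    per million words in both corpora.
    """
    return old_count * scale_factor()


if __name__ == "__main__":
    # The 20-25 token range mentioned in the text.
    for count in (20, 25):
        print(count, "->", round(expected_tokens(count)))
```

Under these assumptions, 20-25 tokens in the original corpus correspond to roughly 1,000-1,250 tokens in the new one, which matches the "1,000 or more" estimate.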