o corpus do portuguÍs


Corpora
New interface
Corpus size
Compare to other corpora
Related resources
Researchers

Volunteer!

Problems
Contact us




Note: the following page makes reference to the English corpora from corpus.byu.edu, but the Corpus do PortruguÍs uses the same new interface, which was released in 2016.


In May 2016 we also released a new version of the BYU corpora. The following are the major changes to the corpus architecture and interface. (Problems?...)

1. More mobile-friendly

The previous interface had lots of frames. These worked well on laptop and and desktop computers, but not very well with mobile phones or tablets. The new interface is designed from the ground up to work on screens of any size. The following are some screenshots from a mobile phone for (left to right) the search interface, results display, Keyword in Context display (KWIC), and expanded KWIC. The interface looks even better on a device with a larger screen, but the bottom line is that the corpora now look and work fine, no matter what device you're using. (Note: the older interface will still be online as well, at least for the foreseeable future.)


2. Cleaner, more simple interface

The search form in the previous interface was a bit overwhelming (below, left). The newer interface is much cleaner and simple to use (below, right). All of the previous functionality is still there -- the ability to limit by and compare sections of the corpus, deciding how to sort the data, etc -- but now those form elements only appear when you need them.

Previous interface

 

New: simple list view

New: collocates view


3. More helpful help files

Context-sensitive help files now appear whenever you click on a form element -- list, collocates, compare words, sections, virtual corpora, etc. And there are sample searches in each of these files, which you can modify to make your own searches.


4. Simpler, more intuitive search syntax

Some search syntaxes are (in our view) unnecessarily complex, like the CQP syntax on the left. The previous BYU search syntax had a much simpler syntax, but there were still too many square brackets, full stops, asterisks, etc (no fun to type these on a mobile phone keyboard). We have now simplified the search syntax even more, as is shown on the right. But while they're learning the newer, simpler syntax, people can still use any combination of the older and newer syntax.  (For more information, including the new part of speech codes, click on LIST in the search form of a corpus, and then Part of Speech)

Type of search CQP syntax Previous BYU syntax New syntax Example
Word [word = "nooks"] nooks nooks nooks and crannies
Lemma (forms of word) [lemma = "decide"] [decide] DECIDE DECIDE that it
Part of speech [tag = "NN."] [nn*] NOUN fast NOUN
Synonyms Not possible [=soft] =soft soft, smooth, quiet
Customized word lists Not possible [emailAddress@clothes] @clothes dress, shoe, sock
Combinations of preceding [lemma = "end" & pos = "VV."] [end].[v*] END_v end, ends, ended, ending
Combinations of preceding [lemma = "eat"] [tag = "NN."] [eat] * [nn*] EAT * NOUN ate the bananas, eat some cake
Combinations of preceding Not possible [[emailAddress@clothes]] @CLOTHES dress, dresses, shoe, shoes
Combinations of preceding Not possible [[=clean]].[v*] =CLEAN_v cleans, scoured, washing
Combinations of preceding Not possible [wear] * [=nice] [email@clothes] WEAR * =nice @CLOTHES wore some good-looking pants


5. Virtual corpora

In early 2015 we added the ability to create "virtual corpora" for the Wikipedia corpus. In just a few seconds, users could create a virtual corpus of texts related to biology, Buddhism, investments, basketball -- or thousands of other topics. They could then modify these corpora -- adding, deleting, or moving texts. They could limit their searches to a particular virtual corpus (e.g. collocates of stress in psychology or engineering), and compare the frequency of a word or phrase in their different virtual corpora. And best of all, they could create keyword lists for any of the virtual corpora -- in just a few seconds.

We have now added the "virtual corpus" functionality to all of the BYU corpora, which allows you to quickly and easily create and use virtual corpora from any of the texts in these corpora. For example, you could create a virtual corpus of texts from Cosmopolitan or Astronomy magazines (COCA), newspaper articles dealing with the New Deal from 1932-1938 (COHA), web pages from a particular website dealing with cricket in the UK (GloWbE), speeches by Winston Churchill from 1939 to 1945 that mention Germany (Hansard), or newspaper articles from September 2015 dealing with the European refugee crisis (NOW). Click on VIRTUAL/TEXTS in any of the corpora for much more detail and some great examples of these virtual corpora.


We hope that these new features and corpora will be of benefit to you in your teaching, learning, and research.