Archive for the ‘Resources’ Category

 

Joshua on Jun 3rd, 2008: Enron Dataset

Ben Fry has a cool graphic illustrating some word frequency data from the Enron Email Dataset. The graphic itself isn’t very useful (Fry admits this in the caption), but it’s certainly interesting as an attempt to make you feel “inside the information,” Matrix-like.

Joshua on Jun 3rd, 2008: Forvo

Via Omniglot I’ve learned of a new internets resource called Forvo. It’s a pronunciation repository.
Forvo is the place where you’ll find words pronounced in their original languages. Ever wondered how a word is pronounced? Ask for that word or name, and another user will pronounce it for you. You can also help others recording [...]

Joshua on Jun 3rd, 2008: Welsh Corpus

Welsh and Scottish Gaelic corpora are now available for download from the Language Engineering Resources for the Indigenous Minority Languages of the British Isles and Ireland project at Lancaster University. Kewl.

Joshua on May 26th, 2008: WALS Online

This is one of the more exciting new sites I’ve seen on the web in quite some time. It’s called WALS Online, short for “World Atlas of Language Structures,” and it’s a database of maps showing how structural features are distributed across the world’s languages. From the homepage:

WALS consists of 141 maps with accompanying texts on diverse features (such as vowel inventory [...]