The Molyé corpus is a diachronic collection of stereotypical representations of French language variation during the early modern period, together with early attestations of French Creoles. The goal of the project is to demonstrate that several Creole features posited to be the result of pidginization in French colonies can in fact be traced to Europe, both individually and in combination with each other. Below is a spreadsheet detailing both the contents of the Molyé corpus proper and the full list of 250+ documents found during the corpus creation process. The full list is still being reorganized and expanded.
In this repository, the Molyé corpus itself can be found as an XML file of the same name in the "main_corpus" folder. Of additional interest is the "dataset_colaf" folder, which stores the full documents used to create the Molyé timeline. It is in turn divided into three folders, "theatre", "poetry" and "misc_works", a split that is mainly a by-product of the compilation procedure varying with the type of work.
To find examples of a specific language, try searching the XML for xml:lang="[lang-code]". For convenience, we have also included some pre-compiled subcorpora in the "subcorpora" folder. To access them, first go to the subfolder for the relevant time period and then find the file with the matching lang-code.
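The xml:lang search described above can also be done programmatically. The following is a minimal sketch using Python's standard library, not a tool shipped with this repository: the helper name find_by_lang, the sample element names and the language codes "aaa" and "bbb" are all made up for illustration, and the real corpus markup may differ.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def find_by_lang(root, lang_code):
    """Return every element whose own xml:lang attribute equals lang_code.

    Hypothetical helper: only elements that carry the attribute directly
    are matched; inherited language values are not resolved.
    """
    return [el for el in root.iter() if el.get(XML_LANG) == lang_code]


# Tiny made-up sample; element names and language codes are illustrative only.
SAMPLE = """<text>
  <l xml:lang="aaa">first line</l>
  <l xml:lang="bbb">second line</l>
  <l xml:lang="aaa">third line</l>
</text>"""

root = ET.fromstring(SAMPLE)
matches = find_by_lang(root, "aaa")
print([el.text for el in matches])
```

To run this against the corpus itself, replace ET.fromstring(SAMPLE) with ET.parse(path).getroot(), where path points to the corpus XML file in the "main_corpus" folder, and pass the lang-code you are interested in.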
The HTML version of the dataset is available here.