The making of the CorDis Corpus: compilation and mark-up