Tagging Spanish text

Can we build an archive of Spanish literary texts and tag it with Stanford CoreNLP, so that the tagged output can then be processed by Red Hen utilities?

Would you like to accomplish all or part of this task?
If so, write to 
and we will try to connect you with a mentor.

Some additional information:

Download CoreNLP and the Spanish models for it from here:

Then run it with the following command line (shown for Windows; on Linux or macOS, separate the two classpath entries with : instead of ;):

java -cp stanford-corenlp-3.5.2.jar;stanford-spanish-corenlp-2015-01-08-models.jar -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file hola.txt
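
For an archive of many texts, the same command has to be issued once per file. The sketch below is one way to build it from Python; it reuses only the jar names and flags shown in the command above. The helper names build_corenlp_command and tag_file are hypothetical, and the sketch assumes Java is on the PATH and both jars sit in the working directory.

```python
import os
import subprocess

# Jar names copied from the command line above; adjust them to your download.
CORENLP_JAR = "stanford-corenlp-3.5.2.jar"
SPANISH_MODELS_JAR = "stanford-spanish-corenlp-2015-01-08-models.jar"


def build_corenlp_command(text_file, annotators=("tokenize", "ssplit"),
                          pathsep=os.pathsep):
    """Return the argument list for one CoreNLP run on text_file.

    Hypothetical helper, not part of CoreNLP. os.pathsep is ';' on
    Windows and ':' on Linux and macOS, which is the classpath separator
    Java expects on each platform.
    """
    classpath = pathsep.join([CORENLP_JAR, SPANISH_MODELS_JAR])
    return [
        "java", "-cp", classpath, "-Xmx2g",
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-annotators", ",".join(annotators),
        "-file", text_file,
    ]


def tag_file(text_file):
    """Run CoreNLP on one file and raise if the java process fails."""
    subprocess.run(build_corenlp_command(text_file), check=True)
```

Calling tag_file("hola.txt") should be equivalent to typing the command by hand, with the classpath separator chosen for the current platform.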

This is the output:

Adding annotator tokenize

TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.

Adding annotator ssplit

Ready to process: 1 files, skipped 0, total 1

Processing file D:\stanford-corenlp-full-2015-04-20\hola.txt ... writing to D:\stanford-corenlp-full-2015-04-20\hola.txt.out {

  Annotating file D:\stanford-corenlp-full-2015-04-20\hola.txt

} [0.170 seconds]

Processed 1 documents

Skipped 0 documents, error annotating 0 documents

Annotation pipeline timing information:

TokenizerAnnotator: 0,0 sec.

WordsToSentencesAnnotator: 0,0 sec.

TOTAL: 0,0 sec. for 5 tokens at 147,1 tokens/sec.

Pipeline setup: 0,0 sec.

Total time for StanfordCoreNLP pipeline: 0,2 sec.

D:\stanford-corenlp-full-2015-04-20>cat hola.txt.out

Sentence #1 (2 tokens):

Hola.

[Text=Hola CharacterOffsetBegin=0 CharacterOffsetEnd=4]

[Text=. CharacterOffsetBegin=4 CharacterOffsetEnd=5]

Sentence #2 (3 tokens):

Que tal.

[Text=Que CharacterOffsetBegin=6 CharacterOffsetEnd=9]

[Text=tal CharacterOffsetBegin=10 CharacterOffsetEnd=13]

[Text=. CharacterOffsetBegin=13 CharacterOffsetEnd=14]
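
To hand this output to other tools, the .out file first has to be read back into a structured form. The following is a minimal sketch, assuming the text layout shown above (a "Sentence #N" header followed by one bracketed line per token). The function name parse_corenlp_text_output is hypothetical and not part of CoreNLP or of any Red Hen utility.

```python
import re

# Matches headers such as "Sentence #1 (2 tokens):".
SENTENCE_HEADER = re.compile(r"^Sentence #(\d+) \((\d+) tokens\):$")

# Matches the start of a token line such as
# "[Text=Hola CharacterOffsetBegin=0 CharacterOffsetEnd=4]".
# Anything after the end offset is ignored, so extra fields are tolerated.
TOKEN_LINE = re.compile(
    r"^\[Text=(.*?) CharacterOffsetBegin=(\d+) CharacterOffsetEnd=(\d+)\b"
)


def parse_corenlp_text_output(text):
    """Return a list of sentences, each a list of (token, begin, end) tuples."""
    sentences = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if SENTENCE_HEADER.match(line):
            # A header opens a new, initially empty sentence.
            sentences.append([])
            continue
        token = TOKEN_LINE.match(line)
        if token and sentences:
            sentences[-1].append(
                (token.group(1), int(token.group(2)), int(token.group(3)))
            )
    return sentences
```

The character offsets are kept as integers so that each token can be mapped back to its position in the original text file.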