If you really want the tokenizer to skip all characters that don't belong, just use the following rule as the very last rule: SKIP : { < ~[] > }

Saving as utf-8 did the trick!
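As a rough illustration of why re-saving a file in a different encoding can make a lexical error go away, the sketch below encodes a newline in one charset and decodes it in another. The class and method names are hypothetical, and the UTF-16 mismatch is only one plausible cause, not something confirmed in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JavaCC or CoreNLP.
class EncodingMismatch {
    // Encodes the text with one charset, decodes the bytes with another,
    // and returns the code point of the first decoded character.
    static int firstCodePoint(String text, Charset written, Charset read) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, read).codePointAt(0);
    }
}
```

Under these assumptions, `EncodingMismatch.firstCodePoint("\n", StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE)` yields 2560 (U+0A00), a character no grammar rule is likely to accept, whereas writing and reading the same newline as UTF-8 yields 10.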

Encountered: "E" (69), after : ""
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParserTokenManager.getNextToken(TokenSequenceParserTokenManager.java:1029)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.jj_ntk(TokenSequenceParser.java:3353)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.CoreMapNode(TokenSequenceParser.java:1386)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.NodeBasic(TokenSequenceParser.java:1360)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.NodeGroup(TokenSequenceParser.java:1327)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.NodeDisjConj(TokenSequenceParser.java:1266)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.BracketedNode(TokenSequenceParser.java:1127)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.SeqRegexBasic(TokenSequenceParser.java:833)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.SeqRegexDisjConj(TokenSequenceParser.java:1020)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.SeqRegex(TokenSequenceParser.java:790)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.SeqRegexWithAction(TokenSequenceParser.java:1643)
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.parseSequenceWithAction(TokenSequenceParser.java:37)

No token can start with ;.
<: There is only an error if the next characters are not - or <-.
>: There is always an error.


If you want to eliminate all these TokenMgrErrors, make a catch-all rule (as explained in the FAQ): TOKEN : { < UNEXPECTED_CHARACTER : ~[] > } Make sure this is the very last rule.

The single line comment works ok.

I am trying to extract patterns using GetPatternsFromDataMultiClass.

We are very soon releasing the new version of the code.

If I validate just the style sheet, it is OK.

No token can start with >. I'm not exactly sure why you would expect these not to be errors, since your lexer has no rules to cover these cases.

This rule says that, when no other rule applies, the next character is treated as an UNEXPECTED_CHARACTER token.
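To make the fallback behaviour concrete, here is a minimal hand-written sketch of the same idea in plain Java. It is not JavaCC-generated code: the class name and the WORD and ARROW token kinds are invented for illustration, and only the UNEXPECTED_CHARACTER name comes from the rule described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer; a JavaCC token manager is structured differently.
class CatchAllLexer {
    // Tries the ordinary rules first; any character that no rule
    // matches becomes an UNEXPECTED_CHARACTER token instead of an error.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // plays the role of a SKIP rule
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add("WORD(" + input.substring(start, i) + ")");
            } else if (input.startsWith("<-", i)) {
                tokens.add("ARROW");
                i += 2;
            } else {
                // Catch-all branch: it must come last, like the last rule in the grammar.
                tokens.add("UNEXPECTED_CHARACTER(" + c + ")");
                i++;
            }
        }
        return tokens;
    }
}
```

With this sketch, `CatchAllLexer.tokenize("a <- b;")` returns `[WORD(a), ARROW, WORD(b), UNEXPECTED_CHARACTER(;)]`. The stray `;` reaches the parser as an ordinary token, so the problem can be reported as a parse error rather than thrown from the lexer.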

Browser-specific properties with vendor-prefix syntax such as -moz are not worth worrying over unduly; I have needed to use them and am not too concerned that the line won't validate.

Encountered: "\u0a00" (2560), after : ""
