The file you are using is invalid, and your command is incomplete. Use this command instead:
java -cp "*" edu.stanford.nlp.sentiment.Evaluate -model edu/stanford/nlp/models/sentiment/sentiment.ser.gz -treebank test.txt
The test.txt file should contain sentiment-labeled trees like these:
(2 (3 (3 Effective) (2 but)) (1 (1 too-tepid) (2 biopic)))
(3 (3 (2 If) (3 (2 you) (3 (2 sometimes) (2 (2 like) (3 (2 to) (3 (3 (2 go) (2 (2 to) (2 (2 the) (2 movies)))) (3 (2 to) (3 (2 have) (4 fun))))))))) (2 (2 ,) (2 (2 Wasabi) (3 (3 (2 is) (2 (2 a) (2 (3 good) (2 (2 place) (2 (2 to) (2 start)))))) (2 .)))))
(4 (4 (4 (3 (2 Emerges) (3 (2 as) (3 (2 something) (3 rare)))) (2 ,)) (4 (2 (2 an) (2 (2 issue) (2 movie))) (3 (2 that) (3 (3 (2 's) (4 (3 (3 (2 so) (4 honest)) (2 and)) (3 (2 keenly) (2 observed)))) (2 (2 that) (2 (2 it) (2 (1 (2 does) (2 n't)) (2 (2 feel) (2 (2 like) (2 one)))))))))) (2 .))
(2 (2 (2 The) (2 film)) (3 (3 (3 (3 provides) (2 (2 some) (3 (4 great) (2 insight)))) (3 (2 into) (3 (2 (2 the) (2 (2 neurotic) (2 mindset))) (3 (2 of) (2 (2 (2 (2 (2 all) (2 comics)) (2
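To check whether a treebank file is well formed before passing it to `-treebank`, a small standalone parser can help. The following is only a sketch in Python, not part of CoreNLP: the helper names `parse_tree`, `labels` and `words` are made up for illustration, and it assumes every node has the form `(label child ...)` with an integer label from 0 to 4, as in the sample lines above.

```python
import re

def parse_tree(text):
    """Parse one '(label child ...)' tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise ValueError("expected an integer sentiment label")
        label = int(tokens[pos])
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested subtree
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced tree: missing ')'")
        pos += 1  # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree

def labels(tree):
    """Yield every node label in pre-order (root first)."""
    label, children = tree
    yield label
    for child in children:
        if isinstance(child, tuple):
            yield from labels(child)

def words(tree):
    """Yield the leaf words from left to right."""
    _, children = tree
    for child in children:
        if isinstance(child, tuple):
            yield from words(child)
        else:
            yield child

line = "(2 (3 (3 Effective) (2 but)) (1 (1 too-tepid) (2 biopic)))"
tree = parse_tree(line)
print(tree[0])                 # root label: 2
print(list(labels(tree)))      # [2, 3, 3, 2, 1, 1, 2]
print(" ".join(words(tree)))   # Effective but too-tepid biopic
```

A line that is cut off in the middle of a tree makes `parse_tree` raise `ValueError`, which is a quick way to spot an invalid file. With a valid file, the Evaluate command prints a summary such as the following.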
EVALUATION SUMMARY
Tested 82600 labels
66258 correct
16342 incorrect
0.802155 accuracy
Tested 2210 roots
976 correct
1234 incorrect
0.441629 accuracy
Label confusion matrix: rows are gold label, columns predicted label
323 1294 292 99 0
161 5498 2993 602 1
27 2245 51972 2283 21
3 652 2868 7247 228
3 148 282 2140 1218
Root label confusion matrix: rows are gold label, columns predicted label
44 193 23 19 0
39 451 62 81 0
9 190 82 101 7
0 131 30 299 50
0 36 8 255 100
Approximate Negative label accuracy: 0.912008
Approximate Positive label accuracy: 0.930750
Combined approximate label accuracy: 0.923128
Approximate Negative root label accuracy: 0.879081
Approximate Positive root label accuracy: 0.808266
Combined approximate root label accuracy: 0.842756
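The root figures in this summary can be re-derived from the root confusion matrix. The sketch below is plain Python, not CoreNLP code. The reading of "approximate" accuracy is an assumption: labels 0 and 1 count as negative, 3 and 4 as positive, and any cell with a neutral (2) gold or predicted label is ignored. That reading reproduces the printed numbers.

```python
# Root label confusion matrix copied from the output above
# (rows are gold labels 0..4, columns are predicted labels 0..4).
root = [
    [44, 193, 23, 19, 0],
    [39, 451, 62, 81, 0],
    [9, 190, 82, 101, 7],
    [0, 131, 30, 299, 50],
    [0, 36, 8, 255, 100],
]

NEGATIVE = (0, 1)  # assumed grouping, see the note above
POSITIVE = (3, 4)

def exact_counts(matrix):
    """Return (correct, total) using the diagonal as the correct cells."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total

def approx_counts(matrix, same, opposite):
    """Return (correct, total) for one polarity, ignoring neutral predictions."""
    correct = sum(matrix[g][p] for g in same for p in same)
    wrong = sum(matrix[g][p] for g in same for p in opposite)
    return correct, correct + wrong

correct, total = exact_counts(root)
neg_ok, neg_all = approx_counts(root, NEGATIVE, POSITIVE)
pos_ok, pos_all = approx_counts(root, POSITIVE, NEGATIVE)

print(f"{correct} of {total} roots: {correct / total:.6f}")        # 976 of 2210 roots: 0.441629
print(f"approx negative: {neg_ok / neg_all:.6f}")                  # 0.879081
print(f"approx positive: {pos_ok / pos_all:.6f}")                  # 0.808266
print(f"combined: {(neg_ok + pos_ok) / (neg_all + pos_all):.6f}")  # 0.842756
```

The same functions applied to the full label confusion matrix give 66258 correct out of 82600 labels, matching the first block of the summary.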