Using a Python regular expression to parse the string output of Java's Arrays.deepToString for a float array

I am working with someone whose Java code uses an m x n x p array, float[][][], as its key data structure. I need to get this array into Python; my current approach is to save it to a text file using Arrays.deepToString and then parse that text file with Python.

I got stuck on how to write a regular expression that parses the text. I can find all the floats, including those with exponents in scientific notation. To do this, I use the following pattern:

 float_pat = r'\d\.\d*(?:E-\d+)?' 

This works well for capturing floats, including those in scientific notation, as they are printed by deepToString. Note that all values are positive because they are probabilities. That is, I have no problem matching the individual numbers.

What I cannot do, but would like to, is write a regular expression that matches any number of floats enclosed in left and right square brackets. I tried this:

 list_of_floats_pat = r'\[(?:\d\.\d*(?:E-\d+)?), )+\]' 

where I am trying to match one or more occurrences of a float followed by a comma and a space, all enclosed in square brackets. But this returns []. I am not sure what I am misunderstanding.

Here is an example of a 1x2x9 array:

 [[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5]]] 

I want the regex to return two matches:

 0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5 

and

 0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5 

which I can then parse as strings with a strip and a split.

I figured out a workaround where I just find all the brackets. But I would like to know what I am misunderstanding about regular expressions.

java python string arrays regex
2 answers

The data you have is both a valid Python literal and valid JSON:

 >>> import ast, json
 >>> s = '[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5]]]'
 >>> ast.literal_eval(s)
 [[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 0.01050721017750691, 9.991008092716556e-05], [0.5904776610141782, 0.18175460267577365, 9.991008092716556e-05, 0.22716827582448523, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05]]]
 >>> json.loads(s)
 [[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 0.01050721017750691, 9.991008092716556e-05], [0.5904776610141782, 0.18175460267577365, 9.991008092716556e-05, 0.22716827582448523, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05]]]

You are better off using one of these libraries than trying to do this with a regex.
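As a minimal sketch of that approach, assuming the deepToString output has already been read into a string: json.loads returns nested lists directly, and the innermost rows can be collected with a comprehension. The variable names and the shortened sample data are illustrative, not taken from the question's file.

```python
import json

# Shortened sample in the format printed by Arrays.deepToString (illustrative data).
text = "[[[0.6453525160688715, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365]]]"

# JSON numbers allow an uppercase E exponent, so the text parses as is.
array = json.loads(text)

# Collect the innermost lists, i.e. the rows the question wants to extract.
rows = [row for plane in array for row in plane]

# Dimensions m, n, p of the nested list.
dims = (len(array), len(array[0]), len(array[0][0]))

print(dims)
print(rows)
```

The values arrive as Python floats, so no strip or split step is needed afterwards.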

 \[(?:\d\.\d*(?:E-\d+)?)(?:, (?:\d\.\d*(?:E-\d+)?))*\] 

Try this. See the demo:

https://regex101.com/r/9GergE/1
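A rough sketch of how this pattern might be used from Python with re.findall. The capturing group around the bracket contents is an addition made here so that findall returns the text without the brackets; the sample string is shortened, illustrative data.

```python
import re

# Float pattern from the question: one digit, a dot, more digits, optional negative exponent.
float_pat = r'\d\.\d*(?:E-\d+)?'

# One float, then zero or more ", float" repetitions, inside one pair of brackets.
# The outer capturing group (added for this sketch) holds the bracket contents.
list_of_floats_pat = r'\[(' + float_pat + r'(?:, ' + float_pat + r')*)\]'

# Shortened sample in the deepToString format (illustrative data).
text = "[[[0.6453525160688715, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365]]]"

# Each match is the comma-separated contents of one innermost list.
matches = re.findall(list_of_floats_pat, text)

# Split each match on ", " and convert the pieces to floats.
rows = [[float(tok) for tok in m.split(', ')] for m in matches]

print(matches)
print(rows)
```

Only the innermost lists match, because the outer brackets are followed by another bracket rather than a digit.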

The problem with your regex

 \[(?:\d\.\d*(?:E-\d+)?), )+\] 

is that it requires every float, including the last one, to be followed by a comma and a space, but the last float before \] has no trailing ", ".
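The difference can be checked on a small input. This is only a sketch: the `attempt` pattern below is an assumed, parenthesis-balanced rendering of the pattern from the question, and the test string is made up for illustration.

```python
import re

float_pat = r'\d\.\d*(?:E-\d+)?'

# Made-up sample: one inner list of three floats.
text = "[[0.5, 9.9E-5, 0.25]]"

# Assumed balanced form of the attempted pattern: every float must be followed by ", ".
attempt = r'\[(?:' + float_pat + r', )+\]'

# Corrected form: a first float, then zero or more ", float" groups.
fixed = r'\[' + float_pat + r'(?:, ' + float_pat + r')*\]'

attempt_matches = re.findall(attempt, text)
fixed_matches = re.findall(fixed, text)

print(attempt_matches)  # no match: the last float is followed by "]", not ", "
print(fixed_matches)
```

Making the separator part of the repeated group only after the first float is what lets the final element sit directly before the closing bracket.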
