Pattern matching in abstract forms

Disclaimer: I kept this post because some of it may be useful to others, but it does not solve what I originally set out to do.

Now I am trying to solve the following:

Given something like {a, B, {c, D}}, I want to scan through the Erlang forms passed to parse_transform/2 and find every use of the send (!) operator. Then I want to check the message being sent and determine whether it could match the pattern {a, B, {c, D}}.

For example, consider the following form:

 {op,17,'!', {var,17,'Pid'}, {tuple,17,[{atom,17,a},{integer,17,5},{var,17,'SomeVar'}]}}

Since the message sent is:

 {tuple,17,[{atom,17,a},{integer,17,5},{var,17,'SomeVar'}]} 

which is the encoding of {a, 5, SomeVar}, this would be a candidate match for the original pattern {a, B, {c, D}}.

I'm not quite sure how to go about this. Do you know of any API functions that could help?
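The first half of the task, finding every send in the forms, can be sketched with a plain recursive walk over the abstract format. This is only a rough illustration: the module name find_sends and the function sent_messages/1 are made up for this example, and the walk deliberately ignores what kind of form it is looking at.

```erlang
-module(find_sends).
-export([sent_messages/1]).

%% Hypothetical helper: returns the message expression of every
%% `Dest ! Msg` found in a list of abstract forms, i.e. the same kind
%% of list that parse_transform/2 receives.
sent_messages(Forms) ->
    lists:flatmap(fun sends_in/1, Forms).

%% A send is {op, Location, '!', Dest, Msg}. Keep Msg and keep looking
%% inside both operands, since sends can be nested.
sends_in({op, _Loc, '!', Dest, Msg}) ->
    [Msg | sends_in(Dest) ++ sends_in(Msg)];
%% Everything else is searched generically: abstract forms are just
%% nested tuples and lists.
sends_in(T) when is_tuple(T) ->
    lists:flatmap(fun sends_in/1, tuple_to_list(T));
sends_in(L) when is_list(L) ->
    lists:flatmap(fun sends_in/1, L);
sends_in(_) ->
    [].
```

Each returned element is the abstract form of one sent message, ready to be compared against the user-supplied pattern.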

Turning a given {a, B, {c, D}} into a form is possible by first substituting the variables with something, e.g. strings (and keeping note of this), since otherwise they would be unbound, and then using:

 > erl_syntax:revert(erl_syntax:abstract({a, "B", {c, "D"}})).
 {tuple,0,
        [{atom,0,a},
         {string,0,"B"},
         {tuple,0,[{atom,0,c},{string,0,"D"}]}]}

I was thinking that, with both in the same format like this, I could analyze them together:

 > erl_syntax:type({tuple,0,[{atom,0,a},{string,0,"B"},{tuple,0,[{atom,0,c},{string,0,"D"}]}]}).
 tuple
 %% Check whether the send argument is also a tuple.
 %% Then, since it is a tuple, use erl_syntax:tuple_elements/1 and keep
 %% comparing in this way, matching anything when you come across a
 %% string that was originally a variable...

I think I will end up missing something (for example, recognizing some things but not others, even though they should have matched). Are there any API functions I could use to make this task easier? And as for a pattern-match test operator or something along those lines, that does not exist, right? (i.e. it has only been suggested here: http://erlang.org/pipermail/erlang-questions/2007-December/031449.html ).
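The recursive comparison described above could be sketched directly on the abstract forms, without the detour through strings. This is a conservative approximation and not an existing API: the module may_match and its functions are hypothetical names, repeated pattern variables are not tracked, and anything the code cannot judge statically (calls, operators, strings, and so on) is treated as "may match".

```erlang
-module(may_match).
-export([may_match/2, parse_expr/1]).

%% may_match(PatternForm, MsgForm) -> boolean()
%% Returns false only when the message can certainly never match.
may_match({var, _, _}, _Msg) -> true;   % pattern variable matches anything
may_match(_Pat, {var, _, _}) -> true;   % message value unknown until run time
may_match({tuple, _, Ps}, {tuple, _, Ms}) ->
    length(Ps) =:= length(Ms) andalso
        lists:all(fun({P, M}) -> may_match(P, M) end, lists:zip(Ps, Ms));
may_match({cons, _, PH, PT}, {cons, _, MH, MT}) ->
    may_match(PH, MH) andalso may_match(PT, MT);
may_match({nil, _}, {nil, _}) -> true;
may_match(Pat, Msg) ->
    case {literal(Pat), literal(Msg)} of
        %% Two literals match only if their values are identical.
        {{ok, V}, {ok, W}} -> V =:= W;
        %% Two different known constructors cannot match; anything
        %% else is unknown, so stay on the safe side.
        _ -> not (is_structural(Pat) andalso is_structural(Msg))
    end.

literal({Tag, _, V}) when Tag =:= atom; Tag =:= integer;
                          Tag =:= float; Tag =:= char -> {ok, V};
literal(_) -> error.

is_structural({tuple, _, _}) -> true;
is_structural({cons, _, _, _}) -> true;
is_structural({nil, _}) -> true;
is_structural(Form) -> literal(Form) =/= error.

%% Convenience for experimenting: source text to one abstract expression.
parse_expr(Str) ->
    {ok, Tokens, _} = erl_scan:string(Str ++ "."),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.
```

With the pattern {a, B, {c, D}}, this accepts {a, 5, SomeVar} and {a, X, SomeVar} but rejects 50, {a, b} and [6, 4, 3], so only the plausible sends would get the injected call.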

Edit: (Explaining things from the beginning this time)

Using erl_types, as Daniel suggests below, is probably doable if you play with the erl_type() returned by t_from_term/1. That is, t_from_term/1 takes a term with no free variables, so you would need to change something like {a, B, {c, D}} into {a, '_', {c, '_'}} (i.e. fill in the variables), call t_from_term/1, and then walk the returned data structure and change the '_' atoms into variables using the module's t_var/1 or something similar.

Before explaining how I thought about this, let me formulate the problem a little better.

Problem

I am working on a pet project (an ErlAOP extension) that I will post on SourceForge when it is ready. Basically, another project already exists ( ErlAOP ), with which you can inject function calls before/after/around/etc. a function call (see the docs if you are interested).

I wanted to extend this to support code injection at the send/receive level (because of another project). I have already done this, but before releasing the project I would like to make some improvements.

Currently, my implementation simply finds every use of the send operator or a receive expression and injects a function before/after/around it (receive expressions have a little gotcha due to tail recursion). Let me call this function dmfun (dynamic matching function).

The user specifies that when a message of the form e.g. {a, B, {c, D}} is being sent, the function do_something/1 must be evaluated before the send. The current implementation therefore injects dmfun before every use of the send operator in the source code. dmfun will then contain something like:

 case Arg of
     {a, B, {c, D}} -> do_something(Arg);
     _ -> continue
 end

where Arg can simply be passed to dmfun/1, since you have access to the forms generated from the source code.

So the problem is that every send will have dmfun/1 injected before it (with the message of the send op passed as a parameter). But when sending messages like 50, {a, b}, [6, 4, 3], etc., the message will certainly not match {a, B, {c, D}}, so injecting dmfun/1 at those sends is a waste.

I want to be able to pick out only the plausible send operations, e.g. Pid ! {a, 5, SomeVar} or Pid ! {a, X, SomeVar}. In both cases it makes sense to inject dmfun/1, because if SomeVar = {c, 50} at runtime, then the user-supplied do_something/1 should be evaluated (but if SomeVar = 50 it should not, because we are interested in {a, B, {c, D}} and 50 does not match {c, D}).

I wrote what follows prematurely. It does not solve the problem I had, and I ended up not including this feature. I have left the explanation in anyway, but if it were up to me I would have deleted this post entirely. I was still experimenting, and I do not think there is anything useful here.

Before explaining, let:

msg_format = the user-provided message format that determines which sent/received messages are interesting (for example, {a, B, {c, D}}).

msg = the actual message being sent in the source code (for example, Pid ! {a, X, Y}).

I gave the explanation below in a previous edit, but later found out that it fails to match some things that it should. E.g. when msg_format = {a, B, {c, D}}, msg = {a, 5, SomeVar} does not match when it should (by "match" I mean that dmfun/1 should be injected).

Let me call the "algorithm" described below Alg. The approach I took was to run both Alg(msg_format, msg) and Alg(msg, msg_format). The explanation below only goes through one of them. The other works the same way, only with a different matching function (matching_fun(msg_format) instead of matching_fun(msg)). dmfun/1 is injected only if at least one of Alg(msg_format, msg) or Alg(msg, msg_format) returns true, so the result should be a dmfun/1 injection wherever the desired message could actually be generated at runtime.

  1. Take the message form found in the [Forms] given to parse_transform/2. E.g. say you find:

     {op,24,'!',{var,24,'Pid'},{tuple,24,[{atom,24,a},{var,24,'B'},{var,24,'C'}]}}

     So you take {tuple,24,[{atom,24,a},{var,24,'B'},{var,24,'C'}]}, which is the message being sent (bind it to Msg).

  2. Do fill_vars(Msg), where:

     -define(VARIABLE_FILLER, "_").

     -spec fill_vars(erl_parse:abstract_form()) -> erl_parse:abstract_form().
     %% @doc Takes an abstract_form() and replaces every {var, LineNum, Variable}
     %% form with {string, 0, ?VARIABLE_FILLER}.
     fill_vars(Form) ->
         erl_syntax:revert(
           erl_syntax_lib:map(
             fun(DeltaTree) ->
                     case erl_syntax:type(DeltaTree) of
                         variable -> erl_syntax:string(?VARIABLE_FILLER);
                         _ -> DeltaTree
                     end
             end,
             Form)).

  3. Do form_to_term/1 on the output of step 2, where:

     %% Evaluates a single abstract expression and returns its value.
     form_to_term(Form) ->
         element(2, erl_eval:exprs([Form], [])).

  4. Do term_to_str/1 on the output of step 3, where:

     -define(inject_str(FormatStr, TermList),
             lists:flatten(io_lib:format(FormatStr, TermList))).

     term_to_str(Term) ->
         ?inject_str("~p", [Term]).

  5. Do gsub(v(4), "\"_\"", "_"), where v(4) is the output of step 4, and gsub (taken from elsewhere) is:

     %% Replaces every literal occurrence of Old in Str with New.
     gsub(Str, Old, New) ->
         RegExp = "\\Q" ++ Old ++ "\\E",
         re:replace(Str, RegExp, New, [global, multiline, {return, list}]).

  6. Bind a variable (e.g. M) to matching_fun(v(5)), where:

     %% Builds a fun that returns true iff its argument matches StrPattern.
     matching_fun(StrPattern) ->
         form_to_term(
           str_to_form(
             ?inject_str(
                "fun(MsgFormat) -> case MsgFormat of ~s -> true; _ -> false end end.",
                [StrPattern]))).

     str_to_form(MsgFStr) ->
         {_, Tokens, _} = erl_scan:string(end_with_period(MsgFStr)),
         {_, Exprs} = erl_parse:parse_exprs(Tokens),
         hd(Exprs).

     end_with_period(String) ->
         case lists:last(String) of
             $. -> String;
             _ -> String ++ "."
         end.

  7. Finally, take the user-provided message format (which is given as a string), e.g. MsgFormat = "{a, B, {c, D}}", and do: MsgFormatTerm = form_to_term(fill_vars(str_to_form(MsgFormat))). Then you can call M(MsgFormatTerm).

E.g. with the user-provided message format {a, B, {c, D}} and Pid ! {a, B, C} found in the code:

 2> weaver_ext:fill_vars({tuple,24,[{atom,24,a},{var,24,'B'},{var,24,'C'}]}).
 {tuple,24,[{atom,24,a},{string,0,"_"},{string,0,"_"}]}
 3> weaver_ext:form_to_term(v(2)).
 {a,"_","_"}
 4> weaver_ext:term_to_str(v(3)).
 "{a,\"_\",\"_\"}"
 5> weaver_ext:gsub(v(4), "\"_\"", "_").
 "{a,_,_}"
 6> M = weaver_ext:matching_fun(v(5)).
 #Fun<erl_eval.6.13229925>
 7> MsgFormatTerm = weaver_ext:form_to_term(weaver_ext:fill_vars(weaver_ext:str_to_form("{a, B, {c, D}}"))).
 {a,"_",{c,"_"}}
 8> M(MsgFormatTerm).
 true
 9> M({a, 10, 20}).
 true
 10> M({b, "_", 20}).
 false
+4
2 answers

There is functionality for this in erl_types (HiPE).

I'm not sure whether you have the data in the right form to use this module. I seem to remember that it takes Erlang terms as input. If you sort out the question of the form, you should be able to do what you need with erl_types:t_from_term/1 and erl_types:t_is_subtype/2 .
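As a rough illustration of how these calls fit together, the sketch below builds the types by hand rather than from forms. It assumes erl_types is on the code path (it has shipped with HiPE and with Dialyzer depending on the OTP release) and treats it as the internal, undocumented module it is. The module name types_sketch and its functions are invented for the example, and erl_types:t_inf/2 is used here in addition to the two functions named above.

```erlang
-module(types_sketch).
-export([pattern_type/0, is_instance/1, plausible/1]).

%% Type for the pattern {a, B, {c, D}}, with each variable widened to any().
pattern_type() ->
    Any = erl_types:t_any(),
    erl_types:t_tuple([erl_types:t_atom(a), Any,
                       erl_types:t_tuple([erl_types:t_atom(c), Any])]).

%% For a fully known term: is it an instance of the pattern?
is_instance(Term) ->
    erl_types:t_is_subtype(erl_types:t_from_term(Term), pattern_type()).

%% For a message type that still contains unknown parts: the send is
%% plausible when the two types overlap, i.e. their infimum is not none().
plausible(MsgType) ->
    not erl_types:t_is_none(erl_types:t_inf(pattern_type(), MsgType)).
```

For a message such as {a, 5, SomeVar}, the type passed to plausible/1 would be a tuple type whose third element is erl_types:t_any(), since nothing is known about SomeVar at compile time.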

It has been a long time since I last used them, and I only ever did my testing at runtime, not at compile time. If you want to look at the usage pattern in my old code (it no longer works), you can find it on github .

+2

I do not think that this is possible at compile time in the general case. Consider:

 send_msg(Pid, Msg) -> Pid ! Msg. 

Msg will just look like a var, which is completely opaque. You cannot tell whether it is a tuple, a list or an atom, since anyone can call this function with anything supplied for Msg.

This would be much easier to do at runtime. Every time the ! operator is used, you call a wrapper function instead, which tries to match the message being sent and does the additional processing if the pattern matches.
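A minimal sketch of such a wrapper, assuming the pattern of interest is {a, B, {c, D}}. The module name send_wrap and the body of do_something/1 are placeholders; in the setup described in the question, the pattern and the callback would come from the user's configuration.

```erlang
-module(send_wrap).
-export([send/2]).

%% Hypothetical replacement for `Pid ! Msg`: run the extra processing
%% when the message matches, then perform the real send.
send(Pid, Msg) ->
    case Msg of
        {a, _B, {c, _D}} -> do_something(Msg);
        _ -> ok
    end,
    %% Like the ! operator itself, the wrapper evaluates to Msg.
    Pid ! Msg.

%% Placeholder for the user-supplied function.
do_something(Msg) ->
    io:format("about to send ~p~n", [Msg]).
```

A parse transform would then only need to rewrite each {op, Line, '!', Dest, Msg} form into a call to send_wrap:send(Dest, Msg).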

0
