Can I use the Perl Marpa parser for a public network server?

The Perl Marpa documentation contains the following section on tainted data:

Marpa::R2 exists to allow its input to alter execution in flexible and powerful ways. Marpa should not be used with untrusted input. In Perl's taint mode, it is a fatal error to use Marpa's SLIF interface with a tainted grammar, a tainted input string, or tainted token values.

I am not sure I understand the consequences of this limitation. I understand that the grammar must not be tainted, but I do not understand why the input must not be tainted. To me, the job of a parser is to validate its input, so it seems unreasonable for a parser to have to trust that input.

Is this really so? Is it impossible to implement any public network service with Marpa?

I ask because one of the reference use cases is an HTML parser, and it seems contradictory to offer an HTML parser that cannot be used with tainted data when probably 99.99% of all HTML is tainted.

Can someone explain this contradiction?

+7
perl parsing taint marpa
2 answers

Marpa is actually safer than other parsers, because the language it parses is precisely specified in BNF. With regular expressions, PEG, etc., it is very difficult to determine which language is actually being accepted. In practice, programmers usually get a few test cases working and then give up.

In particular, accepting unwanted input can be a serious security problem: with traditional parsers, you usually do not know everything that you are allowing through. Test suites rarely check whether input that should be an error is in fact accepted. Marpa parses exactly the language in its BNF specification, nothing more and nothing less.
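To make the point about negative tests concrete, here is a minimal sketch in plain Perl (not Marpa). Both recognizers and their names are hypothetical, invented for this illustration: a loosely written regular expression accepts more than its author probably intended, and only inputs that should fail reveal it.

```perl
use strict;
use warnings;

# Hypothetical recognizers for a YYYY-MM-DD token (illustration only).
# The loose one forgets to anchor the pattern, so it matches anywhere.
sub loose_accepts  { my ($s) = @_; return $s =~ /\d{4}-\d{2}-\d{2}/     ? 1 : 0 }
sub strict_accepts { my ($s) = @_; return $s =~ /\A\d{4}-\d{2}-\d{2}\z/ ? 1 : 0 }

# Negative cases: inputs that should be rejected.
my @should_fail = ( '2020-01-01; rm -rf /', 'x2020-01-01', '2020-1-1' );

for my $input (@should_fail) {
    printf "%-24s loose=%d strict=%d\n",
        "'$input'", loose_accepts($input), strict_accepts($input);
}
```

A test suite containing only valid dates would pass for both versions; the difference shows up only when inputs that must be rejected are tested as well.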

So why the scary language about taint mode? Marpa, in its most general case, can be seen as a programming language, and it has exactly the same security issues. Allowing a user to execute arbitrary code is by definition unsafe, and that is exactly what C, Perl, Marpa, etc. do, by design. You cannot give an untrusted user a general language interface. That would be obvious for C, Python, etc., but I thought someone might overlook it in the case of Marpa. Hence the scary language.

Marpa is, IMHO, safer than competing technologies. In the most general case, however, that is still not safe enough.

+6

Taint mode is an optional Perl setting that says: treat user input as untrusted. It stops you from using any "tainted" variables, such as those read directly from STDIN or %ENV, in certain functions, because doing so is dangerous.

A classic example of a code injection exploit: "Exploits of a Mom" (the xkcd "Bobby Tables" comic).

What does taint mode do? It forces you to sanitize untrusted input before using it in a risky way.

Untainting is simple: all you need to do is apply a regular expression filter to the source data so that any "dangerous" metacharacters are excluded. (Note that Perl does not actually know what is "dangerous" and what is not; it assumes you are not an idiot and have not simply matched everything.)

This will result in an error:

    #!/usr/bin/perl -T
    use strict;
    use warnings;

    # values read from %ENV are tainted under -T
    my $tainted = $ENV{'USERNAME'};
    # dies: tainted data would reach the shell
    system ( "echo $tainted" );

That is because I am passing an untrusted variable to system, and it could contain an embedded code injection:

Insecure dependency in system while running with -T switch at

(It may also complain about an insecure path.)

So in order to untaint the value, I need to sanitize it. A reasonable sanitization would be: the username should only be alphanumeric:

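    #!/usr/bin/perl -T
    use strict;
    use warnings;

    $ENV{'PATH'} = '/bin';    # an untainted value
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};    # other variables taint mode checks

    my $tainted = $ENV{'USERNAME'};
    # a successful capture yields an untainted value
    my ( $untainted ) = $tainted =~ m/(\w+)/g;
    system ( "echo $untainted" );    # no error now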

And because I used a regular expression, Perl assumes that I have not done something careless (e.g. (.*) ) and therefore treats the captured data as untainted.
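The untainting step can be isolated in a small helper. This is only a sketch: the name untaint_name and the 32-character limit are assumptions made for the example, and tainted() comes from the core Scalar::Util module. Run it with perl -T to see the taint flag change; without -T nothing is tainted, so the helper only validates.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: return the value if it is a short run of word
# characters, otherwise return nothing. Under -T, the value taken from
# the capture group is what Perl considers untainted.
sub untaint_name {
    my ($raw) = @_;
    return unless defined $raw;
    my ($clean) = $raw =~ /\A(\w{1,32})\z/;
    return $clean;    # undef when the whitelist does not match
}

# Under "perl -T", values from %ENV are tainted; without -T nothing is.
my $raw   = $ENV{USERNAME} // $ENV{USER} // 'nobody';
my $clean = untaint_name($raw);

printf "taint mode:    %s\n", ${^TAINT} ? 'on' : 'off';
printf "raw tainted:   %s\n", tainted($raw) ? 'yes' : 'no';
printf "clean value:   %s\n", defined $clean ? $clean : '(rejected)';
printf "clean tainted: %s\n", defined $clean && tainted($clean) ? 'yes' : 'no';
```

Anchoring the pattern with \A and \z rejects a value such as "alice; rm -rf /" outright instead of silently keeping its first word, which is a stricter choice than the unanchored match in the example above.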

Why does this matter? Well, it depends on what your parser does. It is not uncommon for parsers, by their nature, to be "broken" by invalid input. See the example above, where some embedded SQL escapes its quoting and bypasses validation.

In your particular case:

  • Taint mode is optional. You should use it when you receive untrusted input (for example, from potentially malicious users), but it may be more trouble than it is worth for your own use.

  • Filtering the HTML to check its length and character set is probably reasonable, for example by checking that it is in an "ASCII-compatible character encoding".

Fundamentally, though, I think you are overthinking what taint checking is: it is not an exhaustive validation method, it is a safety net. All it does is make sure that you have done some basic sanitization before passing user input to an unsafe mechanism. It is there to stop well-known gotchas like the one I linked, and most of them can be caught with a simple regular expression.

If you are aware of the problem and are not worried about malicious user input, then I do not think you need to worry too much. Whitelisting the character set and then parsing will be enough.
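A minimal sketch of that whitelist-then-parse step, assuming printable ASCII plus tab, CR and LF is an acceptable character set for your input. The name check_input and the 64 KiB limit are hypothetical choices for this example, not anything prescribed by Perl or Marpa.

```perl
use strict;
use warnings;

# Hypothetical limit, chosen only for this example.
my $MAX_LEN = 64 * 1024;

# Return a copy of $text if it is short enough and contains only
# printable ASCII plus tab, CR and LF; otherwise return nothing.
# Under -T, the captured copy is untainted.
sub check_input {
    my ($text) = @_;
    return unless defined($text) && length($text) <= $MAX_LEN;
    my ($ok) = $text =~ /\A([\x20-\x7E\t\r\n]*)\z/;
    return $ok;
}

my $html = "<p>Hello, world</p>\n";
if ( defined( my $safe = check_input($html) ) ) {
    # Hand $safe to the parser here.
    print "accepted ", length($safe), " characters\n";
}
else {
    print "rejected\n";
}
```

Keep in mind that a character whitelist this broad only tells Perl that you vouch for the data; whether that is enough for your service is the judgment call the answers above describe.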

+1
