Should I fix this obscure but elegant C ++ code snippet?

I stumbled upon this elegant piece of code, which, however, relies on an obscure but fundamental low-level characteristic:

std::string file_path_leaf( std::string const & path ) { auto const pos = path.find_last_of("/\\"); // windows or posix return path.substr( pos + 1 ); // a walk on the wild side? } 

In extreme cases (where "find_last_of" fails), it works correctly, i.e. leaves the string alone. But is it too unclear?

+7
c ++ language-lawyer
source share
4 answers

According to http://en.cppreference.com/w/cpp/string/basic_string/npos npos actually ...a special value equal to the maximum value representable by the type size_type . Cross-checked with the C ++ 11 standard, you can find this in 21.4 / 5, a class definition.

Then also unsigned arithmetic is equally guaranteed by the standard for wrapping, so the code is well defined.

However, you can still clarify its change. At the very least, add a comment on how it works for future maintainers.

+7
source share

TL; DR: The code is definitely safe, but you can make it bulletproof by changing pos + 1 to pos + 1U . This will avoid the unclear angle of the whole promotion rules, which can only be caused by a perverted (but legal) implementation.

Thanks to @TC for a helpful discussion that helped me remove a number of non-local sidetracks from this answer.

  • basic_string::find_last_of has a return type of size_type ([string.find.last.of] 24.3.2.7.5 / p1), which is an unsigned integer type ([allocator.requirements] 20.5.3.5/Table 31). (Actually, it's almost certainly std::size_t , since std::string uses the default size_type whose member size_type is of type std::size_t , according to [des.default.allocator] D.10 / p1.)

  • The value returned by basic_string::find_last_of if the object to be found was not found, basic_string::npos ([string.find.last.of] 24.3.2.7.5 / p2), which is equal to size_type(-1) ([basic .string] 24.3.2 / p5.1).

  • It is very likely (but not guaranteed) that size_type will have an integer rank greater than or equal to int . If this is the case, then when calculating pos + 1 1 will be converted to size_type until the addition is complete, and then the addition will be between unsigned integer values. The rules for unsigned integer arithmetic ensure that the result of adding unsigned 1 to unsigned -1 must be 0.

  • If size_type rank is less than int rank, then the result will depend on the relationship between numeric_limits::max<size_type>() (which is exactly the value of npos ) and numeric_limits::max<int>() :

    • If numeric_limits::max<size_type>() > numeric_limits::max<int>() (which means that size_type and unsigned int are the same size, even if they are different types), then pos will "advance" to unsigned int and then 1 will also be converted to unsigned int . The result will be the same as above.

    • If numeric_limits::max<size_type>() < numeric_limits::max<int>() , then pos will advance to the (signed) int , and the add will be signed. But, given the inequality, pos + 1 cannot overflow. The result may be too large to fit in size_type , of course, but the conversion from signed to unsigned integer types is well defined, and if pos == npos , the result is guaranteed to be 0.

    • This leaves only the case when numeric_limits::max<size_type>() == numeric_limits::max<int>() . One theoretical possibility for this equality would be that size_type exactly one bit shorter than int . Such an implementation would be at least strange, but it is still legal. In this case, pos will be translated into (signed) int , 1 will remain signed int , and if pos == npos , the signed addition pos + 1 will overflow, creating an Undefined Behavior.

  • To summarize, computing is unsafe if size_type has a lower rank than int but has the same maximum value. It's hard for me to believe that such a situation will ever arise in real life, but I believe that it will not violate the C ++ standard.

    To protect yourself from the above extremely unlikely event, just change the expression to pos + 1U . This ensures that pos converted (if necessary) to unsigned int to match 1U , after which the addition will be performed as unsigned arithmetic without the possibility of UB, and the result with pos == npos (after truncation to size_type ) will be 0.

+2
source share

JoeMo:

I thank you for your thoughtful and well-researched answers to my question.

I really changed it (according to the lines suggested by milleniumbug: (if (pos == std :: string :: npos) return path, etc.). In addition, I agree with it that the code looks wrong.

The reason the code looks wrong also explains why it looks "elegant"; one pass through the code handles all different cases. There are no statements, etc., There are only two direct statements. It can't be that easy, can it?

But he performs this feat of simplification, using specialized knowledge at different levels of abstraction (knowledge of bit-wize values โ€‹โ€‹of string :: npos, the way the substrate works in edges, etc.).

It's best to keep things at the same level of abstraction, even at the cost of a little extra code.

Thanks.

+1
source share

Regardless of whether this code demonstrates the correct behavior or not, it is still broken or at least rather fragile, as it relies on some bold implicit assumptions, rather than making these assumptions explicit and verified. It can be improved without changing the original logic by adding a couple of statements:

 std::string file_path_leaf( std::string const & path ) { auto const pos{path.find_last_of("/\\")}; // windows or posix assert((std::string::npos == pos) || ((std::string::size_type{0} <= pos) && (pos < path.size()))); static_assert(std::string::size_type{0} == (std::string::npos + std::string::size_type{1})); return path.substr(pos + std::string::size_type{1}); } 
-3
source share

All Articles