Regex for UTF-8 Valid File Names

I am trying to sanitize the file names that my users upload. I want to support all valid UTF-8 characters except those that might cause problems when displaying the name in HTML on a web page, using it on the command line, or storing and retrieving it in the file system.

In any case, I came up with the following lenient function, and I wonder whether it is safe enough to use. I use prepared statements for all database queries, and I always HTML-encode output, but I would still like to know that this is also a sound approach.

    // $filename = $_FILES['file']['name'];
    $filename = 'Filename 123;".\'"."la\l[a]*(/.jpg ∮ E⋅da = Q, n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β), ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (A ⇔ B), 2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm sfajs,-=[];\',./09μετράει าวนั้นเป็นชน Καλημέρα κόσμε, コンニチハ ()_+{}|":?><';

    // Replace symbols, punctuation, and ASCII control characters like \n or [BEL]
    $filename = preg_replace('~[\p{S}\p{P}\p{C}]+~u', ' ', $filename);

Is this approach safe for me and suitable for my users?

Update

To clarify, I am not using the uploaded name as the file name on the file system. I generate a unique hash and use that instead. I just need to keep the original name for the users, because that is how they recognize their files. A SHA1 hash or UUID does not mean anything to them.

1 answer

The very first thing you need to do is verify that your input is valid UTF-8.

mb_internal_encoding and mb_check_encoding are your friends.
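As a minimal sketch of that check, assuming the mbstring extension is loaded: the helper name `is_valid_utf8_name` is hypothetical and not part of the original post. The encoding is passed explicitly, so the example does not depend on what `mb_internal_encoding` is set to.

```php
<?php
// Hypothetical helper (not from the original post): accept a name only
// if it is non-empty and its bytes form valid UTF-8.
function is_valid_utf8_name(string $name): bool
{
    // mb_check_encoding() returns true only when the byte sequence is
    // valid in the given encoding. Passing 'UTF-8' explicitly avoids
    // relying on the value configured through mb_internal_encoding().
    return $name !== '' && mb_check_encoding($name, 'UTF-8');
}
```

A caller would typically reject the upload, or fall back to a generic display name, when this returns false, before running any regex with the `u` modifier.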

You are using a blacklist, whereas good security practice is to use a whitelist of allowed input.
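A whitelist version could look roughly like the sketch below. The allowed set (letters, combining marks, decimal digits, space, dot, underscore, hyphen) is an assumption chosen for illustration, and the function name `whitelist_filename` is hypothetical. Combining marks are kept here so that scripts such as Thai stay readable.

```php
<?php
// Hypothetical helper (not from the original post): replace everything
// that is NOT in the allowed set with a single space.
function whitelist_filename(string $name): string
{
    // Allowed (an assumption for this sketch): letters \p{L}, combining
    // marks \p{M}, decimal digits \p{Nd}, space, dot, underscore, hyphen.
    $clean = preg_replace('~[^\p{L}\p{M}\p{Nd} ._-]+~u', ' ', $name);

    // With the "u" modifier, preg_replace() returns null when the
    // subject is not valid UTF-8.
    if ($clean === null) {
        return '';
    }

    // Collapse runs of spaces and trim the ends.
    return trim(preg_replace('~ {2,}~', ' ', $clean));
}
```

With this sketch, `whitelist_filename('my<file>*name?.jpg')` gives `my file name .jpg`, and a name that is not valid UTF-8 gives an empty string.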

Edit after clarification:

You should be safe. Remember to also filter Lm and No if you do not want to invoke Zalgo.
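One way to read that advice is to extend the pattern from the question with the two extra categories. This is only a sketch, and the function name `display_name` is hypothetical. Note that the stacked diacritics typical of Zalgo text are combining marks (`\p{Mn}`). Whether to strip those as well is a separate decision that the answer does not cover, because it would also affect scripts such as Thai.

```php
<?php
// Hypothetical helper (not from the original post): the question's
// pattern (S, P, C) extended with modifier letters (Lm) and
// "other" numbers (No), as suggested in the answer.
function display_name(string $name): string
{
    $clean = preg_replace('~[\p{S}\p{P}\p{C}\p{Lm}\p{No}]+~u', ' ', $name);

    // preg_replace() returns null if the subject is not valid UTF-8.
    return $clean === null ? '' : trim($clean);
}
```

For example, `display_name('H₂O.jpg')` gives `H O jpg`: the subscript two is in No and the dot is punctuation, so both become spaces.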


Source: https://habr.com/ru/post/922871/

