I am trying to handle the names of the files that my users upload. I want to support all valid UTF-8 characters except those that might cause problems when displayed in HTML on a web page, used on the command line, or stored in and retrieved from the file system.
With that in mind, I came up with the following lenient function, and I wonder whether it is sufficient. I use prepared statements for all database queries and I always HTML-encode output, but I would still like to know whether this is a sound extra layer of protection.
// $filename = $_FILES['file']['name'];
$filename = 'Filename 123;".\'"."la\l[a]*(/.jpg ∮ E⋅da = Q, n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β), ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (A ⇔ B), 2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm sfajs,-=[];\',./09μετράει าวนั้นเป็นชน Καλημέρα κόσμε, コンニチハ ()_+{}|":?><';

// Replace symbols, punctuation, and ASCII control characters like \n or [BEL]
$filename = preg_replace('~[\p{S}\p{P}\p{C}]+~u', ' ', $filename);
Is this approach safe for me and suitable for my users?
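For reference, here is a slightly more defensive sketch of the same idea. The function name `sanitize_display_name()` and the 200-character limit are my own assumptions, not anything PHP prescribes. It uses the same character classes, but it also handles the case where `preg_replace()` returns `null` (which happens with the `u` flag when the input is not valid UTF-8), collapses whitespace, and caps the length.

```php
<?php
// Hypothetical helper (name and $maxChars default are illustrative assumptions).
// Returns the cleaned name, or null if the input is unusable.
function sanitize_display_name(string $name, int $maxChars = 200): ?string
{
    // Same pattern as above: symbols, punctuation, and "other" (control,
    // format, unassigned...) characters become a single space.
    // With the u flag, preg_replace() returns null on invalid UTF-8.
    $clean = preg_replace('~[\p{S}\p{P}\p{C}]+~u', ' ', $name);
    if ($clean === null) {
        return null;
    }

    // Collapse runs of whitespace into one space.
    $clean = preg_replace('~\s+~u', ' ', $clean);
    if ($clean === null) {
        return null;
    }

    // Keep at most $maxChars code points (not bytes), then trim the ends.
    if (!preg_match('~^.{0,' . $maxChars . '}~us', $clean, $m)) {
        return null;
    }
    $clean = trim($m[0]);

    // A name made only of stripped characters is not useful to show.
    return $clean === '' ? null : $clean;
}
```

One thing this makes visible: the dot is punctuation too, so a name like `a<b>.jpg` comes out as `a b jpg`, and the extension separator is lost along with everything else.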
Update
To clarify, I am not using the uploaded file name as the name of the file on the file system. I generate a unique hash and use that instead. I just need to keep the original name for the users, because that is how they recognize their files. A SHA-1 hash or a UUID does not mean anything to them.
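To make the setup concrete, this is roughly the split I have in mind, as a minimal sketch. Both function names are hypothetical, and I use random bytes here as a stand-in for the unique hash; a SHA-1 of the content or a UUID would fill the same role. The stored name never contains user input, and the original name is only ever treated as display text that gets HTML-encoded on output.

```php
<?php
// Hypothetical helpers; names are illustrative, not from any library.

// Name used on the file system: 32 hex characters, no user input involved.
function make_storage_name(): string
{
    return bin2hex(random_bytes(16));
}

// Original name as shown to the user: always HTML-encoded at output time.
// ENT_SUBSTITUTE replaces invalid UTF-8 sequences instead of returning ''.
function render_file_label(string $displayName): string
{
    return htmlspecialchars($displayName, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Usage sketch (paths and variables here are assumptions):
//   $stored = make_storage_name();
//   move_uploaded_file($_FILES['file']['tmp_name'], '/path/to/uploads/' . $stored);
//   ...save $stored and the display name via a prepared statement...
//   echo render_file_label($displayName);
```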