Getting a list of processes on Windows with the correct character set

This post provides a solution for getting a list of running processes under Windows. Essentially, it does:

    String cmd = System.getenv("windir") + "\\system32\\" + "tasklist.exe";
    Process p = Runtime.getRuntime().exec(cmd);
    InputStreamReader isr = new InputStreamReader(p.getInputStream());
    BufferedReader input = new BufferedReader(isr);

then reads the input.

It looks and works fine, but I was wondering whether the encoding used by tasklist might differ from the default encoding, in which case this call could fail.

For example, this other question about another executable shows that this may cause some problems.

If so, is there a way to determine what the appropriate encoding should be?

4 answers

It can be divided into two parts:

  • Windows part:
    From Java, you execute a Windows command, which runs outside the JVM in "Windows land". When the Java Runtime class executes the Windows command, it uses the console DLL, so to Windows it looks as if the command were being run in a console.
    Q: When I run C:\windows\system32\tasklist.exe in the console, what is the character encoding ("code page" in Windows terminology)?

    • The chcp command with no argument prints the number of the console's active code page (for example, 850 for Multilingual Latin-1, 1252 for Latin-1). See Microsoft Windows Code Pages, Windows OEM Code Pages, Windows ISO Code Pages.
      The default system code page is initially set according to your system language (run systeminfo to see it, or open Control Panel → Region and Language).
    • The Windows API function GetACP() (also available from .NET) returns related information: the system ANSI code page. The console's output code page itself is returned by GetConsoleOutputCP().
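A minimal sketch of pulling the number out of chcp's output. It assumes the output is a single line of the form "<text>: <number>", and that some localized Windows versions may append a trailing period. The class and method names are hypothetical.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChcpParser {
    // A colon, optional whitespace, the digits, then only non-digits until the end of the line
    private static final Pattern CODE_PAGE = Pattern.compile(":\\s*(\\d+)\\D*$");

    /** Hypothetical helper: extracts the code page number from one line of chcp output. */
    public static OptionalInt parseCodePage(String chcpLine) {
        Matcher m = CODE_PAGE.matcher(chcpLine.trim());
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(parseCodePage("Active code page: 850"));  // OptionalInt[850]
        System.out.println(parseCodePage("Aktive Codepage: 850."));  // OptionalInt[850]
        System.out.println(parseCodePage("no number here"));         // OptionalInt.empty
    }
}
```

Matching only the digits, instead of taking the whole token after the colon, keeps a localized trailing period from ending up in the charset name.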

  • Java part:
    How do you decode a Java byte stream that uses Windows code page "x" (for example, 850 or 1252)?

    • A complete mapping between Windows code page numbers and equivalent Java charset names can be found in Codepage Identifiers (Windows).
    • In practice, however, prepending one of the following prefixes to the code page number usually gives a match:
      "" (none) for ISO, "IBM" or "x-IBM" for OEM, "windows-" or "x-windows-" for Microsoft/Windows.
      For example: ISO-8859-1, IBM850 or windows-1252.

Complete solution:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;
    import java.util.Scanner;

    public class TaskList {
        public static void main(String[] args) throws IOException {
            String cmd = System.getenv("windir") + "\\system32\\" + "chcp.com";
            Process p = Runtime.getRuntime().exec(cmd);
            // Use default charset here - only want digits which are "core UTF8/UTF16";
            // ignore text preceding ":" and keep only the digits
            // (localized chcp output may end with a period, e.g. "850.")
            String windowsCodePage = new Scanner(new InputStreamReader(p.getInputStream()))
                    .skip(".*:").next().replaceAll("\\D", "");

            Charset charset = null;
            String[] charsetPrefixes = {"", "windows-", "x-windows-", "IBM", "x-IBM"};
            for (String charsetPrefix : charsetPrefixes) {
                try {
                    charset = Charset.forName(charsetPrefix + windowsCodePage);
                    break;
                } catch (IllegalArgumentException e) {
                    // illegal or unsupported charset name: try the next prefix
                }
            }
            // If no match found, use default charset
            if (charset == null) {
                charset = Charset.defaultCharset();
            }

            cmd = System.getenv("windir") + "\\system32\\" + "tasklist.exe";
            p = Runtime.getRuntime().exec(cmd);
            BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream(), charset));
            // Debugging output
            System.out.println("matched codepage " + windowsCodePage + " to charset name:"
                    + charset.name() + " displayName:" + charset.displayName());
            String line;
            while ((line = input.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

Thanks for the Q! - It was fun.


Actually, the encoding used by tasklist is usually different from the system default.

On the other hand, it is safe to use the default as long as the output is limited to ASCII. Executables typically have only ASCII characters in their names.

So, to get the correct strings, you need to map the Windows (ANSI) code page to the corresponding OEM code page and pass the latter as the charset to the InputStreamReader.

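There seems to be no complete mapping between these encodings, but you can use the following:

    import java.io.InputStreamReader;
    import java.nio.charset.Charset;
    import java.util.HashMap;
    import java.util.Map;

    // p is the tasklist Process from the question
    Map<String, String> ansi2oem = new HashMap<String, String>();
    ansi2oem.put("windows-1250", "IBM852");
    ansi2oem.put("windows-1251", "IBM866");
    ansi2oem.put("windows-1252", "IBM850");
    ansi2oem.put("windows-1253", "IBM869");

    // Before Java 18 the default charset is the ANSI code page; from Java 18 on it is UTF-8 by default
    Charset charset = Charset.defaultCharset();
    String streamCharset = ansi2oem.get(charset.name());
    if (streamCharset == null) {
        // no known OEM counterpart: fall back to the default charset
        streamCharset = charset.name();
    }
    InputStreamReader isr = new InputStreamReader(p.getInputStream(), streamCharset);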

This approach worked for me with windows-1251 and IBM866.

To get the current OEM code page used by Windows, you can call the GetOEMCP function. Its return value depends on the "Language for non-Unicode programs" setting on the Administrative tab of the Region and Language control panel. Changing that setting requires a reboot.


There are two kinds of encodings on Windows: ANSI and OEM.

The former is used by non-Unicode applications running in GUI mode, the latter by console applications. Console applications cannot display characters that cannot be represented in the current OEM encoding.

Since tasklist is a console-mode application, its output is always in the current OEM encoding.

On English-language systems these are usually Windows-1252 and CP850 (or CP437).

Since I am in Russia, my system uses Windows-1251 and CP866.
If I redirect the output of tasklist to a file, the file does not display Cyrillic characters correctly:

When I view it in Notepad, a Cyrillic greeting ("Hi!") comes out as garbage characters.
And µTorrent is displayed as Torrent.
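A small sketch of why the file looks wrong. It assumes tasklist wrote the bytes in the OEM code page (IBM866) and Notepad reads them in the ANSI code page (windows-1251). The word Привет! ("Hi!") is only an illustration: written as IBM866 and read as windows-1251, it becomes ЏаЁўҐв!. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class OemVsAnsi {
    /** Encodes text the way the writer would and decodes it the way the reader would. */
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String hi = "\u041f\u0440\u0438\u0432\u0435\u0442!"; // "Privet!" ("Hi!" in Russian)
        Charset oem = Charset.forName("IBM866");        // console (OEM) code page on Russian Windows
        Charset ansi = Charset.forName("windows-1251"); // ANSI code page

        // Written and read with the OEM code page: the text survives
        System.out.println(roundTrip(hi, oem, oem).equals(hi));  // true
        // Written as OEM, read as ANSI: mojibake
        System.out.println(roundTrip(hi, oem, ansi).equals(hi)); // false
    }
}
```

The same mismatch happens in Java when tasklist's output is read with an ANSI default charset instead of the OEM one.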

You cannot change the encoding used by tasklist .


However, you can change the output encoding of cmd itself. If you pass it /u, it writes the output of internal commands (such as echo) to a pipe or file in UTF-16. External programs such as tasklist are not affected.

 cmd /c echo Hi>echo.txt 

The size of echo.txt is 4 bytes: two bytes for Hi and two bytes for the line break (\r and \n).

 cmd /u /c echo Hi>echo.txt 

Now the size of echo.txt is 8 bytes: each character is represented by two bytes.
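The byte counts above can be reproduced in Java. The sketch assumes the file contains exactly "Hi" followed by CR LF, in a single-byte OEM code page without /u and in UTF-16 little-endian with no byte-order mark with /u, which matches the 8 bytes reported. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EchoSizes {
    // What "echo Hi>echo.txt" writes: "Hi" plus CR LF
    static final String ECHO_OUTPUT = "Hi\r\n";

    static int sizeIn(Charset cs) {
        return ECHO_OUTPUT.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Without /u: one byte per character in the OEM code page
        System.out.println(sizeIn(Charset.forName("IBM850")));  // 4
        // With /u: two bytes per character, little-endian, no BOM
        System.out.println(sizeIn(StandardCharsets.UTF_16LE));  // 8
    }
}
```

To read such a file in Java, use UTF_16LE rather than UTF_16. Java's UTF-16 encoder prepends a big-endian BOM, so the same text would encode to 10 bytes.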


Why not use the Windows API through JNA instead of spawning processes? Like this:

    import com.sun.jna.Native;
    import com.sun.jna.platform.win32.Kernel32;
    import com.sun.jna.platform.win32.Tlhelp32;
    import com.sun.jna.platform.win32.WinDef;
    import com.sun.jna.platform.win32.WinNT;

    public class ListProcesses {
        public static void main(String[] args) {
            // Kernel32.INSTANCE uses the Unicode (W) API variants by default,
            // so process names arrive as UTF-16 and no code page is involved
            Kernel32 kernel32 = Kernel32.INSTANCE;
            Tlhelp32.PROCESSENTRY32.ByReference processEntry = new Tlhelp32.PROCESSENTRY32.ByReference();
            WinNT.HANDLE snapshot = kernel32.CreateToolhelp32Snapshot(
                    Tlhelp32.TH32CS_SNAPPROCESS, new WinDef.DWORD(0));
            try {
                // Documented pattern: Process32First for the first entry, then Process32Next
                if (kernel32.Process32First(snapshot, processEntry)) {
                    do {
                        System.out.println(processEntry.th32ProcessID + "\t"
                                + Native.toString(processEntry.szExeFile));
                    } while (kernel32.Process32Next(snapshot, processEntry));
                }
            } finally {
                kernel32.CloseHandle(snapshot);
            }
        }
    }

I posted a similar answer elsewhere.


There is a much better way to check running processes, or to run any OS command, from Java: Process and ProcessBuilder.
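A minimal sketch of the question's tasklist call rewritten with ProcessBuilder. It assumes the OEM code page is already known (IBM850 below is only a placeholder). The class and method names are hypothetical, and only readLines is platform-independent.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class TaskListRunner {
    /** Reads all lines from a stream, decoding them with the given charset. */
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Runs tasklist.exe (Windows only) and decodes its output with the given charset. */
    static List<String> tasklist(Charset charset) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(System.getenv("windir") + "\\system32\\tasklist.exe");
        pb.redirectErrorStream(true); // merge stderr into stdout so no pipe fills up unread
        Process p = pb.start();
        List<String> lines = readLines(p.getInputStream(), charset);
        p.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: IBM850 is the console (OEM) code page; determine yours via chcp or GetOEMCP
        for (String line : tasklist(Charset.forName("IBM850"))) {
            System.out.println(line);
        }
    }
}
```

Passing the command as separate arguments to ProcessBuilder avoids the string tokenizing done by Runtime.exec(String). Merging stderr keeps the child process from blocking on an unread error stream.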

As for the charset, you can always ask which encodings are supported and get an encoder or decoder that fits your needs.

[Edit] Let me elaborate: there is no way to tell which encoding a given sequence of bytes is in. Your only option is to take the bytes, swap the byte order if necessary, and try several supported CharsetDecoders until the bytes decode to sensible output. (If you ever work in an environment where the process may hand you bytes in a different order, use ByteBuffer to handle the swap.)

This is a lot of work, and you have to allow for the output being in UTF-8, UTF-16 or any other encoding. But at least you can decode the output with one of the candidate charsets and then try to use the result for your needs.

Since we are talking about a process running on the same OS as the JVM, your output is likely to be in one of the encodings returned by Charset.availableCharsets().
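A minimal sketch of the "try several CharsetDecoders" approach. It assumes a caller-supplied candidate list, for example picked from Charset.availableCharsets(). Each decoder is set to report errors instead of substituting characters. The class and method names are hypothetical, and this only filters out charsets that reject the bytes; it cannot prove which one is right.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

public class CandidateDecoder {
    /**
     * Returns the text decoded with the first candidate charset that accepts the bytes
     * without malformed or unmappable input, or empty if none does.
     */
    static Optional<String> decodeFirstValid(byte[] bytes, List<Charset> candidates) {
        for (Charset cs : candidates) {
            CharsetDecoder decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
            } catch (CharacterCodingException e) {
                // not valid in this charset: try the next candidate
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] oemBytes = "caf\u00e9".getBytes(Charset.forName("IBM850"));
        // Strict charsets first; permissive single-byte code pages last
        List<Charset> candidates = List.of(
                StandardCharsets.US_ASCII, StandardCharsets.UTF_8, Charset.forName("IBM850"));
        System.out.println(decodeFirstValid(oemBytes, candidates).orElse("<undecodable>"));
    }
}
```

Order matters. Single-byte code pages such as IBM850 or windows-1252 accept almost any byte sequence, so they should come after strict encodings like UTF-8. Byte-order variants (UTF-16LE, UTF-16BE) can simply be listed as separate candidates.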
