This is possible, and easiest to do with OBJCOPY from BINUTILS. You essentially take the data file as binary input and output it in an object file format that can be linked with your program.
OBJCOPY will even produce start and end symbols, as well as a symbol for the size of the data area, so that you can reference them in your code. The basic idea is to tell it that the input file is binary (even if it is text), that the target is an x86-64 object file, and to specify the input file name and the output file name.
Suppose we have an input file called myfile.txt with the contents:
the quick brown fox jumps over the lazy dog
Something like this would be the starting point:
objcopy --input binary \
    --output elf64-x86-64 \
    --binary-architecture i386:x86-64 \
    myfile.txt myfile.o
If you want to generate 32-bit objects, you can use:
objcopy --input binary \
    --output elf32-i386 \
    --binary-architecture i386 \
    myfile.txt myfile.o
The output will be an object file named myfile.o. If we look at the object file headers using OBJDUMP with a command like objdump -x myfile.o, we would see something like this:
myfile.o:     file format elf64-x86-64
myfile.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         0000002c  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 g       .data  0000000000000000 _binary_myfile_txt_start
000000000000002c g       .data  0000000000000000 _binary_myfile_txt_end
000000000000002c g       *ABS*  0000000000000000 _binary_myfile_txt_size
By default, it creates a .data section with the contents of the file and creates a number of symbols that can be used to reference the data:
_binary_myfile_txt_start
_binary_myfile_txt_end
_binary_myfile_txt_size
These are effectively the address of the start byte, the address just past the last byte, and the size of the data that was placed into the object from the file myfile.txt. OBJCOPY bases the symbol names on the input file name: myfile.txt is mangled into myfile_txt and used to create the symbols.
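The name mangling described above can be sketched in C. This is only an illustration: objcopy_symbol_name is a hypothetical helper, not part of binutils, and it assumes that every character of the input file name that is not alphanumeric (including any path separators given on the command line) is replaced by an underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of binutils): builds the symbol name that
 * objcopy is expected to generate for input_name, e.g. "myfile.txt" with
 * suffix "start" gives "_binary_myfile_txt_start".
 * Returns 0 on success, -1 if the output buffer is too small. */
int objcopy_symbol_name(const char *input_name, const char *suffix,
                        char *out, size_t out_size)
{
    assert(input_name != NULL && suffix != NULL && out != NULL);

    int n = snprintf(out, out_size, "_binary_%s_%s", input_name, suffix);
    if (n < 0 || (size_t)n >= out_size)
        return -1;

    /* Mangle only the file-name part that follows the "_binary_" prefix:
     * anything that is not a letter or digit becomes '_'. */
    size_t prefix_len = strlen("_binary_");
    size_t name_len = strlen(input_name);
    for (size_t i = 0; i < name_len; i++) {
        unsigned char c = (unsigned char)out[prefix_len + i];
        if (!isalnum(c))
            out[prefix_len + i] = '_';
    }
    return 0;
}
```

Calling objcopy_symbol_name("myfile.txt", "size", buf, sizeof buf) fills buf with _binary_myfile_txt_size. Note that under this assumption a path such as data/my-file.txt would become _binary_data_my_file_txt_size, so the directory you run OBJCOPY from affects the symbol names.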
One problem is that a .data section is created, which is read/write data, as seen here:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         0000002c  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
You specifically asked for a .rodata section, which would also have the READONLY flag set. You can use the --rename-section option to change .data to .rodata and specify the needed flags. You can add this to the command line:
--rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA
Of course, if you want to give the section a name other than .rodata while keeping the same flags as a read-only section, you can change .rodata in the line above to the name you want to use.
The final version of the command, which should generate the type of object you want, is:
objcopy --input binary \
    --output elf64-x86-64 \
    --binary-architecture i386:x86-64 \
    --rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA \
    myfile.txt myfile.o
Now that you have the object file, how can you use it in C code (as an example)? The symbols generated are a bit unusual, and there is a reasonable explanation on the OSDev Wiki:
A common problem is getting garbage data when trying to use a value defined in a linker script. This usually happens because the symbol is being dereferenced. A symbol defined in a linker script (for example, _ebss = .;) is only a symbol, not a variable. If you access the symbol using extern uint32_t _ebss; and then try to use _ebss, the code will try to read a 32-bit integer from the address indicated by _ebss.
The solution is to take the address of _ebss, either by using it as &_ebss or by defining it as an unsized array (extern char _ebss[];) and casting to an integer. (The array notation prevents accidental reads from _ebss, since arrays must be explicitly dereferenced.)
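The difference between reading through a symbol and taking its address can be demonstrated without a linker script. The following is a minimal sketch, assuming GCC or Clang targeting ELF (for example Linux x86-64); the symbols demo_sym_as_var and demo_sym_as_array and the three functions are hypothetical names created only for this illustration. Both symbols label the same 4-byte word in .rodata.

```c
#include <stdint.h>

/* Assumption: GNU-style top-level asm on an ELF target. Two symbols are
 * placed at the same address, in front of a known 32-bit value. */
__asm__(
    ".pushsection .rodata\n"
    ".balign 4\n"
    ".globl demo_sym_as_var\n"
    ".globl demo_sym_as_array\n"
    "demo_sym_as_var:\n"
    "demo_sym_as_array:\n"
    ".long 0xdeadbeef\n"
    ".popsection\n"
);

/* Declared as a variable: using the name reads memory at the symbol. */
extern const uint32_t demo_sym_as_var;
/* Declared as an unsized array: using the name yields the address. */
extern const char demo_sym_as_array[];

/* Reads the 32-bit value stored AT the symbol, not the symbol's address. */
uint32_t read_through_symbol(void)
{
    return demo_sym_as_var;
}

/* Takes the address explicitly with the & operator. */
uintptr_t address_via_ampersand(void)
{
    return (uintptr_t)&demo_sym_as_var;
}

/* The array name decays to a pointer, so no dereference can happen. */
uintptr_t address_via_array(void)
{
    return (uintptr_t)demo_sym_as_array;
}
```

Here read_through_symbol() returns the stored word 0xdeadbeef, while address_via_ampersand() and address_via_array() return the same address. When the symbol carries meaning only through its address, as the OBJCOPY symbols do, the first form is the one that yields garbage.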
With this in mind, we could create this C file called main.c :
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

/* These are external references to the symbols created by OBJCOPY */
extern char _binary_myfile_txt_start[];
extern char _binary_myfile_txt_end[];
extern char _binary_myfile_txt_size[];

int main()
{
    char *data_start = _binary_myfile_txt_start;
    char *data_end = _binary_myfile_txt_end;
    size_t data_size = (size_t)_binary_myfile_txt_size;

    /* Print out the pointers and size */
    printf ("data_start %p\n", data_start);
    printf ("data_end %p\n", data_end);
    printf ("data_size %zu\n", data_size);

    /* Print out each byte until we reach the end */
    while (data_start < data_end)
        printf ("%c", *data_start++);

    return 0;
}
You can compile and link:
gcc -O3 main.c myfile.o
The result should look something like this:
data_start 0x4006a2
data_end 0x4006ce
data_size 44
the quick brown fox jumps over the lazy dog
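As an alternative to the _size symbol, the length can be computed as the difference between the end and start addresses. This may be worth considering because an absolute symbol such as _binary_myfile_txt_size can yield an unexpected value in position-independent executables, whereas the difference of two addresses in the same section does not depend on the load address. The sketch below assumes GCC or Clang on an ELF target and uses inline assembly to create stand-in symbols (_binary_demo_txt_start and _binary_demo_txt_end, hypothetical names) so that it links without an OBJCOPY-generated object; with a real myfile.o you would declare the OBJCOPY symbols instead and drop the asm block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for what OBJCOPY would generate: the data bytes in .rodata,
 * bracketed by start and end symbols (assumption: GNU-style asm, ELF). */
__asm__(
    ".pushsection .rodata\n"
    ".globl _binary_demo_txt_start\n"
    "_binary_demo_txt_start:\n"
    ".ascii \"the quick brown fox jumps over the lazy dog\\n\"\n"
    ".globl _binary_demo_txt_end\n"
    "_binary_demo_txt_end:\n"
    ".popsection\n"
);

extern const char _binary_demo_txt_start[];
extern const char _binary_demo_txt_end[];

/* Size of the embedded data, derived from the two addresses. */
size_t demo_txt_size(void)
{
    return (size_t)((uintptr_t)_binary_demo_txt_end -
                    (uintptr_t)_binary_demo_txt_start);
}

/* Copies at most dst_size bytes of the embedded data into dst and
 * returns the number of bytes copied (no terminator is added). */
size_t demo_txt_copy(char *dst, size_t dst_size)
{
    size_t n = demo_txt_size();
    if (n > dst_size)
        n = dst_size;
    memcpy(dst, _binary_demo_txt_start, n);
    return n;
}
```

For the sample text, demo_txt_size() gives 44, matching the 0x2c size shown in the OBJDUMP output: 43 characters plus the trailing newline.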
The NASM example is similar in nature to the C code. The following assembly program, called nmain.asm, writes the same string to standard output using Linux x86-64 system calls:
bits 64
global _start

extern _binary_myfile_txt_start
extern _binary_myfile_txt_end
extern _binary_myfile_txt_size

section .text

_start:
    mov eax, 1                          ; SYS_Write system call
    mov edi, eax                        ; Standard output FD = 1
    mov rsi, _binary_myfile_txt_start   ; Address to start of string
    mov rdx, _binary_myfile_txt_size    ; Length of string
    syscall
    xor edi, edi                        ; Return value = 0
    mov eax, 60                         ; SYS_Exit system call
    syscall
This can be assembled and linked with:
nasm -f elf64 -o nmain.o nmain.asm
gcc -m64 -nostdlib nmain.o myfile.o
The output should look like this:
the quick brown fox jumps over the lazy dog