What is the most standard Java way to store raw binary data with XML?

I need to store a huge amount of binary data in a file, but I want to also read / write the header of this file in XML format.

Yes, I could just store the binary data in some XML value and let it be serialized using base64 encoding. But that would not be space-efficient.

Is it possible to “mix” XML data and raw binary data in a more standard way?

I was thinking of a few options:

  • Is there a way to do this using JAXB?

  • Or is there a way to take some existing XML data and append binary data to it so that the boundary is recognized?

  • Isn't the concept I'm looking for somehow used in / for SOAP?

  • Or is it used in the email standard? (Separation of binary attachments)

The scheme of what I am trying to achieve is:

[meta-info-about-boundary][XML-data][boundary][raw-binary-data] 

Thanks!

java xml serialization xml-serialization jaxb
4 answers

I followed the concept proposed by Blaise Doughan, but without attachment marshallers:

I pass an XmlAdapter that converts a byte[] to a URI reference and vice versa, where the references point to individual files in which the raw data is stored. The XML file and all binaries are then placed in a zip file.

It is similar to the OpenOffice approach and the ODF format, which is actually a zip with multiple XML and binary files.

(In the code example, the actual binaries are not written and the zip file is not created.)

Bindings.java

    import java.net.*;
    import java.util.*;
    import javax.xml.bind.annotation.*;
    import javax.xml.bind.annotation.adapters.*;

    final class Bindings {

        static final String SCHEME = "storage";

        static final Class<?>[] ALL_CLASSES = new Class<?>[]{
            Root.class, RawRef.class
        };

        // Maps byte[] values to storage://host/key references and back.
        static final class RawRepository extends XmlAdapter<URI, byte[]> {

            final SortedMap<String, byte[]> map = new TreeMap<>();
            final String host;
            private int lastID = 0;

            RawRepository(String host) {
                this.host = host;
            }

            @Override
            public byte[] unmarshal(URI o) {
                if (!SCHEME.equals(o.getScheme())) {
                    throw new Error("scheme is: " + o.getScheme()
                            + ", while expected was: " + SCHEME);
                } else if (!host.equals(o.getHost())) {
                    throw new Error("host is: " + o.getHost()
                            + ", while expected was: " + host);
                }
                // marshal() stores keys without the leading "/" of the URI path.
                String path = o.getPath();
                String key = path.startsWith("/") ? path.substring(1) : path;
                if (!map.containsKey(key)) {
                    throw new Error("key not found: " + key);
                }
                byte[] ret = map.get(key);
                return Arrays.copyOf(ret, ret.length);
            }

            @Override
            public URI marshal(byte[] o) {
                ++lastID;
                String key = String.valueOf(lastID);
                map.put(key, Arrays.copyOf(o, o.length));
                try {
                    return new URI(SCHEME, host, "/" + key, null);
                } catch (URISyntaxException ex) {
                    throw new Error(ex);
                }
            }
        }

        @XmlRootElement
        @XmlType
        static final class Root {
            @XmlElement
            final List<RawRef> element = new LinkedList<>();
        }

        @XmlType
        static final class RawRef {
            @XmlJavaTypeAdapter(RawRepository.class)
            @XmlElement
            byte[] raw = null;
        }
    }

Main.java

    import java.io.*;
    import javax.xml.bind.*;

    // Named Main so that the public class matches the file name Main.java.
    public class Main {

        public static void main(String[] args) throws Exception {
            JAXBContext context = JAXBContext.newInstance(Bindings.ALL_CLASSES);

            Marshaller marshaller = context.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);

            Unmarshaller unmarshaller = context.createUnmarshaller();

            // The adapter instance collects the raw data while marshalling.
            Bindings.RawRepository adapter = new Bindings.RawRepository("myZipVFS");
            marshaller.setAdapter(adapter);

            Bindings.RawRef ta1 = new Bindings.RawRef();
            ta1.raw = "THIS IS A STRING".getBytes();
            Bindings.RawRef ta2 = new Bindings.RawRef();
            ta2.raw = "THIS IS AN OTHER STRING".getBytes();

            Bindings.Root root = new Bindings.Root();
            root.element.add(ta1);
            root.element.add(ta2);

            StringWriter out = new StringWriter();
            marshaller.marshal(root, out);

            System.out.println(out.toString());
        }
    }

Output

    <root>
        <element>
            <raw>storage://myZipVFS/1</raw>
        </element>
        <element>
            <raw>storage://myZipVFS/2</raw>
        </element>
    </root>
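
The example above leaves out the zip step. A minimal sketch of it, using only java.util.zip, could look like the following. The entry names `content.xml` and `raw/<key>` are assumptions made for illustration, not part of the original answer; the map of blobs plays the role of `RawRepository.map` after marshalling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipBundleSketch {

    // Hypothetical entry name for the XML document inside the archive.
    static final String XML_ENTRY = "content.xml";

    /** Writes the XML and every raw blob as separate zip entries. */
    static byte[] bundle(String xml, Map<String, byte[]> blobs) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(XML_ENTRY));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            for (Map.Entry<String, byte[]> blob : blobs.entrySet()) {
                // Assumed layout: one entry per repository key under "raw/".
                zip.putNextEntry(new ZipEntry("raw/" + blob.getKey()));
                zip.write(blob.getValue());
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    /** Reads every entry back into memory, keyed by entry name. */
    static Map<String, byte[]> unbundle(byte[] archive) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            byte[] chunk = new byte[8192];
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    data.write(chunk, 0, n);
                }
                entries.put(entry.getName(), data.toByteArray());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> blobs = new LinkedHashMap<>();
        blobs.put("1", "THIS IS A STRING".getBytes(StandardCharsets.UTF_8));
        blobs.put("2", "THIS IS AN OTHER STRING".getBytes(StandardCharsets.UTF_8));
        String xml = "<root><element><raw>storage://myZipVFS/1</raw></element></root>";

        Map<String, byte[]> entries = unbundle(bundle(xml, blobs));
        System.out.println(entries.keySet());
        System.out.println(new String(entries.get("raw/1"), StandardCharsets.UTF_8));
    }
}
```

For a huge payload you would stream each entry to a file instead of buffering it in memory; the in-memory buffers here only keep the sketch short.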

You can use AttachmentMarshaller and AttachmentUnmarshaller for this. This is the bridge used by JAXB / JAX-WS to transfer binary content as attachments. You can use the same mechanism to do what you want.


PROOF OF CONCEPT

The following describes how this can be implemented. This should work with any JAXB impl (it works for me with EclipseLink JAXB (MOXy) and the reference implementation).

Message format

 [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN] 
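
Independent of JAXB, the framing itself can be sketched with plain DataOutputStream / DataInputStream. This is only an illustration of the format above, assuming a 4-byte big-endian length prefix per part; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FramingSketch {

    /** Writes [length][bytes] for each part, using a 4-byte big-endian length. */
    static byte[] writeParts(List<byte[]> parts) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (byte[] part : parts) {
                out.writeInt(part.length);
                out.write(part);
            }
        }
        return buffer.toByteArray();
    }

    /** Reads parts until the input is exhausted. */
    static List<byte[]> readParts(byte[] message) throws IOException {
        List<byte[]> parts = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfMessage) {
                    // No further length prefix: treated as end of message in this sketch.
                    break;
                }
                byte[] part = new byte[length];
                in.readFully(part); // fails with EOFException if the part is truncated
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> parts = new ArrayList<>();
        parts.add("<root/>".getBytes(StandardCharsets.UTF_8)); // [xml]
        parts.add("HELLO WORLD".getBytes(StandardCharsets.UTF_8)); // [attach1]

        byte[] message = writeParts(parts);
        List<byte[]> back = readParts(message);
        System.out.println(message.length + " bytes, " + back.size() + " parts");
    }
}
```

The first part is the XML and every following part is one attachment, so the only overhead is four bytes per part.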

Root

This is an object with several byte[] properties.

    import javax.xml.bind.annotation.XmlRootElement;

    @XmlRootElement
    public class Root {

        private byte[] foo;
        private byte[] bar;

        public byte[] getFoo() {
            return foo;
        }

        public void setFoo(byte[] foo) {
            this.foo = foo;
        }

        public byte[] getBar() {
            return bar;
        }

        public void setBar(byte[] bar) {
            this.bar = bar;
        }
    }

Demo

This class is used to demonstrate the use of MessageWriter and MessageReader:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import javax.xml.bind.JAXBContext;

    public class Demo {

        public static void main(String[] args) throws Exception {
            JAXBContext jc = JAXBContext.newInstance(Root.class);

            Root root = new Root();
            root.setFoo("HELLO WORLD".getBytes());
            root.setBar("BAR".getBytes());

            MessageWriter writer = new MessageWriter(jc);
            FileOutputStream outStream = new FileOutputStream("file.xml");
            writer.write(root, outStream);
            outStream.close();

            MessageReader reader = new MessageReader(jc);
            FileInputStream inStream = new FileInputStream("file.xml");
            Root root2 = (Root) reader.read(inStream);
            inStream.close();

            System.out.println(new String(root2.getFoo()));
            System.out.println(new String(root2.getBar()));
        }
    }

MessageWriter

Responsible for writing the message in the desired format:

    import java.io.ByteArrayOutputStream;
    import java.io.ObjectOutputStream;
    import java.io.OutputStream;
    import java.util.ArrayList;
    import java.util.List;

    import javax.activation.DataHandler;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Marshaller;
    import javax.xml.bind.attachment.AttachmentMarshaller;

    public class MessageWriter {

        private JAXBContext jaxbContext;

        public MessageWriter(JAXBContext jaxbContext) {
            this.jaxbContext = jaxbContext;
        }

        /**
         * Write the message in the following format:
         * [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN]
         */
        public void write(Object object, OutputStream stream) {
            try {
                Marshaller marshaller = jaxbContext.createMarshaller();
                marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
                BinaryAttachmentMarshaller attachmentMarshaller = new BinaryAttachmentMarshaller();
                marshaller.setAttachmentMarshaller(attachmentMarshaller);
                ByteArrayOutputStream xmlStream = new ByteArrayOutputStream();
                marshaller.marshal(object, xmlStream);
                byte[] xml = xmlStream.toByteArray();
                xmlStream.close();

                ObjectOutputStream messageStream = new ObjectOutputStream(stream);

                // writeInt, not write(int): write(int) would emit only the low 8 bits.
                messageStream.writeInt(xml.length); // [xml_length]
                messageStream.write(xml); // [xml]

                for (Attachment attachment : attachmentMarshaller.getAttachments()) {
                    messageStream.writeInt(attachment.getLength()); // [attachX_length]
                    messageStream.write(attachment.getData(), attachment.getOffset(), attachment.getLength()); // [attachX]
                }

                messageStream.flush();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        private static class BinaryAttachmentMarshaller extends AttachmentMarshaller {

            // Payloads shorter than this stay inline in the XML.
            private static final int THRESHOLD = 10;

            private List<Attachment> attachments = new ArrayList<Attachment>();

            public List<Attachment> getAttachments() {
                return attachments;
            }

            @Override
            public String addMtomAttachment(DataHandler data, String elementNamespace, String elementLocalName) {
                return null;
            }

            @Override
            public String addMtomAttachment(byte[] data, int offset, int length, String mimeType, String elementNamespace, String elementLocalName) {
                // Compare the payload length, not the length of the backing array.
                if (length < THRESHOLD) {
                    return null;
                }
                int id = attachments.size() + 1;
                attachments.add(new Attachment(data, offset, length));
                return "cid:" + String.valueOf(id);
            }

            @Override
            public String addSwaRefAttachment(DataHandler data) {
                return null;
            }

            @Override
            public boolean isXOPPackage() {
                return true;
            }
        }

        public static class Attachment {

            private byte[] data;
            private int offset;
            private int length;

            public Attachment(byte[] data, int offset, int length) {
                this.data = data;
                this.offset = offset;
                this.length = length;
            }

            public byte[] getData() {
                return data;
            }

            public int getOffset() {
                return offset;
            }

            public int getLength() {
                return length;
            }
        }
    }

MessageReader

Responsible for reading the message:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.ObjectInputStream;
    import java.io.OutputStream;
    import java.util.HashMap;
    import java.util.Map;

    import javax.activation.DataHandler;
    import javax.activation.DataSource;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.attachment.AttachmentUnmarshaller;

    public class MessageReader {

        private JAXBContext jaxbContext;

        public MessageReader(JAXBContext jaxbContext) {
            this.jaxbContext = jaxbContext;
        }

        /**
         * Read the message from the following format:
         * [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN]
         */
        public Object read(InputStream stream) {
            try {
                ObjectInputStream inputStream = new ObjectInputStream(stream);

                // readInt / readFully mirror the writer; read() would return a single
                // byte and read(byte[]) may fill only part of the array.
                int xmlLength = inputStream.readInt(); // [xml_length]
                byte[] xmlIn = new byte[xmlLength];
                inputStream.readFully(xmlIn); // [xml]

                BinaryAttachmentUnmarshaller attachmentUnmarshaller = new BinaryAttachmentUnmarshaller();
                int id = 1;
                while (inputStream.available() > 0) {
                    int length = inputStream.readInt(); // [attachX_length]
                    byte[] data = new byte[length]; // [attachX]
                    inputStream.readFully(data);
                    attachmentUnmarshaller.getAttachments().put("cid:" + String.valueOf(id++), data);
                }

                Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
                unmarshaller.setAttachmentUnmarshaller(attachmentUnmarshaller);
                ByteArrayInputStream byteInputStream = new ByteArrayInputStream(xmlIn);
                Object object = unmarshaller.unmarshal(byteInputStream);
                byteInputStream.close();
                inputStream.close();
                return object;
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        private static class BinaryAttachmentUnmarshaller extends AttachmentUnmarshaller {

            private Map<String, byte[]> attachments = new HashMap<String, byte[]>();

            public Map<String, byte[]> getAttachments() {
                return attachments;
            }

            @Override
            public DataHandler getAttachmentAsDataHandler(String cid) {
                byte[] bytes = attachments.get(cid);
                return new DataHandler(new ByteArrayDataSource(bytes));
            }

            @Override
            public byte[] getAttachmentAsByteArray(String cid) {
                return attachments.get(cid);
            }

            @Override
            public boolean isXOPPackage() {
                return true;
            }
        }

        private static class ByteArrayDataSource implements DataSource {

            private byte[] bytes;

            public ByteArrayDataSource(byte[] bytes) {
                this.bytes = bytes;
            }

            public String getContentType() {
                return "application/octet-stream";
            }

            public InputStream getInputStream() throws IOException {
                return new ByteArrayInputStream(bytes);
            }

            public String getName() {
                return null;
            }

            public OutputStream getOutputStream() throws IOException {
                return null;
            }
        }
    }


JAXB does not support this natively, because you do not want to serialize the binary data into the XML, but it can usually be done at a higher level while still using JAXB. The way I do this with web services (SOAP and REST) is to use MIME multipart/mixed messages (see the multipart specification). Originally designed for email, it works well for sending XML together with binary data, and most web service frameworks, such as Axis or Jersey, support it almost transparently.

Here is an example of sending an object as XML along with a binary file to a REST web service, using Jersey with the jersey-multipart extension.

XML object

    @XmlRootElement
    public class Book {

        private String title;
        private String author;
        private int year;

        // getters and setters...
    }

Client

    byte[] bin = ...; // some binary data

    Book b = new Book();
    b.setAuthor("John");
    b.setTitle("wild stuff");
    b.setYear(2012);

    // One body part for the XML-bound object, one for the raw bytes.
    MultiPart multiPart = new MultiPart();
    multiPart.bodyPart(new BodyPart(b, MediaType.APPLICATION_XML_TYPE));
    multiPart.bodyPart(new BodyPart(bin, MediaType.APPLICATION_OCTET_STREAM_TYPE));

    ClientResponse response = service.path("rest").path("multipart")
            .type(MultiPartMediaTypes.MULTIPART_MIXED)
            .post(ClientResponse.class, multiPart);

Server

    @POST
    @Consumes(MultiPartMediaTypes.MULTIPART_MIXED)
    public Response post(MultiPart multiPart) {
        for (BodyPart part : multiPart.getBodyParts()) {
            System.out.println(part.getMediaType());
        }
        return Response.status(Response.Status.ACCEPTED)
                .entity("Attachments processed successfully.")
                .type(MediaType.TEXT_PLAIN)
                .build();
    }

I tried sending a file of 110917 bytes. Using Wireshark, you can see that the data is transmitted directly over HTTP as follows:

    Hypertext Transfer Protocol
        POST /org.etics.test.rest.server/rest/multipart HTTP/1.1\r\n
        Content-Type: multipart/mixed; boundary=Boundary_1_353042220_1343207087422\r\n
        MIME-Version: 1.0\r\n
        User-Agent: Java/1.7.0_04\r\n
        Host: localhost:8080\r\n
        Accept: text/html, image/gif, image/jpeg\r\n
        Connection: keep-alive\r\n
        Content-Length: 111243\r\n
        \r\n
        [Full request URI: http://localhost:8080/org.etics.test.rest.server/rest/multipart]
    MIME Multipart Media Encapsulation, Type: multipart/mixed, Boundary: "Boundary_1_353042220_1343207087422"
        [Type: multipart/mixed]
        First boundary: --Boundary_1_353042220_1343207087422\r\n
        Encapsulated multipart part:  (application/xml)
            Content-Type: application/xml\r\n\r\n
            eXtensible Markup Language
                <?xml
                <book>
                    <author> John </author>
                    <title> wild stuff </title>
                    <year> 2012 </year>
                </book>
        Boundary: \r\n--Boundary_1_353042220_1343207087422\r\n
        Encapsulated multipart part:  (application/octet-stream)
            Content-Type: application/octet-stream\r\n\r\n
            Media Type
                Media Type: application/octet-stream (110917 bytes)
        Last boundary: \r\n--Boundary_1_353042220_1343207087422--\r\n

As you can see, the binary data is sent as an octet stream with no wasted space, in contrast to what happens when binary data is sent inline in the XML. The only overhead is the very small MIME envelope. With SOAP the principle is the same (it just adds a SOAP envelope).
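
To put rough numbers on that, here is a minimal sketch that builds a two-part multipart/mixed body by hand and compares its size with base64-encoding the same payload. It is not what Jersey does internally; the boundary string and the helper names are assumptions made for illustration, and only the payload size (110917 bytes) is taken from the capture above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class MultipartOverheadSketch {

    /** Builds a multipart/mixed body: one XML part followed by one raw binary part. */
    static byte[] multipartBody(String boundary, String xml, byte[] binary) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        writeAscii(body, "--" + boundary + "\r\n");
        writeAscii(body, "Content-Type: application/xml\r\n\r\n");
        body.write(xml.getBytes(StandardCharsets.UTF_8));
        writeAscii(body, "\r\n--" + boundary + "\r\n");
        writeAscii(body, "Content-Type: application/octet-stream\r\n\r\n");
        body.write(binary); // raw bytes, no re-encoding
        writeAscii(body, "\r\n--" + boundary + "--\r\n");
        return body.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        byte[] binary = new byte[110917]; // same payload size as in the capture
        String xml = "<book><author>John</author><title>wild stuff</title><year>2012</year></book>";

        // Hypothetical boundary; a real one must not occur inside any part.
        byte[] body = multipartBody("Boundary_1", xml, binary);
        int base64Length = Base64.getEncoder().encode(binary).length;

        System.out.println("raw payload:     " + binary.length);
        System.out.println("multipart body:  " + body.length);
        System.out.println("base64 payload:  " + base64Length);
    }
}
```

Base64 turns every 3 bytes into 4 characters, so the inline variant grows by about a third, while the multipart envelope adds a fixed number of bytes regardless of payload size.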


I don't think so: XML libraries are generally not designed to work with XML plus additional data.

But you may be able to get away with something as simple as a special stream wrapper: it would expose a stream containing the XML and a binary stream (both taken from the special "format"). Then JAXB (or any other XML library) can work with the XML stream, while the binary stream is kept separate.

Also remember to take the difference between "binary" and "text" files into account.
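
One way to read this suggestion is sketched below: a bounded wrapper hands the XML parser only the header bytes, and whatever follows stays in the underlying stream as the binary part. The file layout (a 4-byte XML length, then the XML, then the raw bytes) and all class and method names are assumptions for illustration; the JDK's built-in DOM parser stands in for JAXB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class SplitStreamSketch {

    /** Exposes at most `limit` bytes of the wrapped stream and never closes it. */
    static final class BoundedInputStream extends FilterInputStream {
        private long remaining;

        BoundedInputStream(InputStream in, long limit) {
            super(in);
            this.remaining = limit;
        }

        @Override public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = super.read();
            if (b >= 0) remaining--;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            if (remaining <= 0) return -1;
            int n = super.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }

        @Override public long skip(long n) throws IOException {
            long skipped = super.skip(Math.min(n, remaining));
            remaining -= skipped;
            return skipped;
        }

        @Override public int available() throws IOException {
            return (int) Math.min(super.available(), remaining);
        }

        @Override public boolean markSupported() { return false; }

        @Override public void close() {
            // Deliberately a no-op: the binary part still has to be read afterwards.
        }

        /** Consumes whatever the parser left unread inside the bounded region. */
        void drain() throws IOException {
            byte[] scratch = new byte[4096];
            while (read(scratch, 0, scratch.length) != -1) { /* discard */ }
        }
    }

    /** Parses the XML header and leaves `in` positioned at the first binary byte. */
    static Document readHeader(DataInputStream in) throws Exception {
        int xmlLength = in.readInt();
        BoundedInputStream xmlPart = new BoundedInputStream(in, xmlLength);
        Document header = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlPart);
        xmlPart.drain();
        return header;
    }

    /** Builds a sample "file": [xml_length][xml][raw binary]. */
    static byte[] sampleFile(String xml, byte[] binary) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] xmlBytes = xml.getBytes(StandardCharsets.UTF_8);
            out.writeInt(xmlBytes.length);
            out.write(xmlBytes);
            out.write(binary);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] file = sampleFile("<header><size>3</size></header>", new byte[]{10, 20, 30});
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));

        Document header = readHeader(in);
        byte[] payload = new byte[3];
        in.readFully(payload); // the binary part starts right after the XML

        System.out.println(header.getDocumentElement().getNodeName() + ", first byte " + payload[0]);
    }
}
```

The wrapper works because DataInputStream does not read ahead; putting a buffering stream between the wrapper and the source would need extra care, since buffered bytes could belong to the binary part.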

Happy coding.
