summaryrefslogtreecommitdiff
path: root/docs/handbook/json4cpp/binary-formats.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/handbook/json4cpp/binary-formats.md')
-rw-r--r--docs/handbook/json4cpp/binary-formats.md411
1 files changed, 411 insertions, 0 deletions
diff --git a/docs/handbook/json4cpp/binary-formats.md b/docs/handbook/json4cpp/binary-formats.md
new file mode 100644
index 0000000000..9cb9f666f2
--- /dev/null
+++ b/docs/handbook/json4cpp/binary-formats.md
@@ -0,0 +1,411 @@
+# json4cpp — Binary Formats
+
+## Overview
+
+The library supports five binary serialization formats in addition to JSON
+text. All are available as static methods on `basic_json`:
+
+| Format | To | From | RFC/Spec |
+|---|---|---|---|
+| CBOR | `to_cbor()` | `from_cbor()` | RFC 7049 |
+| MessagePack | `to_msgpack()` | `from_msgpack()` | MessagePack spec |
+| UBJSON | `to_ubjson()` | `from_ubjson()` | UBJSON spec |
+| BSON | `to_bson()` | `from_bson()` | BSON spec |
+| BJData | `to_bjdata()` | `from_bjdata()` | BJData spec |
+
+Binary serialization is useful for:
+- Smaller payload sizes
+- Faster parsing
+- Native binary data support
+- Type-rich encodings (timestamps, binary subtypes)
+
+## CBOR (Concise Binary Object Representation)
+
+### Serialization
+
+```cpp
+// To vector<uint8_t>
+static std::vector<std::uint8_t> to_cbor(const basic_json& j);
+
+// To output adapter (stream, string, vector)
+static void to_cbor(const basic_json& j, detail::output_adapter<uint8_t> o);
+static void to_cbor(const basic_json& j, detail::output_adapter<char> o);
+```
+
+```cpp
+json j = {{"compact", true}, {"schema", 0}};
+
+// Serialize to byte vector
+auto cbor = json::to_cbor(j);
+
+// Serialize to stream
+std::ofstream out("data.cbor", std::ios::binary);
+json::to_cbor(j, out);
+```
+
+### Deserialization
+
+```cpp
+template<typename InputType>
+static basic_json from_cbor(InputType&& i,
+ const bool strict = true,
+ const bool allow_exceptions = true,
+ const cbor_tag_handler_t tag_handler = cbor_tag_handler_t::error);
+```
+
+Parameters:
+- `strict` — if `true`, requires that all bytes are consumed
+- `allow_exceptions` — if `false`, returns discarded value on error
+- `tag_handler` — how to handle CBOR tags
+
+```cpp
+auto j = json::from_cbor(cbor);
+```
+
+### CBOR Tag Handling
+
+```cpp
+enum class cbor_tag_handler_t
+{
+ error, ///< throw parse_error on any tag
+ ignore, ///< ignore tags
+ store ///< store tags as binary subtype
+};
+```
+
+```cpp
+// Ignore CBOR tags
+auto j = json::from_cbor(data, true, true, json::cbor_tag_handler_t::ignore);
+
+// Store CBOR tags as subtypes in binary values
+auto j = json::from_cbor(data, true, true, json::cbor_tag_handler_t::store);
+```
+
+### CBOR Type Mapping
+
+| JSON Type | CBOR Type |
+|---|---|
+| null | null (0xF6) |
+| boolean | true/false (0xF5/0xF4) |
+| number_integer | negative/unsigned integer |
+| number_unsigned | unsigned integer |
+| number_float | IEEE 754 double (0xFB) or half-precision (0xF9) |
+| string | text string (major type 3) |
+| array | array (major type 4) |
+| object | map (major type 5) |
+| binary | byte string (major type 2) |
+
+## MessagePack
+
+### Serialization
+
+```cpp
+static std::vector<std::uint8_t> to_msgpack(const basic_json& j);
+static void to_msgpack(const basic_json& j, detail::output_adapter<uint8_t> o);
+static void to_msgpack(const basic_json& j, detail::output_adapter<char> o);
+```
+
+```cpp
+json j = {{"array", {1, 2, 3}}, {"null", nullptr}};
+auto msgpack = json::to_msgpack(j);
+```
+
+### Deserialization
+
+```cpp
+template<typename InputType>
+static basic_json from_msgpack(InputType&& i,
+ const bool strict = true,
+ const bool allow_exceptions = true);
+```
+
+```cpp
+auto j = json::from_msgpack(msgpack);
+```
+
+### MessagePack Type Mapping
+
+| JSON Type | MessagePack Type |
+|---|---|
+| null | nil (0xC0) |
+| boolean | true/false (0xC3/0xC2) |
+| number_integer | int 8/16/32/64 or negative fixint |
+| number_unsigned | uint 8/16/32/64 or positive fixint |
+| number_float | float 32 or float 64 |
+| string | fixstr / str 8/16/32 |
+| array | fixarray / array 16/32 |
+| object | fixmap / map 16/32 |
+| binary | bin 8/16/32 |
+| binary with subtype | ext 8/16/32 / fixext 1/2/4/8/16 |
+
+The library chooses the **smallest** encoding that fits the value.
+
+### Ext Types
+
+MessagePack extension types carry a type byte. The library maps this to the
+binary subtype:
+
+```cpp
+json j = json::binary({0x01, 0x02, 0x03}, 42); // subtype 42
+auto mp = json::to_msgpack(j);
+// Encoded as ext with type byte 42
+
+auto j2 = json::from_msgpack(mp);
+assert(j2.get_binary().subtype() == 42);
+```
+
+## UBJSON (Universal Binary JSON)
+
+### Serialization
+
+```cpp
+static std::vector<std::uint8_t> to_ubjson(const basic_json& j,
+ const bool use_size = false,
+ const bool use_type = false);
+```
+
+Parameters:
+- `use_size` — write container size markers (enables optimized containers)
+- `use_type` — write type markers for homogeneous containers (requires `use_size`)
+
+```cpp
+json j = {1, 2, 3, 4, 5};
+
+// Without optimization
+auto ub1 = json::to_ubjson(j);
+
+// With size optimization
+auto ub2 = json::to_ubjson(j, true);
+
+// With size+type optimization (smallest for homogeneous arrays)
+auto ub3 = json::to_ubjson(j, true, true);
+```
+
+### Deserialization
+
+```cpp
+template<typename InputType>
+static basic_json from_ubjson(InputType&& i,
+ const bool strict = true,
+ const bool allow_exceptions = true);
+```
+
+### UBJSON Type Markers
+
+| Marker | Type |
+|---|---|
+| `Z` | null |
+| `T` / `F` | true / false |
+| `i` | int8 |
+| `U` | uint8 |
+| `I` | int16 |
+| `l` | int32 |
+| `L` | int64 |
+| `d` | float32 |
+| `D` | float64 |
+| `C` | char |
+| `S` | string |
+| `[` / `]` | array begin / end |
+| `{` / `}` | object begin / end |
+| `H` | high-precision number (string representation) |
+
+## BSON (Binary JSON)
+
+### Serialization
+
+```cpp
+static std::vector<std::uint8_t> to_bson(const basic_json& j);
+static void to_bson(const basic_json& j, detail::output_adapter<uint8_t> o);
+static void to_bson(const basic_json& j, detail::output_adapter<char> o);
+```
+
+**Important:** BSON requires the top-level value to be an **object**:
+
+```cpp
+json j = {{"key", "value"}, {"num", 42}};
+auto bson = json::to_bson(j);
+
+// json j = {1, 2, 3};
+// json::to_bson(j); // throws type_error::317 — not an object
+```
+
+### Deserialization
+
+```cpp
+template<typename InputType>
+static basic_json from_bson(InputType&& i,
+ const bool strict = true,
+ const bool allow_exceptions = true);
+```
+
+### BSON Type Mapping
+
+| JSON Type | BSON Type |
+|---|---|
+| null | 0x0A (Null) |
+| boolean | 0x08 (Boolean) |
+| number_integer | 0x10 (int32) or 0x12 (int64) |
+| number_unsigned | 0x10 or 0x12 (depends on value) |
+| number_float | 0x01 (double) |
+| string | 0x02 (String) |
+| array | 0x04 (Array) — encoded as object with "0", "1", ... keys |
+| object | 0x03 (Document) |
+| binary | 0x05 (Binary) |
+
+### BSON Binary Subtypes
+
+```cpp
+json j;
+j["data"] = json::binary({0x01, 0x02}, 0x80); // subtype 0x80
+auto bson = json::to_bson(j);
+// Binary encoded with subtype byte 0x80
+```
+
+## BJData (Binary JData)
+
+BJData extends UBJSON with additional types for N-dimensional arrays and
+optimized integer types.
+
+### Serialization
+
+```cpp
+static std::vector<std::uint8_t> to_bjdata(const basic_json& j,
+ const bool use_size = false,
+ const bool use_type = false);
+```
+
+### Deserialization
+
+```cpp
+template<typename InputType>
+static basic_json from_bjdata(InputType&& i,
+ const bool strict = true,
+ const bool allow_exceptions = true);
+```
+
+### Additional BJData Types
+
+Beyond UBJSON types, BJData adds:
+
+| Marker | Type |
+|---|---|
+| `u` | uint16 |
+| `m` | uint32 |
+| `M` | uint64 |
+| `h` | float16 (half-precision) |
+
+## Roundtrip Between Formats
+
+Binary formats can preserve the same data as JSON text, but with some
+differences:
+
+```cpp
+json original = {
+ {"name", "test"},
+ {"values", {1, 2, 3}},
+ {"data", json::binary({0xFF, 0xFE})}
+};
+
+// JSON text cannot represent binary
+std::string text = original.dump();
+// "data" field would cause issues in text form
+
+// Binary formats can represent binary natively
+auto cbor = json::to_cbor(original);
+auto restored = json::from_cbor(cbor);
+assert(original == restored); // exact roundtrip
+
+// Cross-format conversion
+auto mp = json::to_msgpack(original);
+auto from_mp = json::from_msgpack(mp);
+assert(original == from_mp);
+```
+
+## Size Comparison
+
+Typical size savings over JSON text:
+
+| Data | JSON | CBOR | MessagePack | UBJSON | BSON |
+|---|---|---|---|---|---|
+| `{"a":1}` | 7 bytes | 5 bytes | 4 bytes | 6 bytes | 18 bytes |
+| `[1,2,3]` | 7 bytes | 4 bytes | 4 bytes | 6 bytes | N/A (top-level array) |
+| `true` | 4 bytes | 1 byte | 1 byte | 1 byte | N/A (top-level bool) |
+
+BSON has the most overhead due to its document-structure requirements.
+MessagePack and CBOR are generally the most compact.
+
+## Stream-Based Serialization
+
+All binary formats support streaming to/from `std::ostream` / `std::istream`:
+
+```cpp
+// Write CBOR to file
+std::ofstream out("data.cbor", std::ios::binary);
+json::to_cbor(j, out);
+
+// Read CBOR from file
+std::ifstream in("data.cbor", std::ios::binary);
+json j = json::from_cbor(in);
+```
+
+## Strict vs. Non-Strict Parsing
+
+All `from_*` functions accept a `strict` parameter:
+
+- `strict = true` (default): all input bytes must be consumed. Extra
+ trailing data causes a parse error.
+- `strict = false`: parsing stops after the first valid value. Remaining
+ input is ignored.
+
+```cpp
+std::vector<uint8_t> data = /* two CBOR values concatenated */;
+
+// Strict: fails because of trailing data
+// json::from_cbor(data, true);
+
+// Non-strict: parses only the first value
+json j = json::from_cbor(data, false);
+```
+
+## Binary Reader / Writer Architecture
+
+The binary I/O is implemented by two internal classes:
+
+### `binary_reader`
+
+Located in `include/nlohmann/detail/input/binary_reader.hpp`:
+
+```cpp
+template<typename BasicJsonType, typename InputAdapterType, typename SAX>
+class binary_reader
+{
+ bool parse_cbor_internal(bool get_char, int tag_handler);
+ bool parse_msgpack_internal();
+ bool parse_ubjson_internal(bool get_char = true);
+ bool parse_bson_internal();
+ bool parse_bjdata_internal();
+ // ...
+};
+```
+
+Uses the SAX interface internally — each decoded value is reported to a
+SAX handler (typically `json_sax_dom_parser`) which builds the JSON tree.
+
+### `binary_writer`
+
+Located in `include/nlohmann/detail/output/binary_writer.hpp`:
+
+```cpp
+template<typename BasicJsonType, typename CharType>
+class binary_writer
+{
+ void write_cbor(const BasicJsonType& j);
+ void write_msgpack(const BasicJsonType& j);
+ void write_ubjson(const BasicJsonType& j, ...);
+ void write_bson(const BasicJsonType& j);
+ void write_bjdata(const BasicJsonType& j, ...);
+ // ...
+};
+```
+
+Directly writes encoded bytes to an `output_adapter`.