KeyForge Compression

22 Apr 2019

Just quick post, that someone might find useful/interesting, on how to get better compression by pre-processing.

I'm collecting KeyForge deck data, and it’s started to get quite big so I thought I’d see how well it would compress. Since I’m going to be doing a few experiments I grabbed 1000 entries at random to test with, this should be enough to show how well stuff works, without having to process the entire set.

The subset starts off at 3,267,834 bytes (3.2MB), after packing into a zip file without any compression. Turning on deflate, the oldest and best supported method, reduces this down to 1,196,874 bytes (1.1MB), or 36% of the original size.

Now we could stop here but most compression methods can benefit from a bit of pre-processing with knowledge about what you’re storing. So let’s look at a KeyForge deck:

{
    "name": "Quintis, Draconian Landing Strip Sensei",
    "expansion": 341,
    "power_level": 0,
    "chains": 0,
    "wins": 0,
    "losses": 0,
    "id": "a050d46e-dfbb-4697-9b73-76974c9c26c2",
    "is_my_deck": false,
    "cards": [
        "9b079369-608a-430a-9b08-9f2d6b32435b",
        "1691e035-1eab-41de-ad18-26245265e64f",
        "1691e035-1eab-41de-ad18-26245265e64f",
        "cc44ca9a-6994-4897-9308-ff332cc8de57",
        "aeed12e9-7b9d-43f4-8bf7-04f076c3ea79",
        "2d5666d0-5a93-4f75-b5f4-5085e4ee9b0f",
        "2d5666d0-5a93-4f75-b5f4-5085e4ee9b0f",
        "0fba4ba8-317b-44c0-a51e-0a06bdb770d3",
        "5fc44836-83fd-4a10-af61-9168db728cc0",
        "1a84631d-7fcb-4c9a-a50c-9539dcb84928",
        "1a84631d-7fcb-4c9a-a50c-9539dcb84928",
        "211c5213-7838-4292-b9c4-fb3a663898ee",
        "0186f94c-68df-4d5c-9338-9e918affe313",
        "78f28f49-8edb-4333-bd22-308a229f200f",
        "bbd788cd-1f8d-4950-a7c1-bc7fe5c0d49a",
        "15f1a6f4-873f-4fa9-a080-7f01e72bbff1",
        "afa69425-4fe4-4e5b-a016-7c142ed0a849",
        "551a951f-39cc-4f13-8070-c0758066769c",
        "ff290bf3-dd57-48b3-add3-c6baf605967c",
        "d988a134-ff29-40f7-bac7-3fd49fe525f8",
        "429e5d71-40bc-4ff4-ae81-4d5c0c10d15e",
        "11663693-8a10-4783-9f89-47f43c49bfa3",
        "03c4165e-a0bb-4fd5-b6a8-e3d9aec0551e",
        "03c4165e-a0bb-4fd5-b6a8-e3d9aec0551e",
        "98fca3bb-b74c-4563-876b-7c9e942e8254",
        "f96b20b3-df0e-4d43-a737-e7fa56ff690b",
        "f96b20b3-df0e-4d43-a737-e7fa56ff690b",
        "469dd68d-cdd6-40e0-8fc9-a167c45a9aea",
        "3f0f006a-10bc-4f1a-a90a-a64abb14d5a0",
        "f04a582c-c50b-453e-afc8-9d459c46cc22",
        "74d1da3a-9d90-43ea-8ead-f7968c4d562d",
        "dc6344a9-0486-4926-a820-d99eb2151c7f",
        "dc6344a9-0486-4926-a820-d99eb2151c7f",
        "740e0810-14e8-4278-bb44-e2e5c98184f9",
        "02feb5dd-81a0-4e06-8b5d-0ad7bdc9de08",
        "05fa104f-3719-41d0-9189-57ff3ec5edc1"
    ],
    "notes": [],
    "is_my_favorite": false,
    "is_on_my_watchlist": false,
    "casual_wins": 0,
    "casual_losses": 0,
    "_links": {
        "houses": [
            "Mars",
            "Shadows",
            "Untamed"
        ]
    }
}

As you can see, the cards section takes up most of the space, so this is a good candidate.

Now the current, and at the moment only, set has 370 cards in it but the IDs above are made to be unique across all sets. This is a little strange, as in KeyForge decks are pre-constructed and only contain cards from the same set, but I’m guessing this is to make their back-end and deck generation easier.

This means that the 36 byte UID is far larger then it needs to be.

Assuming there’s at most 512 cards in a set we can represent a card in just 9 bits, rather than the 288 used above.

There’s a problem with this however, in the form of Maverick cards. Each of the cards in the set has a house (think of it like a suit in a pack of playing cards, or the colour in MTG) but Mavericks are rare versions that are switched into a different house.

Here’s the standard version of the ‘Bait and Switch’ card:

{
    "amber": 0,
    "armor": 0,
    "cardType": "Action",
    "description": "Play: If your opponent has more <A> than you, steal 1<A>. Repeat this card's effect if your opponent still has more <A> than you.",
    "expansion": 341,
    "flavorText": "\u201cHeckuva deal.\u201d \u2013 Old Bruno",
    "house": "Shadows",
    "id": "0186f94c-68df-4d5c-9338-9e918affe313",
    "imageUrl": "https://cdn.keyforgegame.com/media/card_front/en/341_267_VHQ67J5MWQV5_en.png",
    "indexInExpansion": 267,
    "isMaverick": false,
    "name": "Bait and Switch",
    "powerLevel": 0,
    "rarity": 0,
    "traits": null
}

It has a card index of 267 and house of Shadows, and here’s a Maverick version:

{
    "amber": 0,
    "armor": 0,
    "cardType": "Action",
    "description": "Play: If your opponent has more <A> than you, steal 1<A>. Repeat this card's effect if your opponent still has more <A> than you.",
    "expansion": 341,
    "flavorText": "\u201cHeckuva deal.\u201d \u2013 Old Bruno",
    "house": "Logos",
    "id": "771c498c-33bc-40b7-8d41-54fd0a7c2567",
    "imageUrl": "https://cdn.keyforgegame.com/media/card_front/en/341_267_VHQ67J5MWQV5_en.png",
    "indexInExpansion": 267,
    "isMaverick": true,
    "name": "Bait and Switch",
    "powerLevel": 0,
    "rarity": 0,
    "traits": null
}

This has the same index but has ‘isMaverick’ set to true and is now in Logos.

The solution I came up with was to assume there’s only ever 512 cards in a set (2^9) and then to use 3 extra bits to store the Maverick state (e.g. 0 being non-Maverick and 1-7 meaning swapped house).

This lets us reduce that 36 byte UID into 12 bits, or 2 bytes to keep things simple, reducing the cards section down from 1,404 bytes (36 bytes for the UID, 3 for delimiters in JSON), to 72.

This would be fine, if you could just stick random binary data into JSON, but you can’t so we’ll need to Base64 encode it. After this you end up with the whole cards section reduced to 98 bytes (I’ve spread it over two lines here, with the \ added to do so):

"cards": "AAQABwALABAAEQAWABsAHgAfACAALgAxAKsArwCxALQAtADHAMoAzQDPAM8AzwDPANUA1QDWAN\
kA2gDaAOQA7gDvAPQBAgED",

If we want to go even further, we could pack 4 x 12-bit values into 3 x 16-bit ones, rather than wasting those top 4 bits. This reduces it to 74 bytes (after Base64 encoding):

"cards": "AAAEBwsQAAARFhseAAAfIC4xAACrr7G0AAC0x8rNAADPz8_PAADV1dbZAADa2uTuABHv9AID",

This would already save us a quite a bit of space in the uncompressed version, but what difference does it make when we compress it?

MethodSize in Bytes% of original% of compressed
Just Deflate1,196,87436.63%100%
Reduced 2 byte573,49717.55%47.91%
Reduce 1.5 byte557,79917.07%46.60%

This gives reduces the compressed data by just under 50% again, which is useful as the data-set I currently have is about 1.2GB. With that in mind, the savings should be:

MethodSize in MBMB saved
Just Deflate467808
Reduced 2 byte2231051
Reduce 1.5 byte2171057

Whether the extra packing of 4 bytes into 3 is worth the additional complexity I’m not sure, given that it only saves 6 MB, but it’s only a bit of bit shifting.

It does show you what a little tweaking can do to improve your compression rates.