Why FEXRD?

The FFRI Dataset can be used for a wide variety of research because it contains the raw output of each analysis tool (e.g., lief, TrID, and peHash). An example record from the FFRI Dataset 2020 is shown below. It is formatted as JSON: its keys correspond to the names of the tools, and its values hold each tool's raw output.

{
  "label": 0,
  "date": null,
  "version": "2020",
  "file_size": 12288,
  "hashes": {
    "md5": "c5560c9b347ac6355dd7020b9a841ffc",
    "sha1": "437f5ccdf1fefc4f42442ddc22f46e4c34f1ae9f",
    "sha256": "e215fbbdf2a9fec8161808a41d371228882202d63924a99d0efbdae54c4d8f23",
    "ssdeep": "192:k8xZxfjo/vNxzp/yChtHmNn9sAzdN+j7RIY+Oifwhy681sy3Q5tfqXU/YEm:fxro/HzpyagNntajN/+p31TEm",
    "imphash": "7d3ef9faa2be833b9d39423cd3ed8b07",
    "impfuzzy": "48:8/Tbnw/LnNV06EAjIj1fB+xBMLSQMftMS1o:8nnw/LNVxEAkjZsXvtMS1o",
    "tlsh": "C4422B47BF564CFBC66943748463074AE1B17E418733A3CF13A9912D1FA6781312AA9C",
    "totalhash": null,
    "anymaster": "69631d85bdfc28870624870bb6dfd9c3defe4612",
    "anymaster_v1_0_1": "c1b9817d0a3e3eec7b1c53ba03460e4abc3d8f8c",
    "endgame": "d97267dd40d12b532b38daa246886bf0",
    "crits": "ce4c40d39ef962b9681c2ffe3984ad7ef7311bcd",
    "pehashng": "05c0ba3fd00cbd6615ebd83c5d57b85840f12673934b761fafb9893f786a6419"
  },
  "lief": {
    "data_directories": [
      {
        "RVA": 0,
        "size": 0,
        "type": "EXPORT_TABLE"
      },
      {
        "RVA": 10580,
        "section": ".rdata",
      }
  ...
  }
  ...
}

For machine learning research, the data has to be converted into fixed-dimensional vectors. Because the JSON data is heavily nested, simply flattening it gives you vectors that are far too high-dimensional to work with comfortably.
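To make the flattening problem concrete, here is a minimal sketch of a naive flattener. The `flatten` helper and the two abbreviated records are hypothetical and written only for illustration (the values in `record_b` are made up); none of this is part of FEXRD or the FFRI Dataset tooling.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Naively flatten nested dicts/lists into one dict keyed by path."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Leaf value: store it under its full path.
        flat[prefix] = value
    return flat


# Hypothetical records, heavily abbreviated from the structure shown above.
record_a = {
    "file_size": 12288,
    "lief": {"data_directories": [
        {"RVA": 0, "size": 0, "type": "EXPORT_TABLE"},
    ]},
}
record_b = {
    "file_size": 4096,  # made-up value
    "lief": {"data_directories": [
        {"RVA": 0, "size": 0, "type": "EXPORT_TABLE"},
        {"RVA": 10580, "size": 100, "type": "IMPORT_TABLE"},  # made-up values
    ]},
}

flat_a = flatten(record_a)
flat_b = flatten(record_b)

# The two records flatten to different numbers of features.
print(len(flat_a), len(flat_b))
```

Every list element and nested key becomes its own column, so the number of features grows with the depth and length of the record, and two records do not even share the same set of columns. That is why plain flattening does not yield fixed-dimensional vectors.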

FEXRD makes it easy for you to obtain fixed-dimensional vectors suitable for machine learning research from the FFRI Dataset.