Basic Usage
Feature Extractor Class
FEXRD provides feature extraction classes to convert the elements in the FFRI Dataset into numpy.ndarray
vectors (hereinafter referred to simply as vectors). These classes are prepared for the values corresponding to the following keys in the FFRI Dataset.
- lief
- dos_header
- rich_header
- header
- optional_header
- data_directories
- sections
- relocations
- tls
- export
- debug (work in progress)
- imports
- resources_tree (work in progress)
- resources_manager
- signatures (signature for FFRI Dataset 2020)
- load_configuration
- peid
- trid
- strings
- die
- Manalyze
The feature extraction classes corresponding to each of the above keys are as follows.
- LiefFeatureExtractor: lief
- DosHeaderFeatureExtractor: dos_header
- RichHeaderFeatureExtractor: rich_header
- HeaderFeatureExtractor: header
- OptionalHeaderFeatureExtractor: optional_header
- DataDirectoriesFeatureExtractor: data_directories
- SectionsFeatureExtractor: sections
- RelocationsFeatureExtractor: relocations
- TlsFeatureExtractor: tls
- ExportFeatureExtractor: export
- DebugFeatureExtractor: debug
- ImportsFeatureExtractor: imports
- ResourcesTreeFeatureExtractor: resources_tree
- ResourcesManagerFeatureExtractor: resources_manager
- SignatureFeatureExtractor: signatures (signature for FFRI Dataset 2020)
- LoadConfigurationFeatureExtractor: load_configuration
- PeidFeatureExtractor: peid
- TridFeatureExtractor: trid
- StringsFeatureExtractor: strings
- DieFeatureExtractor: die
- Manalyze: manalyze_plugin_packer
In addition to the above feature extraction classes, we also provide AllFeaturesExtractor. This can be used to create a vector that combines all the above feature extraction classes' outputs.
Usage Example
Let's see how to use it in practice.
In FEXRD, depending on the feature you want to use, you can instantiate the corresponding feature extraction class and call the get_features
method to retrieve the output vector. An example of creating a vector of the "strings" element is as follows.
import json
from fexrd import StringsFeatureExtractor
sfe = StringsFeatureExtractor() # instantiae feature extractor class for the "string" element
fin = open("ffridataset_sample.jsonl", "r")
for l in fin.readlines():
obj = json.loads(l)
column_names, vector = sfe.get_features(obj["strings"]) # convert to the vector
In the above example, StringsFeatureExtractor
is instantiated, and the "string" element is passed as an argument to the get_features
method to get the vector.
The return value of the get_features
method is a tuple, where the 0th element is the column name of the vector and the 1st element is the vector.
The same is true for converting the element corresponding to the key other than "strings" into a vector.
Command Line Interface
FEXRD also provides a command-line interface for debugging purposes. Two commands are currently supported.
The command show-raw-dict
shows a raw output of a specified JSON element before vectorization.
The command show-vec
shows a feature vector of a specified JSON element.
$ python -m fexrd show-raw-dict --help
Usage: __main__.py show-raw-dict [OPTIONS] INPUT_JSON VER_STR EXTRACTOR_NAME
Arguments:
INPUT_JSON [required]
VER_STR [required]
EXTRACTOR_NAME Show an output of extract_raw_feature method. Available
feature names are: lief, dos_header, rich_header, header,
optional_header, data_directories, sections, relocations,
tls, export, debug, imports, resources_tree,
resources_manager, signatures, load_configuration, peid,
trid, strings, all, die, manalyze_plugin_packer [required]
$ python -m fexrd show-vec --help
Usage: __main__.py show-vec [OPTIONS] INPUT_JSON VER_STR EXTRACTOR_NAME
Arguments:
INPUT_JSON [required]
VER_STR [required]
EXTRACTOR_NAME Show an output of vectorize_features. Available feature
names are: lief, dos_header, rich_header, header,
optional_header, data_directories, sections, relocations,
tls, export, debug, imports, resources_tree,
resources_manager, signatures, load_configuration, peid,
trid, strings, all, die, manalyze_plugin_packer [required]
Options:
--help Show this message and exit.
Here, we show some usage examples.
$ python -m fexrd show-vec ./tests/test_regression_test/v2021/01340ff69f0c627a5f1cba2b82a59ef90b1a61ecb8078e183dc3e5d4abc847e9.json v2021 dos_header
dos_header_addressof_new_exeheader,dos_header_addressof_relocation_table,dos_header_checksum,dos_header_file_size_in_pages,dos_header_header_size_in_paragraphs,dos_header_initial_ip,dos_header_initial_relative_cs,dos_header_initial_relative_ss,dos_header_initial_sp,dos_header_magic,dos_header_maximum_extra_paragraphs,dos_header_minimum_extra_paragraphs,dos_header_numberof_relocation,dos_header_oem_id,dos_header_oem_info,dos_header_overlay_number,dos_header_reserved[0],dos_header_reserved[1],dos_header_reserved[2],dos_header_reserved[3],dos_header_reserved2[0],dos_header_reserved2[1],dos_header_reserved2[2],dos_header_reserved2[3],dos_header_reserved2[4],dos_header_reserved2[5],dos_header_reserved2[6],dos_header_reserved2[7],dos_header_reserved2[8],dos_header_reserved2[9],dos_header_used_bytes_in_the_last_page
16843152.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,23117.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0,37008.0
$ python -m fexrd show-raw-dict ./tests/test_regression_test/v2021/01340ff69f0c627a5f1cba2b82a59ef90b1a61ecb8078e183dc3e5d4abc
847e9.json v2021 dos_header
{
"addressof_new_exeheader": 16843152,
"addressof_relocation_table": 37008,
"checksum": 37008,
"file_size_in_pages": 37008,
"header_size_in_paragraphs": 37008,
"initial_ip": 37008,
"initial_relative_cs": 37008,
"initial_relative_ss": 37008,
"initial_sp": 37008,
"magic": 23117,
"maximum_extra_paragraphs": 37008,
"minimum_extra_paragraphs": 37008,
"numberof_relocation": 37008,
"oem_id": 37008,
"oem_info": 37008,
"overlay_number": 37008,
"reserved": [
37008,
37008,
37008,
37008
],
"reserved2": [
37008,
37008,
37008,
37008,
37008,
37008,
37008,
37008,
37008,
37008
],
"used_bytes_in_the_last_page": 37008
}
Additionally, you can use Docker for the CLI.
docker-compose -f .\docker-compose.production.yml run app python -m fexrd show-raw-dict --help
More Practical Usage Examples
The example directory contains more practical usage examples.