==================
Processing Modules
==================

Cuckoo's processing modules are Python scripts that let you define custom ways
to analyze the raw results generated by the sandbox and append information to a
global container that is later used by the reporting modules.

You can create as many modules as you want, as long as they follow a predefined
structure that we will present in this chapter.

Global Container
================

After an analysis is completed, Cuckoo will invoke all the processing modules
available in the *modules/processing/* directory. Every module is then
initialized and executed, and the data it returns is added to a data structure
that we'll call the **global container**.

This container is simply a big Python dictionary that contains all the
abstracted results produced by all the modules, organized by their defined keys.

Cuckoo ships with a default set of modules which generate a *standard* global
container. It's important for the existing reporting modules (HTML report etc.)
that these default modules are not modified, otherwise the resulting global
container structure would change and the reporting modules wouldn't be able to
recognize it and extract the information used to build the final report.

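The way the container is assembled can be sketched roughly as follows. This is
a simplified illustration and not Cuckoo's actual code: the ``Processing`` base
class here is a stand-in, and ``FileInfo``, ``Debug`` and ``build_container``
are hypothetical names with made-up return values.

```python
class Processing(object):
    """Stand-in for Cuckoo's Processing base class (assumption)."""

    key = None

    def run(self):
        raise NotImplementedError


class FileInfo(Processing):
    """Hypothetical module that reports details about the analyzed file."""

    def run(self):
        self.key = "file"
        return {"name": "sample.exe", "size": 1024}  # made-up values


class Debug(Processing):
    """Hypothetical module that reports the analysis log."""

    def run(self):
        self.key = "debug"
        return {"log": "analysis started"}  # made-up value


def build_container(module_classes):
    """Run every module and store its result under the module's key."""
    container = {}
    for module_class in module_classes:
        module = module_class()
        data = module.run()
        # The key is only known after run() has set it.
        container[module.key] = data
    return container


results = build_container([FileInfo, Debug])
print(sorted(results.keys()))
```

Each module contributes exactly one entry, so two modules that set the same key
would overwrite each other in this sketch.
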
Following is a JSON-like representation of a default global container (values
are omitted)::

    {
        "info": {
            "started": ,
            "ended": ,
            "duration": ,
            "version":
        },
        "signatures": [
            {
                "severity": ,
                "description": ,
                "alert": ,
                "references": [],
                "data": [],
                "name":
            }
        ],
        "behavior": {
            "processes": [
                {
                    "parent_id": ,
                    "process_name": ,
                    "process_id": ,
                    "first_seen": ,
                    "calls": [
                        {
                            "category": ,
                            "status": ,
                            "return": ,
                            "timestamp": ,
                            "repeated": ,
                            "api": ,
                            "arguments": [
                                {
                                    "name": ,
                                    "value":
                                }
                            ]
                        },
                        <...>
                    ],
                    <...>
                }
            ],
            "processtree": [
                {
                    "pid": ,
                    "name": ,
                    "children": []
                }
            ],
            "summary": {
                "files": [],
                "keys": [],
                "mutexes": []
            }
        },
        "static": {},
        "dropped": [
            {
                "size": ,
                "sha1": ,
                "name": ,
                "type": ,
                "crc32": ,
                "ssdeep": ,
                "sha256": ,
                "sha512": ,
                "md5":
            },
            <...>
        ],
        "file": {
            "size": ,
            "sha1": ,
            "name": ,
            "type": ,
            "crc32": ,
            "ssdeep": ,
            "sha256": ,
            "sha512": ,
            "md5":
        },
        "debug": {
            "log":
        },
        "network": {
            "http": [
                {
                    "body": ,
                    "uri": ,
                    "method": ,
                    "host": ,
                    "version": ,
                    "path": ,
                    "data": ,
                    "port":
                },
                <...>
            ],
            "udp": [
                {
                    "dport": ,
                    "src": ,
                    "dst": ,
                    "sport":
                },
                <...>
            ],
            "hosts": [],
            "dns": [
                {
                    "ip": ,
                    "hostname":
                },
            ],
            "tcp": [
                {
                    "dport": ,
                    "src": ,
                    "dst": ,
                    "sport":
                },
                <...>
            ]
        }
    }

Every processing module added will end up with a dedicated dictionary entry in
this data structure.

Getting started
===============

All processing modules must be placed in *modules/processing/*. In this
directory you will find a set of default modules that are used to produce the
traditional Cuckoo analysis reports.

A basic processing module could look like:

.. code-block:: python
    :linenos:

    from lib.cuckoo.common.abstracts import Processing

    class MyModule(Processing):

        def run(self):
            self.key = "file"
            # do_something() is a placeholder for your own analysis logic.
            data = do_something()
            return data

Every processing module should contain:

* A class inheriting ``Processing``.
* A ``run()`` function.
* A ``self.key`` attribute defining the name to be used as a subcontainer for
  the returned data.
* A set of data (list, dictionary or string etc.) that will be appended to the
  global container.

The processing modules are provided with some attributes that can be used to
access the raw results for the given analysis:

* ``self.analysis_path``: path to the folder containing the results (e.g.
  *storage/analysis/1*)
* ``self.log_path``: path to the *analysis.log* file.
* ``self.conf_path``: path to the *analysis.conf* file.
* ``self.file_path``: path to the analyzed file.
* ``self.dropped_path``: path to the folder containing the dropped files.
* ``self.logs_path``: path to the folder containing the raw behavioral logs.
* ``self.shots_path``: path to the folder containing the screenshots.
* ``self.pcap_path``: path to the network pcap dump.

Example
=======

A good way to understand the mechanics behind this is to look at the Yara
module. Yara is a tool and library used to match user-defined signatures
containing static binary patterns against the analyzed file.

.. code-block:: python
    :linenos:

    import os
    import logging

    try:
        import yara
        HAVE_YARA = True
    except ImportError:
        HAVE_YARA = False

    from lib.cuckoo.common.constants import CUCKOO_ROOT
    from lib.cuckoo.common.abstracts import Processing

    log = logging.getLogger(__name__)

    class YaraSignatures(Processing):
        """Yara signature processing."""

        def run(self):
            """Run Yara processing.
            @return: list with matches.
            """
            self.key = "yara"
            matches = []

            if HAVE_YARA:
                try:
                    rules = yara.compile(filepath=os.path.join(CUCKOO_ROOT, "data", "yara", "index.yar"))
                    for match in rules.match(self.file_path):
                        matches.append({"name": match.rule, "meta": match.meta})
                except yara.Error as e:
                    log.warning("Unable to match Yara signatures: %s", e)
            else:
                log.warning("Yara is not installed, skip")

            return matches

As you can see, line #22 defines the key name for the module. Next, the
``run()`` function compiles the signatures file and matches every signature
against the file located at ``self.file_path``.
The matched signatures are appended to the ``matches`` list, which is then
returned and included in the global container under the "yara" section.
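
Following the same pattern, a custom module can use the path attributes listed
earlier. Below is a minimal sketch of a module that summarizes the dropped
files. To keep it runnable on its own, it uses a stand-in ``Processing`` class
instead of the one from ``lib.cuckoo.common.abstracts``. The module name
``DroppedSummary``, the key ``dropped_summary`` and the *files* folder name are
assumptions made for illustration only.

```python
import os
import tempfile


class Processing(object):
    """Stand-in for lib.cuckoo.common.abstracts.Processing (assumption)."""

    def __init__(self, analysis_path):
        self.analysis_path = analysis_path
        # The folder name "files" is assumed here for the demo.
        self.dropped_path = os.path.join(analysis_path, "files")
        self.key = None


class DroppedSummary(Processing):
    """Hypothetical module: count the dropped files and sum their sizes."""

    def run(self):
        self.key = "dropped_summary"
        summary = {"count": 0, "total_size": 0}

        # Return an empty summary if nothing was dropped.
        if not os.path.isdir(self.dropped_path):
            return summary

        for name in os.listdir(self.dropped_path):
            path = os.path.join(self.dropped_path, name)
            if os.path.isfile(path):
                summary["count"] += 1
                summary["total_size"] += os.path.getsize(path)

        return summary


# Demo: build a throwaway analysis folder containing one 10-byte dropped file.
analysis_path = tempfile.mkdtemp()
os.mkdir(os.path.join(analysis_path, "files"))
with open(os.path.join(analysis_path, "files", "a.bin"), "wb") as handle:
    handle.write(b"\x00" * 10)

module = DroppedSummary(analysis_path)
summary = module.run()
print(module.key, summary)
```

In a real module the base class would be imported from Cuckoo and the path
attributes would already be set, so only the ``run()`` method would be needed.
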