Constructing YARA files

Marek Milkovič edited this page Apr 17, 2019 · 2 revisions

Constructing your own YARA file using the C++ interface of yaramod is straightforward. We will start from the lowest level, which is constructing a condition.

Condition

To construct a condition, use yaramod::YaraExpressionBuilder. It provides several functions and methods that help you build the AST of a condition from the leaf expressions up to the root expression. The following functions and methods are available:

Basic expression functions

These functions are the basic building blocks of YARA expressions. Start from them and build upon them to form complex expressions. Each of these functions returns an object of type YaraExpressionBuilder. Functions that take parameters mostly accept objects of this type too, so whenever you are not sure what kind of expression to pass, look at the list of basic expressions and pick the most suitable one.

  • filesize() - represents filesize keyword
  • entrypoint() - represents entrypoint keyword
  • all() - represents all keyword
  • any() - represents any keyword
  • them() - represents them keyword
  • intVal(val, [mult]) - represents signed integer with multiplier (default: IntMultiplier::None) (intVal(10), intVal(10, IntMultiplier::Kilobytes), intVal(10, IntMultiplier::Megabytes))
  • uintVal(val, [mult]) - represents unsigned integer with multiplier (default: IntMultiplier::None) (uintVal(10), uintVal(10, IntMultiplier::Kilobytes), uintVal(10, IntMultiplier::Megabytes))
  • hexIntVal(val) - represents hexadecimal integer (hexIntVal(0x10))
  • doubleVal(val) - represents double floating-point value (doubleVal(3.14))
  • stringVal(str) - represents string literal (stringVal("Hello World!"))
  • boolVal(bool) - represents boolean literal (boolVal(true))
  • id(id) - represents single identifier with name id (id("pe"))
  • stringRef(ref) - represents reference to string identifier ref (stringRef("$1"))
  • set(elements) - represents (item1, item2, ...) (set({stringRef("$1"), stringRef("$2")}))
  • range(low, high) - represents (low .. high) (range(intVal(100), intVal(200)))
  • matchCount(ref) - represents match count of string identifier ref (matchCount("$1"))
  • matchLength(ref, [n]) - represents nth match (default: 0) length of string identifier ref (matchLength("$1", intVal(1)))
  • matchOffset(ref, [n]) - represents nth match (default: 0) offset of string identifier ref (matchOffset("$1", intVal(1)))
  • matchAt(ref, expr) - represents <ref> at <expr> (matchAt("$1", intVal(100)))
  • matchInRange(ref, range) - represents <ref> in <range> (matchInRange("$1", range(intVal(100), intVal(200))))
  • regexp(regexp, mods) - represents regular expression in the form /<regexp>/<mods> (regexp("^a.*b$", "i"))
  • forLoop(spec, var, set, body) - represents for loop over set of integers (forLoop(any(), "i", range(intVal(100), intVal(200)), matchAt("$1", id("i"))))
  • forLoop(spec, set, body) - represents for loop over set of string references (forLoop(any(), set({stringRef("$*")}), matchAt("$", intVal(100))))
  • of(spec, set) - represents <spec> of <set> (of(all(), them()))
  • paren(expr, [newline]) - represents parentheses around expressions and newline indicator for putting enclosed expression on its own line (paren(intVal(10)))
  • conjunction(terms, [newline]) - represents conjunction of terms and optionally puts them on each separate line if newline is set (conjunction({id("rule1"), id("rule2")}))
  • disjunction(terms, [newline]) - represents disjunction of terms and optionally puts them on each separate line if newline is set (disjunction({id("rule1"), id("rule2")}))

Complex expression methods

The YaraExpressionBuilder class provides multiple methods that help you build complex expressions. Most of them are overloaded operators, which keeps long conditions easy to write and read. These methods are:

  • operator! - represents logical not (!boolVal(true))
  • operator~ - represents bitwise not (~hexIntVal(0x100))
  • operator- - represents unary operator - (-id("i"))
  • operator&& - represents logical and (id("rule1") && id("rule2"))
  • operator|| - represents logical or (id("rule1") || id("rule2"))
  • operator< - represents operator < (matchOffset("$1") < intVal(100))
  • operator> - represents operator > (matchOffset("$1") > intVal(100))
  • operator<= - represents operator <= (matchOffset("$1") <= intVal(100))
  • operator>= - represents operator >= (matchOffset("$1") >= intVal(100))
  • operator+ - represents operator + (matchOffset("$1") + intVal(100))
  • operator- - represents operator - (matchOffset("$1") - intVal(100))
  • operator* - represents operator * (matchOffset("$1") * intVal(100))
  • operator/ - represents operator / (matchOffset("$1") / intVal(100))
  • operator% - represents operator % (matchOffset("$1") % intVal(100))
  • operator^ - represents bitwise xor (matchOffset("$1") ^ intVal(100))
  • operator& - represents bitwise and (matchOffset("$1") & intVal(100))
  • operator| - represents bitwise or (matchOffset("$1") | intVal(100))
  • operator<< - represents bitwise shift left (matchOffset("$1") << intVal(10))
  • operator>> - represents bitwise shift right (matchOffset("$1") >> intVal(10))
  • operator() - represents call to function (id("func")(intVal(100), intVal(200)))
  • call(args) - represents call to function (id("func").call({intVal(100), intVal(200)}))
  • contains(rhs) - represents operator contains (id("signature").contains(stringVal("hello")))
  • matches(rhs) - represents operator matches (id("signature").matches(regexp("^a.*b$", "i")))
  • access(rhs) - represents operator . as access to structure (id("pe").access("number_of_sections"))
  • operator[] - represents operator [] as access to array (id("pe").access("sections")[intVal(0)])
  • readInt8(be) - represents call to special function int8(be) (intVal(100).readInt8())
  • readInt16(be) - represents call to special function int16(be) (intVal(100).readInt16())
  • readInt32(be) - represents call to special function int32(be) (intVal(100).readInt32())
  • readUInt8(be) - represents call to special function uint8(be) (intVal(100).readUInt8())
  • readUInt16(be) - represents call to special function uint16(be) (intVal(100).readUInt16())
  • readUInt32(be) - represents call to special function uint32(be) (intVal(100).readUInt32())

When you are done, call the get() method of YaraExpressionBuilder to obtain your Expression object. Make sure to store it if you want to use it later, because YaraExpressionBuilder resets its state after get() is called.
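
The bottom-up style described above can be illustrated with a small self-contained sketch. It does not use yaramod: ToyExpr and ToyBuilder are hypothetical stand-ins, and the free functions only borrow names from the lists above so that the shape of the calling code is recognizable. The sketch assumes that rendering plain text is enough to show the idea; a real AST carries more than that.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for an expression node: it only remembers its rendered text.
struct ToyExpr {
    std::string text;
};

// Toy stand-in for a builder (NOT yaramod's YaraExpressionBuilder).
class ToyBuilder {
public:
    explicit ToyBuilder(std::string text)
        : expr_(std::make_shared<ToyExpr>(ToyExpr{std::move(text)})) {}

    // Hands out the finished node and leaves the builder empty,
    // so the caller has to store the result.
    std::shared_ptr<ToyExpr> get() { return std::exchange(expr_, nullptr); }

    bool empty() const { return expr_ == nullptr; }

    // Only valid while the builder still holds an expression.
    const std::string& text() const { return expr_->text; }

private:
    std::shared_ptr<ToyExpr> expr_;
};

// Leaf expressions (names borrowed from the list of basic expressions).
ToyBuilder intVal(long value) { return ToyBuilder(std::to_string(value)); }
ToyBuilder filesize() { return ToyBuilder("filesize"); }
ToyBuilder stringRef(const std::string& ref) { return ToyBuilder(ref); }
ToyBuilder paren(const ToyBuilder& e) { return ToyBuilder("(" + e.text() + ")"); }

// Overloaded operators combine two builders into a larger expression.
ToyBuilder operator<(const ToyBuilder& lhs, const ToyBuilder& rhs) {
    return ToyBuilder(lhs.text() + " < " + rhs.text());
}
ToyBuilder operator&&(const ToyBuilder& lhs, const ToyBuilder& rhs) {
    return ToyBuilder(lhs.text() + " and " + rhs.text());
}

// Builds "$1 and (filesize < 1024)" from the leaves up to the root.
std::shared_ptr<ToyExpr> buildCondition() {
    ToyBuilder cond = stringRef("$1") && paren(filesize() < intVal(1024));
    std::shared_ptr<ToyExpr> expr = cond.get();
    assert(cond.empty()); // the builder no longer holds the expression
    return expr;
}
```

Calling buildCondition() returns a node whose text reads like a YARA condition. The point of the sketch is the calling pattern: leaves first, operators on top, and one get() at the end whose result you keep.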

Hex strings

Before we get into construction of rules, we will show how hex strings can be constructed using YaraHexStringBuilder. Each hex string consists of hex string units, which are:

  • Nibble
  • Wildcard (?)
  • Jump ([low-high])
  • Alternative ((XX|YY|...))

When working with YaraHexStringBuilder, we do not always work at the unit level; sometimes we work at the byte level. Each type of unit is created as follows:

  • YaraHexStringBuilder(byte) - creates two nibbles out of byte value.
  • wildcard() - creates ??
  • wildcardLow(nibble) - creates <nibble>?
  • wildcardHigh(nibble) - creates ?<nibble>
  • jumpVarying() - creates [-]
  • jumpFixed(offset) - creates [<offset>]
  • jumpVaryingRange(low) - creates [<low>-]
  • jumpRange(low, high) - creates [<low>-<high>]
  • alt(units) - creates (unit1|unit2|...)
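
To make the notation in this list concrete, here is a self-contained sketch that renders each kind of unit as text. It does not call yaramod. The functions are local stand-ins that mirror the names above; byteUnit, hexDigit, join and hexString are hypothetical helpers introduced only for this illustration, and the exact spacing of the output is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins that render hex string units as text (NOT yaramod functions).

// Upper-case hex digit for the low 4 bits of n.
char hexDigit(std::uint8_t n) { return "0123456789ABCDEF"[n & 0x0F]; }

// A byte becomes two nibbles, e.g. 0x4D -> "4D".
std::string byteUnit(std::uint8_t b) {
    return std::string{hexDigit(static_cast<std::uint8_t>(b >> 4)), hexDigit(b)};
}

std::string wildcard() { return "??"; }
std::string wildcardLow(std::uint8_t nibble) { return std::string{hexDigit(nibble), '?'}; }
std::string wildcardHigh(std::uint8_t nibble) { return std::string{'?', hexDigit(nibble)}; }

std::string jumpVarying() { return "[-]"; }
std::string jumpFixed(unsigned offset) { return "[" + std::to_string(offset) + "]"; }
std::string jumpVaryingRange(unsigned low) { return "[" + std::to_string(low) + "-]"; }
std::string jumpRange(unsigned low, unsigned high) {
    return "[" + std::to_string(low) + "-" + std::to_string(high) + "]";
}

// Joins parts with the given separator.
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;
        out += parts[i];
    }
    return out;
}

// Alternative: (unit1|unit2|...)
std::string alt(const std::vector<std::string>& units) { return "(" + join(units, "|") + ")"; }

// Whole hex string: units separated by spaces inside curly braces.
std::string hexString(const std::vector<std::string>& units) {
    return "{ " + join(units, " ") + " }";
}
```

For example, hexString({byteUnit(0x4D), byteUnit(0x5A), wildcard(), jumpRange(4, 6)}) yields the text { 4D 5A ?? [4-6] }, which is how such a string would appear in a rule.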

Rule

Once your condition is finished, you can build rules with it. As with conditions, you use a builder, YaraRuleBuilder, to construct rules, but this time the builder provides only a few methods:

  • withName(name) - specify rule name
  • withModifier(mod) - specify whether rule is private or public (Rule::Modifier::Private or Rule::Modifier::Public)
  • withTag(tag) - specify rule tag
  • withStringMeta(key, value) - specify string meta
  • withIntMeta(key, value) - specify integer meta
  • withUIntMeta(key, value) - specify unsigned integer meta
  • withHexIntMeta(key, value) - specify hexadecimal integer meta
  • withBoolMeta(key, value) - specify boolean meta
  • withPlainString(id, value, mod) - specify plain string with identifier id and content value with modifiers mod (String::Modifiers::Ascii, String::Modifiers::Wide, String::Modifiers::Nocase, String::Modifiers::Fullword)
  • withHexString(id, str) - specify hex string (str is of type std::shared_ptr<HexString>)
  • withRegexp(id, value, mod) - specify regular expression with identifier id and content value with modifiers mod (these modifiers differ from plain string modifiers: they are tied to the regular expression and come after its last /)
  • withCondition(cond) - specify condition
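
The with*() methods above follow the fluent builder pattern: each call records one piece of the rule and the calls are chained. The following self-contained sketch shows that pattern with a hypothetical ToyRuleBuilder. It is not yaramod's YaraRuleBuilder: it covers only a few of the methods, takes the condition as plain text instead of an Expression object, and its output layout is an assumption.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-in (NOT yaramod's YaraRuleBuilder): every with*() call stores one
// piece of the rule and returns *this so that calls can be chained.
class ToyRuleBuilder {
public:
    ToyRuleBuilder& withName(std::string name) {
        name_ = std::move(name);
        return *this;
    }
    ToyRuleBuilder& withTag(std::string tag) {
        tags_.push_back(std::move(tag));
        return *this;
    }
    ToyRuleBuilder& withStringMeta(const std::string& key, const std::string& value) {
        metas_.push_back(key + " = \"" + value + "\"");
        return *this;
    }
    ToyRuleBuilder& withPlainString(const std::string& id, const std::string& value) {
        strings_.push_back(id + " = \"" + value + "\"");
        return *this;
    }
    // The condition is plain text here; the real builder takes an expression.
    ToyRuleBuilder& withCondition(std::string condition) {
        condition_ = std::move(condition);
        return *this;
    }

    // Renders the collected pieces as rule text.
    std::string get() const {
        std::string out = "rule " + name_;
        if (!tags_.empty()) {
            out += " :";
            for (const auto& tag : tags_) out += " " + tag;
        }
        out += "\n{\n";
        appendSection(out, "meta", metas_);
        appendSection(out, "strings", strings_);
        out += "    condition:\n        " + condition_ + "\n}\n";
        return out;
    }

private:
    // Writes "title:" followed by one indented line per entry; skips empty sections.
    static void appendSection(std::string& out, const std::string& title,
                              const std::vector<std::string>& lines) {
        if (lines.empty()) return;
        out += "    " + title + ":\n";
        for (const auto& line : lines) out += "        " + line + "\n";
    }

    std::string name_;
    std::vector<std::string> tags_;
    std::vector<std::string> metas_;
    std::vector<std::string> strings_;
    std::string condition_;
};
```

A chain such as ToyRuleBuilder().withName("hello_rule").withPlainString("$1", "Hello World!").withCondition("$1").get() produces the text of a complete rule. Sections that received no entries, such as meta in this chain, are left out.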

YARA File

Finally, once our rules are constructed, we can combine them into a single YARA file using YaraFileBuilder, which provides these methods:

  • withModule(name) - specifies import of module named name
  • withRule(rule) - adds the rule into file
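
As a rough picture of what these two methods contribute to the resulting file, here is a self-contained sketch with a hypothetical ToyFileBuilder. It is not yaramod's YaraFileBuilder: rules are passed as already rendered text rather than rule objects, and the layout (imports first, then rules separated by blank lines) is an assumption.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-in (NOT yaramod's YaraFileBuilder): collects imports and rules,
// then renders them as one YARA file.
class ToyFileBuilder {
public:
    // Records an import line for the named module.
    ToyFileBuilder& withModule(const std::string& name) {
        imports_.push_back("import \"" + name + "\"");
        return *this;
    }

    // Records one rule; here a rule is simply its text.
    ToyFileBuilder& withRule(std::string ruleText) {
        rules_.push_back(std::move(ruleText));
        return *this;
    }

    // Imports come first, then each rule preceded by a blank line.
    std::string get() const {
        std::string out;
        for (const auto& import : imports_) out += import + "\n";
        for (const auto& rule : rules_) {
            if (!out.empty()) out += "\n";
            out += rule;
        }
        return out;
    }

private:
    std::vector<std::string> imports_;
    std::vector<std::string> rules_;
};
```

With one call to withModule("pe") and one call to withRule(...), get() returns the import line, a blank line, and then the rule text.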