findmalware

A script that tries to guess the presence of malicious code in PHP files. It does NOT look for known malware code fragments; instead, it uses a set of rules to infer which pieces of PHP code are likely to be malware. As such, it produces many false positives, so it uses a few whitelists to avoid showing files that are known to be good.

It recursively analyzes every file whose name ends in '.php' under the current directory, so you usually cd into the root directory of a PHP website and run this script there.

Please note that this script does NOT delete or sanitize ANYTHING.
It only produces a report. It's up to the sysadmin (i.e. YOU) to read that report and take action. As such, it is safe to run this script anywhere you like: it won't even notice if you run it on a read-only filesystem, as long as you have a working mktemp installed and provide appropriate arguments for the lists.

The current version uses four whitelists: an upstream one for whitelisted files, a local one for locally whitelisted files, an upstream one for whitelisted lines of code and a local one for locally whitelisted lines of code. Please note that you have little to no control over the names and full paths of the code whitelists: they are computed from the names of the file whitelists by prepending the "code-" prefix to those names. However, the -w and -W options let you control the full paths and names of the file whitelists.

The code whitelist logic works this way: when a rule matches a line of code, that line is looked up in the code whitelists. If it's found there, it's assumed to be good and it no longer contributes to the malware-positive score of the file. If all lines matched by any rule are found in the code whitelists, the whole file is no longer considered malware-positive, so it won't be shown to the user, but it won't be automatically whitelisted as a whole either (unless you use -A): this ensures that future rule updates will still be checked against infected files that may have slipped through in the past. In all other cases, the file is still considered suspect and shown to the user, including all lines of code matched by any rule, even those already present in the code whitelists. When the user chooses to whitelist a file, all lines matched by any rule are added to the local code whitelist, if they aren't there yet.

Since we are language agnostic, we can't assume a line of code is terminated by a semicolon. Moreover, we never want to whitelist lines that are too short, because they are easily reused in malicious code too. A "line of code" here is therefore defined as the line that matches a rule, padded with some leading and trailing context lines, and that padded snippet is exactly what gets hashed.
Please note that changing the number of context lines (it defaults to 2) produces different hashes for the same line of code, so any code whitelist compiled with a different context size won't work anymore. For this reason there is no option to change the context size, but you can always override the default value (NUMBER_OF_CONTEXT_LINES) in a configuration file, assuming you know what you are doing.
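The README does not show how the script builds these hashes, so the following is only a minimal sketch of the idea, assuming grep, sed and sha512sum from the dependency list. The function name hash_matches is hypothetical, and the real script's output and storage format may differ. For each matching line it hashes that line plus NUMBER_OF_CONTEXT_LINES lines before and after it, clipped at the start and end of the file.

```bash
# Hypothetical sketch, not findmalware's actual code: hash each line of FILE
# that matches the grep regular expression RULE, padded with
# NUMBER_OF_CONTEXT_LINES lines of leading and trailing context.
NUMBER_OF_CONTEXT_LINES=2

hash_matches() {
  local file=$1 rule=$2 n start end
  local ctx=$NUMBER_OF_CONTEXT_LINES
  # grep -n prefixes each matching line with "LINENO:"; keep only LINENO.
  grep -n -e "$rule" "$file" | cut -d: -f1 | while read -r n; do
    # Clip the leading context at line 1; sed simply stops at end of file.
    start=$(( n > ctx ? n - ctx : 1 ))
    end=$(( n + ctx ))
    # Print lines start..end and hash that padded snippet with SHA-512.
    sed -n "${start},${end}p" "$file" | sha512sum | cut -d' ' -f1
  done
}
```

Calling hash_matches on the same file and rule with NUMBER_OF_CONTEXT_LINES=1 instead of 2 yields a different hash for the same match, which is why whitelists built with different context sizes cannot be mixed.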

Installation:

This script depends on the following tools:

  1. mktemp
  2. GNU find
  3. GNU grep
  4. GNU sed
  5. GNU screen
  6. sha512sum
  7. curl for the autoupdate feature
  8. a bunch of other rather common system commands
  9. all of the above plus tar, gzip, unzip for the -t option

You need to ensure those commands are installed for this script to work. Please note that this script does not check if those commands are available, so failing to install them beforehand can produce undefined results.
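Since the script does not check its dependencies itself, a small pre-flight check can save you from undefined results. This is a hypothetical helper, not part of findmalware; it uses only the bash builtin command -v and the --version banner that GNU tools print.

```bash
# Hypothetical pre-flight check: findmalware does not verify its own
# dependencies, so run this before a scan to spot missing commands.
check_deps() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeeds if CMD reports itself as a GNU tool in its --version banner.
is_gnu() {
  "$1" --version 2>/dev/null | head -n 1 | grep -q 'GNU'
}
```

For example: `check_deps mktemp find grep sed screen sha512sum curl tar gzip unzip && is_gnu find && is_gnu grep && is_gnu sed`.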

Usage:

findmalware.sh [ -c config-file ] [ -b local-blacklist-file ] [ -B upstream-blacklist-file ] [ -w local-whitelist-file ] [ -W upstream-whitelist-file ] [ -r local-rules-file ] [ -R upstream-rules-file ] [ -u upstream-url ] [ -e extensions_list ] [ -t trustedurl ] [ -a ] [ -A ] [ -h ] [ -m ] [ -U ] [ -d directory ]

where

-c config-file specifies a custom configuration file to load after the default ones. Every configuration file is a bash fragment that this script includes untouched. This option is parsed before the other options, so that command line arguments can override any setting in configuration files. The default configuration files, which are ALWAYS loaded if readable, whether you use this option or not, are, in this order:

*  /etc/findmalware.conf
*  /usr/local/etc/findmalware.conf
*  $HOME/.findmalware.conf
*  $HOME/.config/findmalware.conf
*  $HOME/.findmalware/config

If you specify a custom configuration file and that file is not readable, the script aborts execution by design. The FINDMALWARE_CONFIG environment variable can be used instead of this option; if both are specified, this option takes precedence.
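The loading order above can be sketched as follows. This is a hypothetical illustration, not the script's code: load_configs takes the custom file as an argument (the -c value, else FINDMALWARE_CONFIG) and, for simplicity, ignores the USERCONFFILE reassignment trick described under "Configuration files".

```bash
# Hypothetical sketch of the documented loading order, not the script's code.
# Default files are sourced only if readable; the custom file (from -c, else
# FINDMALWARE_CONFIG) is sourced last and must be readable.
load_configs() {
  local custom=$1 f
  for f in /etc/findmalware.conf \
           /usr/local/etc/findmalware.conf \
           "$HOME/.findmalware.conf" \
           "$HOME/.config/findmalware.conf" \
           "$HOME/.findmalware/config"; do
    # Each file is a plain bash fragment, included untouched.
    if [ -r "$f" ]; then . "$f"; fi
  done
  if [ -n "$custom" ]; then
    if [ ! -r "$custom" ]; then
      echo "cannot read configuration file: $custom" >&2
      return 1   # the real script aborts here by design
    fi
    . "$custom"
  fi
}
```

Because later files are sourced after earlier ones, a variable set in $HOME/.findmalware/config overrides the same variable set in /etc/findmalware.conf, and the custom file overrides them all.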

-b local-blacklist-file specifies a blacklist file to use as a cache. This file needs to be writable in order to be of any use. However, you can live without it: it's really only a way to avoid showing two suspect files that have exactly the same content when the user has already tagged the first one as suspect. A local blacklist can save you quite some time if you scan a directory that contains a website and one or more of its backups. It defaults to $HOME/.findmalware/blacklist.local

-B upstream-blacklist-file same as -b, but it specifies a local copy of an upstream blacklist, possibly downloaded from the internet. This can be a read-only file, but if you use the autoupdate feature (-a option), then this file and its parent directory need to be writable for you to blacklist anything. It defaults to $HOME/.findmalware/blacklist

-w local-whitelist-file specifies a whitelist file to use as a cache. For each false positive file, the user can choose to whitelist it. The script then adds the file name, size and checksum to this whitelist file. A whitelisted file won't be shown to the user again in the future; the script will only add a notice about it to the scan report. Please note that whitelisting a file by mistake can defeat the whole purpose of this script, so pay attention before whitelisting a file and, if you are not sure, you had better not whitelist anything. This file needs to be writable for you to whitelist anything. It defaults to $HOME/.findmalware/whitelist.local
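The README says only that a whitelist entry records the file name, size and checksum; the exact on-disk format is not documented. The sketch below assumes a tab-separated line with the base name, the size in bytes and a SHA-512 checksum (sha512sum is among the dependencies). Both function names and the format are hypothetical.

```bash
# Hypothetical whitelist entry format, "name<TAB>size<TAB>sha512": the README
# says a name, size and checksum are stored, but not how.
whitelist_entry() {
  local file=$1 size sum
  size=$(stat -c %s -- "$file")                # GNU stat: size in bytes
  sum=$(sha512sum < "$file" | cut -d' ' -f1)   # hash of the contents only
  printf '%s\t%s\t%s\n' "$(basename -- "$file")" "$size" "$sum"
}

# Succeeds only if FILE's current entry appears verbatim in LIST, so any
# change in content or size makes the file show up again.
is_whitelisted() {
  local file=$1 list=$2
  [ -r "$list" ] && grep -qxF -- "$(whitelist_entry "$file")" "$list"
}
```

Keying the entry on the checksum as well as the name means that an infection injected into a previously whitelisted file changes the entry, so the file is reported again.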

-W upstream-whitelist-file same as -w, but it specifies a local copy of an upstream whitelist, possibly downloaded from the internet. This can be a read-only file, but if you use the autoupdate feature (-a option), then this file and its parent directory need to be writable. Please note that downloading random whitelists from unknown sources for the sake of saving some time during scans paves the way to a slow and painful death. And no, you should NOT trust the official upstream whitelist published by the author of this script either, unless you have a good reason to. Please also note that the author of this documentation is the author of the script too, and he is the one who publishes the default upstream whitelist as well. In other words: do NOT trust ME, where "ME"="Lucio Crusca", your faithful author of the script and of this text. I'm not saying I'm so bad, but I can make mistakes and you can't hold me liable for them. You have been warned. That being said, this script works for me and I trust myself, so it has the autoupdate feature enabled by default and it downloads my whitelist by default. This option defaults to $HOME/.findmalware/whitelist

-r local-rules-file specifies a file containing rules, one per line. Rules are regular expressions as recognized by grep. PHP files matching one or more rules are considered infected and shown to the user during scans. If you want to use more rules than the default ones, you can write them in a file and pass the file to the script with this option. It defaults to $HOME/.findmalware/rules.local

-R upstream-rules-file same as -r, but it specifies a local copy of an upstream rules file, possibly downloaded from the internet. If you use the autoupdate feature (-a option), then this file and its parent directory need to be writable. All the warnings given for -W apply, so go read them if you haven't yet. It defaults to $HOME/.findmalware/rules

-u upstream-url specifies the URL to use when autoupdating the local copies of upstream files. When autoupdating, this script looks for the following files online:

*  upstream-url/whitelist-latest.txt
*  upstream-url/blacklist-latest.txt
*  upstream-url/rules-latest.txt

It uses curl to check their last modification date online and to download them if needed. It defaults to https://webcloud.virtualbit.it/findmalware
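A minimal sketch of such a conditional download, assuming one local file per list (for example whitelist-latest.txt stored as the -W file). It relies on two real curl options: -z/--time-cond, which, given a file name, sends an If-Modified-Since request based on that file's modification time, and -R/--remote-time, which stamps the downloaded file with the server's time for the next comparison. update_list is a hypothetical name, and the actual script may call curl differently.

```bash
# Hypothetical sketch of the autoupdate step: fetch BASE_URL/NAME-latest.txt
# into LOCAL_FILE, skipping the transfer if the remote copy is not newer.
update_list() {
  local base_url=$1 name=$2 local_file=$3
  local url="$base_url/$name-latest.txt"
  if [ -e "$local_file" ]; then
    # -z FILE: only transfer if the remote file is newer than FILE.
    # -R: give the download the remote modification time.
    curl -fsS -R -z "$local_file" -o "$local_file" "$url"
  else
    # No local copy yet (-z on a missing file would only print a warning).
    curl -fsS -R -o "$local_file" "$url"
  fi
}
```

For example: `update_list https://webcloud.virtualbit.it/findmalware whitelist "$HOME/.findmalware/whitelist"`. This writes straight into the local copy, which is why that file and its directory must be writable; a more cautious variant would download to a temporary file and move it into place.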

-e extensions-list specifies the comma-separated list of file extensions to consider. If you use this option, remember to include the default php too, unless you really want to skip files ending in ".php". Example:

      -e php,php4,php5,php7

This script was initially intended to look for malicious code in PHP files only, but nothing in the code prevents its use for other types of files, so you can use this option to specify any extension you like. Please note, however, that at the time of this writing the whitelists, blacklists and rules provided by default make sense only for PHP files, but you can always use different lists than the default ones. It defaults to php only.
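The README does not say how the extension list is applied internally. One plausible approach, sketched here with a hypothetical function name, is to translate the list into find -name tests joined with -o. Note that -name is case-sensitive, so a file ending in .PHP would not match; -iname is the case-insensitive alternative, but whether findmalware itself matches case-insensitively is not documented.

```bash
# Hypothetical helper: list files under DIR whose names end in one of the
# extensions in a comma-separated LIST such as "php,php4,php5,php7".
# Paths are printed NUL-separated so odd file names survive.
find_by_extensions() {
  local dir=$1 list=$2 ext
  local -a exts=() args=()
  IFS=',' read -r -a exts <<< "$list"
  for ext in "${exts[@]}"; do
    # Join the tests with -o: -name '*.php' -o -name '*.php4' ...
    if [ "${#args[@]}" -gt 0 ]; then args+=(-o); fi
    args+=(-name "*.$ext")
  done
  if [ "${#args[@]}" -eq 0 ]; then
    echo "empty extension list" >&2
    return 1
  fi
  find "$dir" -type f \( "${args[@]}" \) -print0
}
```

With the list php,php5, a file named d.phps is not selected, because -name matches the whole file name against the pattern.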

-a [on|off] enables or disables the autoupdate of the lists and rules. It's enabled by default, for reasons you can find in the -W documentation above. Haven't read that yet?

-A automatically whitelists every file it would otherwise ask the user about. It implies -U. Use it only in directories that are trusted fresh clones of upstream code repositories. DO NOT USE it if the directory has been served over the network even once. It is best to use this option on a disconnected host and then copy the resulting whitelist over.

-U Unattended execution. In this mode the script does not interactively ask anything and it just produces the report. The only output in this case will be the full path and filename of the report itself.

-t trusted-url downloads the file (tar or zip archive) at the specified URL into a temporary directory, expands it and runs -A (autowhitelist, see above) on its contents. Only the .tar.gz and .zip formats are supported and the file MUST be served over a secure HTTPS connection, so the URL MUST start with https://. This obviously falls short of a real security policy and it's up to you to pass only URLs you really trust to this option. This option does NOT and will NEVER support connections to servers with self-signed or invalid SSL certificates. This option overrides -d and sets the directory to scan to the temporary directory where the trusted file was expanded.
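The constraints listed for -t can be sketched like this, assuming the archive type is inferred from the URL suffix. curl verifies server certificates unless -k is given, and --proto '=https' together with --proto-redir '=https' makes it refuse plain HTTP, including on redirects. Both function names are hypothetical; the script's real implementation is not shown here.

```bash
# Hypothetical sketch of the -t checks: only https:// URLs are accepted and
# the archive type is inferred from the extension (.tar.gz or .zip only).
archive_kind() {
  local url=$1
  case "$url" in
    https://*.tar.gz) echo tar ;;
    https://*.zip)    echo zip ;;
    *) echo "unsupported or non-HTTPS URL: $url" >&2; return 1 ;;
  esac
}

# Download URL into a fresh temporary directory and expand it there;
# prints that directory, which would then be scanned with -A.
fetch_trusted() {
  local url=$1 kind dir
  kind=$(archive_kind "$url") || return 1
  dir=$(mktemp -d) || return 1
  # No -k: invalid or self-signed certificates make curl fail.
  curl -fsSL --proto '=https' --proto-redir '=https' \
       -o "$dir/archive" "$url" || return 1
  case "$kind" in
    tar) tar -xzf "$dir/archive" -C "$dir" ;;
    zip) unzip -q "$dir/archive" -d "$dir" ;;
  esac || return 1
  rm -f "$dir/archive"
  echo "$dir"
}
```

For example, `fetch_trusted https://wordpress.org/latest.tar.gz` would print the temporary directory that holds the expanded release.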

-d directory The directory to scan for malware. It defaults to the current working directory.

-h shows this help and exits.

-m shows this help in markdown format and exits.

Configuration files:

All options except "-h" and "-m" have a matching variable inside the script, so you can set them in configuration files too, but see below for "-c". Here is the list of variables and their respective controlling options. Options, if specified, take precedence over configuration files. Environment variables with the same names as these are silently ignored.

*  AUTOUPDATE_LISTS is controlled by -a
*  AUTOWHITELIST is controlled by -A
*  BLACKLIST_ADDITIONS is controlled by -b
*  UPSTREAM_BLACKLIST is controlled by -B
*  RULES_ADDITIONS is controlled by -r
*  UPSTREAM_RULES is controlled by -R
*  URL_FOR_UPSTREAM_LISTS is controlled by -u
*  WHITELIST_ADDITIONS is controlled by -w
*  UPSTREAM_WHITELIST is controlled by -W
*  EXTENSIONS_LIST is controlled by -e
*  UNATTENDED_EXECUTION is controlled by -U
*  TRUSTED_URL is controlled by -t
*  NUMBER_OF_CONTEXT_LINES has no matching option (see above)
*  SCANDIR is controlled by -d
*  USERCONFFILE is controlled by -c, but...

Additionally, you can define the FINDMALWARE_CONFIG variable in your environment and export it to this script before running it. Doing so causes the specified value to be used to initialize USERCONFFILE. Default configuration files can set variables however they see fit, including USERCONFFILE. Thus, it is theoretically possible to assign a value to USERCONFFILE inside one of the default configuration files, causing the specified file to be loaded afterwards, but I suspect it makes little sense to do so.
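For illustration, a configuration file might look like the following. This is a hypothetical example: the variable names come from the list above, but the accepted values (off for AUTOUPDATE_LISTS, a comma-separated string for EXTENSIONS_LIST) are assumed to mirror the matching options, which this README does not spell out.

```bash
# Hypothetical $HOME/.findmalware/config: a plain bash fragment that the
# script sources as-is. Values assume the same format as the matching option.
AUTOUPDATE_LISTS=off
URL_FOR_UPSTREAM_LISTS="https://example.com/my-own-findmalware-rules-and-lists-folder"
EXTENSIONS_LIST="php,php5,php7"
NUMBER_OF_CONTEXT_LINES=2   # changing this invalidates existing code whitelists
```

Saved as $HOME/.findmalware/config, it is loaded automatically; saved anywhere else, it can be selected with -c or FINDMALWARE_CONFIG.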

The rules file

At its core, all findmalware does is run grep against the files containing PHP code. The rules file is fed directly to grep, which uses it to match lines in those files and output the names of potentially malicious files along with snippets of code. The rules file is just a list of grep regular expressions, one per line. You can find an example rules file in this project's repository: it can be used as a starting point, but it produces many false positives (and it does not find many true positives either). You are encouraged to write your own rules, and possibly share them if you find they work better than the ones in the findmalware repository. This script was written with PHP in mind, but you can write rules for other languages too. See the options above to change the rules file and the file extensions findmalware should use.
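A stripped-down sketch of that core idea, assuming GNU grep (for --include) and grep's default basic regular expressions. The real script also handles extensions, context lines and the various lists, all of which this omits; scan_dir is a hypothetical name.

```bash
# Hypothetical sketch of the core scan: GNU grep reads one regular expression
# per line from RULES (-f) and searches every *.php file under DIR.
scan_dir() {
  local dir=$1 rules=$2
  # -r recurses, -H/-n prefix each hit with "file:line:",
  # --include restricts the search to names matching *.php.
  grep -rHn --include='*.php' -f "$rules" -- "$dir"
}
```

Beware of blank lines in a rules file: with -f, an empty pattern matches every line, so a single stray blank line would flag every file.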

Examples:

Search for malware in the current directory, using rules and lists provided by the script author (NOT RECOMMENDED)

  findmalware.sh

Search for malware in non-interactive mode, using your own rules and lists (RECOMMENDED).

  findmalware.sh -U -u https://example.com/my-own-findmalware-rules-and-lists-folder

Scan the latest WordPress release, taking for granted it has no malware in it, find false positives and add them to your local whitelist (suitable for a cron job)

  findmalware.sh -t https://wordpress.org/latest.tar.gz