diff --git a/.dockerignore b/.dockerignore
index 9e00c82d..5e9c9cd6 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,7 +1,3 @@
 data
-elasticsearch
-elasticsearch-searchguard
-kibana
-prometheus
 vendor
 .cache
diff --git a/.env.example b/.env.example
index 904b1eb9..4a38103b 100644
--- a/.env.example
+++ b/.env.example
@@ -1,12 +1,3 @@
-# Crawler
-ELASTIC_URL=http://devita_elasticsearch:9200
+ELASTIC_URL=https://elasticsearch.developers.italia.it
 #ELASTIC_USER=elastic
 #ELASTIC_PWD=changeme
-
-# Elasticsearch
-ES_JAVA_OPTS='-Xms256m -Xmx1g'
-
-# Kibana
-ELASTICSEARCH_PROTOCOL=http
-ELASTICSEARCH_HOST=devita_elasticsearch
-ELASTICSEARCH_PORT=9200
diff --git a/Makefile b/Makefile
deleted file mode 100644
index 47656995..00000000
--- a/Makefile
+++ /dev/null
@@ -1,12 +0,0 @@
-include .env
-
-.PHONY: up stop crawl
-
-up:
-	docker-compose --file=docker-compose-es-searchguard.yml up -d --remove-orphans
-
-stop:
-	docker-compose stop
-
-crawl:
-	docker-compose --file=docker-compose-es-searchguard.yml up -d
diff --git a/README.md b/README.md
index e4f53728..50490c32 100644
--- a/README.md
+++ b/README.md
@@ -1,126 +1,51 @@
-# Backend and crawler for the OSS catalog of Developers Italia
+# Crawler for the OSS catalog of Developers Italia
+
 [![CircleCI](https://circleci.com/gh/italia/developers-italia-backend/tree/master.svg?style=shield)](https://circleci.com/gh/italia/developers-italia-backend/tree/master)
-[![Go Report Card](https://goreportcard.com/badge/github.com/italia/developers-italia-backend)](https://goreportcard.com/report/github.com/italia/developers-italia-backend) [![Join the #website channel](https://img.shields.io/badge/Slack%20channel-%23website-blue.svg?logo=slack)](https://developersitalia.slack.com/messages/C9R26QMT6)
+[![Go Report Card](https://goreportcard.com/badge/github.com/italia/developers-italia-backend)](https://goreportcard.com/report/github.com/italia/developers-italia-backend)
+[![Join the #website channel](https://img.shields.io/badge/Slack%20channel-%23website-blue.svg?logo=slack)](https://developersitalia.slack.com/messages/C9R26QMT6)
 [![Get invited](https://slack.developers.italia.it/badge.svg)](https://slack.developers.italia.it/)
-## Overview: how the crawler works
-
-The crawler finds and retrieves the *publiccode.yml* files from the organizations registered on *Github/Bitbucket/Gitlab*, listed in the whitelist.
-It then creates YAML files used by the [Jekyll build chain](https://github.com/italia/developers.italia.it) to generate the static pages of [developers.italia.it](https://developers.italia.it/).
-
-## Dependencies and other related software
-
-These are the dependencies and some useful tools used in conjunction with the crawler.
-
-* [Elasticsearch 6.8.7](https://www.elastic.co/products/elasticsearch) for storing the data. Elasticsearch should be active and ready to accept connections before the crawler gets started
-
-* [Kibana 6.8.7](https://www.elastic.co/products/kibana) for internal data visualization (optional)
+## How it works
-* [Prometheus 6.8.7](https://prometheus.io) for collecting metrics (optional, currently supported but not used in production)
+The crawler finds and retrieves the **`publiccode.yml`** files from the
+organizations in the whitelist.
-## Tools
+It then creates YAML files used by the
+[Jekyll build chain](https://github.com/italia/developers.italia.it)
+to generate the static pages of [developers.italia.it](https://developers.italia.it/).
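A repository is picked up only if it publishes a `publiccode.yml` file in its root. The schema is defined by the publiccode.yml standard and is parsed with [publiccode-parser-go](https://github.com/italia/publiccode-parser-go); the snippet below is only a rough sketch with placeholder values, not a file that is guaranteed to validate.

```yaml
# Minimal sketch of a publiccode.yml: field names follow the standard,
# every value here is a placeholder chosen for illustration only.
publiccodeYmlVersion: "0.2"
name: Example Software
url: "https://github.com/example-org/example-software"
releaseDate: "2019-01-01"
platforms:
  - web
categories:
  - document-management
developmentStatus: stable
softwareType: standalone/web
description:
  it:
    shortDescription: Breve descrizione del software
    features:
      - gestione documentale
legal:
  license: AGPL-3.0-or-later
maintenance:
  type: internal
  contacts:
    - name: Mario Rossi
localisation:
  localisationReady: true
  availableLanguages:
    - it
```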
-This is the list of tools used in the repository:
-
-* [Docker](https://www.docker.com/)
-
-* [Docker-compose](https://docs.docker.com/compose/)
-
-* [Go](https://golang.org/) >= 1.11
+[Elasticsearch 6.8](https://www.elastic.co/products/elasticsearch) is used to store
+the data; it should be active and ready to accept connections before the crawler is started.
 ## Setup and deployment processes
-The crawler can either run directly on the target machine, or it can be deployed in form of Docker container, possibly using an orchestrator, such as Kubernetes.
-
-Up to now, the crawler and its dependencies have run in form of Docker containers on a virtual machine. Elasticsearch and Kibana have been deployed using a fork of the main project, called [search guard](https://search-guard.com/). This is still deployed in production and what we'll call in the readme *"legacy deployment process"*.
-
-With the idea of making the legacy installation more scalable and reliable, a refactoring of the code has been recently made. The readme refers to this approach as the *new deployment process*. This includes using the official version of Elasticsearch and Kibana, and deploying the Docker containers on top of Kubernetes, using helm-charts. While the crawler has it's [own helm-chart](https://github.com/teamdigitale/devita-infra-kubernetes), Elasticsearch and Kibana are deployed using their [official helm-charts](https://github.com/elastic/helm-charts).
-The new deployment process uses a [docker-compose.yml](docker-compose.yml) file to only bring up a local development and test environment.
-
-The paragraph starts describing how to build and run the crawler, directly on a target machine.
-The procedure described is the same automated in the Dockerfile. The -legacy and new- Docker deployment procedures are then described below.
+The crawler can either run manually on the target machine or be deployed
+as a Docker container with
+[its helm-chart](https://github.com/teamdigitale/devita-infra-kubernetes) in Kubernetes.
 ### Manually configure and build the crawler
-* `cd crawler`
-
-* Fill the *domains.yml* file with configuration values (i.e. host basic auth tokens)
-
-* Rename the *config.toml.example* file to *config.toml* and fill the variables
-
-> **NOTE**: The application also supports environment variables in substitution to config.toml file. Remember: "environment variables get higher priority than the ones in configuration file"
-
-* Build the crawler binary: `make`
+1. `cd crawler`
-* Configure the crontab as desired
+2. Save the auth tokens to `domains.yml`.
-### Run the crawler
-* Crawl mode (all item in whitelists): `bin/crawler crawl whitelist/*.yml`
-
-  Crawl supports blacklists (see below for details), crawler will try to match each repository URL in its list with the ones listed in blacklists and if it so it will print a warn log and skip all operation on it. Furthermore it will immediately remove the blacklisted repository from ES if it is present.
-
-* One mode (single repository url): `bin/crawler one [repo url] whitelist/*.yml`
-
-  In this mode one single repository at the time will be evaluated. If the organization is present, its IPA code will be matched with the ones in whitelist otherwise it will be set to null and the `slug` will have a random code in the end (instead of the IPA code). Furthermore, the IPA code validation, which is a simple check within whitelists (to ensure that code belongs to the selected PA), will be skipped.
-
-  One supports blacklists (see below for details), whether `[repo url]` is present in one of indicated blacklist, crawler will exit immediately. Basically ignore all repository defined in list preventing the unauthorized loading in catalog.
-
-* `bin/crawler updateipa` downloads IPA data and writes them into Elasticsearch
+3. Rename `config.toml.example` to `config.toml` and set the variables
-* `bin/crawler delete [URL]` delete software from Elasticsearch using its code hosting URL specified in `publiccode.url`
+
+   > **NOTE**: The application also supports environment variables as a replacement
+   > for the config.toml file. Environment variables take priority over the values
+   > in the configuration file.
-* `bin/crawler download-whitelist` downloads organizations and repositories from the [onboarding portal repository](https://github.com/italia/developers-italia-onboarding) and saves them to a whitelist file
+4. Build the crawler binary with `make`
-#### Crawler blacklists
-Blacklists are needed to exclude individual repository that are not in line with our [guidelines](https://docs.italia.it/italia/developers-italia/policy-inserimento-catalogo-docs/it/stabile/approvazione-del-software-a-catalogo.html).
+### Docker
-##### Configuration
-*config.toml* has a reference for blacklist configuration which can point to a given location and to all files that match given pattern. Blacklist is currently supported by commands:
-- `one`
-- `crawl`
+The repository has a `Dockerfile`, used to build the production image,
+and a `docker-compose.yml` file to facilitate the local deployment.
-### Docker: the legacy deployment process
+Before proceeding with the build, copy [`.env.example`](.env.example)
+into `.env` and edit the environment variables as needed.
-The paragraph describes how to setup and deploy the crawler, following the *legacy deployment process*.
-
-* Rename [.env-search-guard.example](.env-search-guard.example) to *.env* and adapt its variables as needed
-
-* Rename *elasticsearch-searchguard/config/searchguard/sg_internal_users.yml.example* to *elasticsearch/-searchguard/config/searchguard/sg_internal_users.yml* and insert the correct passwords. Hashed passwords can be generated with:
-
-  ```shell
-  docker exec -t -i developers-italia-backend_elasticsearch elasticsearch-searchguard/plugins/search-guard-6/tools/hash.sh -p
-  ```
-
-* Insert the *kibana* password in [kibana-searchguard/config/kibana.yml](kibana-searchguard/config/kibana.yml)
-
-* Configure the Nginx proxy for the elasticsearch host with the following directives:
-
-  ```
-  limit_req_zone $binary_remote_addr zone=elasticsearch_limit:10m rate=10r/s;
-
-  server {
-    ...
-    location / {
-      limit_req zone=elasticsearch_limit burst=20 nodelay;
-      proxy_set_header X-Real-IP $remote_addr;
-      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
-      proxy_set_header X-Forwarded-Proto $scheme;
-      proxy_pass http://localhost:9200;
-      proxy_ssl_session_reuse off;
-      proxy_cache_bypass $http_upgrade;
-      proxy_redirect off;
-    }
-  }
-  ```
-
-* You might need to type `sysctl -w vm.max_map_count=262144` and make this permanent in /etc/sysctl.conf in order to start elasticsearch, as [documented here](https://hub.docker.com/r/khezen/elasticsearch/)
-
-* Start Docker: `make up`
-
-### Docker: the new deployment process
-
-The repository has a *Dockerfile*, used to also build the production image, and a *docker-compose.yml* file to facilitate the local deployment.
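Putting the manual steps above together, a session could look like the sketch below. The file names come from this repository; the environment variable and its value are only taken from [.env.example](.env.example) as an illustration, and the use of `cp` instead of an actual rename is a convenience.

```shell
# Sketch of the manual configure-and-build flow; values are placeholders.
cd crawler
cp config.toml.example config.toml   # then edit the values in config.toml
"$EDITOR" domains.yml                # add the code hosting auth tokens

# Environment variables take priority over config.toml, for example:
export ELASTIC_URL="https://elasticsearch.developers.italia.it"

make                                 # builds the bin/crawler binary used below
```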
-
-The containers declared in the *docker-compose.yml* file leverage some environment variables that should be declared in a *.env* file. A [.env.example](.env.example) file has some exemplar values. Before proceeding with the build, copy the [.env.example](.env.example) into *.env* and modify the environment variables as needed.
-
-To build the crawler container, download its dependencies and start them all, run:
+To build the crawler container, run:
 ```shell
 docker-compose up [-d] [--build]
 ```
@@ -132,44 +57,63 @@ where:
 * *-d* executes the containers in background
 * *--build* forces the containers build
-To destroy the containers, use:
+To destroy the container, use:
 ```shell
 docker-compose down
 ```
-#### Xpack
-
-By default, the system -specifically Elasticsearch- doesn't make use of xpack, so passwords and certificates. To do so, the Elasticsearch container mounts [this configuration file](elasticsearch/elasticsearch.yml). This will make things work out of the box, but it's not appropriate for production environments.
-
-An alternative configuration file that enables xpack is available [here](elasticsearch/elasticsearch-xpack.yml). In order to use it, you should
+## Run the crawler
-* Generate appropriate certificates for elasticsearch, save them in the *elasticsearch folder*, and make sure that their name matches the one contained in the [elasticsearch-xpack configuration file](elasticsearch/elasticsearch-xpack.yml).
-
-* Optionally change the [elasticsearch-xpack.yml configuration file](elasticsearch/elasticsearch-xpack.yml) as desired
-
-* Rename the [elasticsearch-xpack.yml configuration file](elasticsearch/elasticsearch-xpack.yml) to *elasticsearch.yml*
+* Crawl mode (all items in the whitelists): `bin/crawler crawl whitelist/*.yml`
+  * `crawl` supports blacklists (see below for details). The crawler tries to
+    match each repository URL in its list with the ones listed in the blacklists
+    and, if it finds a match, it logs a warning and skips every operation on
+    that repository. Furthermore, it immediately removes the blacklisted
+    repository from Elasticsearch if it is present.
-* Change the environment variables in your *.env* file to make sure that crawler, elasticsearch, and kibana configurations have matching passwords
+* One mode (single repository URL): `bin/crawler one [repo url] whitelist/*.yml`
+  * In this mode a single repository at a time is evaluated. If the organization
+    is present, its IPA code is matched against the ones in the whitelist;
+    otherwise it is set to null and the `slug` ends with a random code (instead
+    of the IPA code). Furthermore, the IPA code validation, which is a simple
+    check within the whitelists (to ensure that the code belongs to the selected
+    PA), is skipped.
+  * `one` supports blacklists (see below for details): if `[repo url]` is present
+    in one of the indicated blacklists, the crawler exits immediately. This
+    prevents blacklisted repositories from being loaded into the catalog.
-At this point you can bring up the environment with *docker-compose*.
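For the Docker-based local setup described above, an end-to-end session could look like the following; the only assumption is that `.env` is created from `.env.example` before the build, as the README instructs.

```shell
# Local development flow with docker-compose (sketch), run from the repo root.
cp .env.example .env       # edit the variables as needed
docker-compose up -d --build
docker-compose logs -f     # optional: follow the crawler output
docker-compose down        # tear the container down when finished
```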
+* `bin/crawler updateipa` downloads IPA data and writes them into Elasticsearch
-## Troubleshooting Q/A
+* `bin/crawler delete [URL]` deletes software from Elasticsearch using the code
+  hosting URL specified in `publiccode.url`
-* From docker logs seems that Elasticsearch container needs more virtual memory and now it's *Stalling for Elasticsearch...*
+* `bin/crawler download-whitelist` downloads organizations and repositories from
+  the [onboarding portal repository](https://github.com/italia/developers-italia-onboarding)
+  and saves them to a whitelist file
-  Increase container virtual memory: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode
+### Crawler blacklists
-* When trying to `make build` the crawler image, a fatal memory error occurs: "fatal error: out of memory"
+Blacklists are needed to exclude individual repositories that are not in line
+with our
+[guidelines](https://docs.italia.it/italia/developers-italia/policy-inserimento-catalogo-docs/it/stabile/approvazione-del-software-a-catalogo.html).
-  Probably you should increase the container memory: `docker-machine stop && VBoxManage modifyvm default --cpus 2 && VBoxManage modifyvm default --memory 2048 && docker-machine stop`
+You can set `BLACKLIST_FOLDER` in `config.toml` to point to a directory
+where blacklist files are located.
+Blacklisting is currently supported by the `one` and `crawl` commands.
 ## See also
-* [publiccode-parser-go](https://github.com/italia/publiccode-parser-go): the Go package for parsing publiccode.yml files
+* [publiccode-parser-go](https://github.com/italia/publiccode-parser-go): the Go
+  package for parsing publiccode.yml files
-* [developers-italia-onboarding](https://github.com/italia/developers-italia-onboarding): the onboarding portal
+* [developers-italia-onboarding](https://github.com/italia/developers-italia-onboarding):
+  the onboarding portal
 ## Authors
-[Developers Italia](https://developers.italia.it) is a project by [AgID](https://www.agid.gov.it/) and the [Italian Digital Team](https://teamdigitale.governo.it/), which developed the crawler and maintains this repository.
+[Developers Italia](https://developers.italia.it) is a project by
+[AgID](https://www.agid.gov.it/) and the
+[Italian Digital Team](https://teamdigitale.governo.it/), which developed the
+crawler and maintains this repository.
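Taken together, the `bin/crawler` subcommands map to a session like the one sketched below. The removed `docs/docs.md` suggested running the crawl every 12 hours via cron; since the Makefile it referred to is also removed in this change, the trailing crontab comment invokes the binary directly. The repository URL, paths and schedule are assumptions for illustration, not project defaults.

```shell
# Illustrative crawler session; the repository URL and paths are placeholders.
bin/crawler download-whitelist         # fetch orgs from the onboarding portal
bin/crawler updateipa                  # refresh IPA data in Elasticsearch
bin/crawler crawl whitelist/*.yml      # crawl every whitelisted organization
bin/crawler one https://github.com/example-org/example-repo whitelist/*.yml
bin/crawler delete https://github.com/example-org/example-repo

# A possible crontab entry for a 12-hour schedule, adapted from the removed docs:
# 0 */12 * * * cd /path/to/developers-italia-backend/crawler && bin/crawler crawl whitelist/*.yml > crawler.log 2>&1
```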
diff --git a/docker-compose-es-searchguard.yml b/docker-compose-es-searchguard.yml deleted file mode 100644 index 84db3796..00000000 --- a/docker-compose-es-searchguard.yml +++ /dev/null @@ -1,63 +0,0 @@ -# Configuration variables are loaded by docker-compose from the .env file - -# Ubuntu 16.04.3 LTS has docker-compose 1.8.0, so it doesn't support 3.x -version: "2.0" - -services: - elasticsearch: - container_name: "${NAME}_elasticsearch" - image: khezen/elasticsearch:6.5.4 - environment: - - "CLUSTER_NAME=developers-italia" - - "ELASTIC_PWD=${ELASTIC_PWD}" - - "KIBANA_PWD=${KIBANA_PWD}" - - "ES_TMPDIR=/tmp" - - "HTTP_SSL=false" - - "HTTP_CORS_ENABLE=true" - - "HTTP_CORS_ALLOW_ORIGIN=/^https://(.+\\.)?${DOMAIN}/" - ports: - - "127.0.0.1:9200:9200" - restart: always - volumes: - - /data/elasticsearch:/elasticsearch/data - - ./elasticsearch-searchguard/config/elasticsearch.yml:/elasticsearch/config/elasticsearch.yml - - ./elasticsearch-searchguard/config/searchguard/sg_config.yml:/elasticsearch/config/searchguard/sg_config.yml - - ./elasticsearch-searchguard/config/searchguard/sg_action_groups.yml:/elasticsearch/config/searchguard/sg_action_groups.yml - - ./elasticsearch-searchguard/config/searchguard/sg_internal_users.yml:/elasticsearch/config/searchguard/sg_internal_users.yml - - ./elasticsearch-searchguard/config/searchguard/sg_roles.yml:/elasticsearch/config/searchguard/sg_roles.yml - - ./elasticsearch-searchguard/config/searchguard/sg_roles_mapping.yml:/elasticsearch/config/searchguard/sg_roles_mapping.yml - - kibana: - container_name: "${NAME}_kibana" - image: khezen/kibana:6.2.2 - environment: - - "ELASTICSEARCH_PROTOCOL=http" - - "ELASTICSEARCH_HOST=elasticsearch" - - "ELASTICSEARCH_PORT=9200" - - "KIBANA_PWD=${KIBANA_PWD}" - ports: - - "5601:5601" - restart: always - volumes: - - ./kibana-searchguard/config/kibana.yml:/opt/kibana-6.2.2-linux-x86_64/config/kibana.yml - - # prometheus: - # container_name: "${NAME}_prometheus" - # image: quay.io/prometheus/prometheus:v2.2.1 - # labels: - # - "traefik.enable=true" - # - "traefik.backend=prometheus" - # - "traefik.port=9090" - # - "traefik.frontend.rule=Host:prometheus.${DOMAIN}" - # restart: always - # volumes: - # - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml - -# SPC Cloud machines have MTU 1400, and docker-compose does not automatically pick -# the host daemon MTU. 
-# https://github.com/moby/moby/issues/22297 -# https://github.com/docker/compose/issues/3438 -networks: - default: - driver_opts: - com.docker.network.driver.mtu: 1400 diff --git a/docker-compose.yml b/docker-compose.yml index 4557f8ee..61400172 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -2,7 +2,6 @@ version: '3.3' services: - # Crawler devita_crawler: container_name: devita_crawler image: italia/developers-italia-backend @@ -11,35 +10,6 @@ services: dockerfile: Dockerfile env_file: - .env - depends_on: - - devita_elasticsearch - networks: - - overlay - - # Elasticsearch - devita_elasticsearch: - image: docker.elastic.co/elasticsearch/elasticsearch:6.8.7 - container_name: devita_elasticsearch - env_file: - - .env - volumes: - - ./elasticsearch:/usr/share/elasticsearch/config - networks: - - overlay - ports: - - 9200:9200 - - 9300:9300 - - # Kibana - devita_kibana: - container_name: "devita_kibana" - image: docker.elastic.co/kibana/kibana:6.8.7 - env_file: - - .env - depends_on: - - devita_elasticsearch - ports: - - "5601:5601" networks: - overlay diff --git a/docs/deploy.md b/docs/deploy.md deleted file mode 100644 index dc4576ba..00000000 --- a/docs/deploy.md +++ /dev/null @@ -1,21 +0,0 @@ -## Deploy architecture - -### Deploy Architecture: containers - -Docker-compose - -- elasticsearch:khezen/elasticsearch:6.2.2 - Service elasticsearch: contains all the data saved by the crawler and used from the website developers.italia.it - Elasticsearch is a distributed, RESTful search and analytics engine. - -- kibana: khezen/kibana:6.2.2 - Kibana is the GUI for elasticsearch. Not exposed and not used by the crawler. - -- prometheus: quay.io/prometheus/prometheus:v2.2.1 - Prometheus offers a monitoring system and time series database. Used in order to save and review the metrics for the crawler service. - -- proxy: containous/traefik:experimental - Træfik is the reverse proxy that serves all the containers over the different ports. - -- crawler: italia/developers-italia-backend:0.0.1 - The crawler that will visit a repository, validate, download and save the publiccode.yml file. diff --git a/docs/docs.md b/docs/docs.md deleted file mode 100644 index c559bddf..00000000 --- a/docs/docs.md +++ /dev/null @@ -1,19 +0,0 @@ -## developers-italia-backend - -Backend & crawler for the OSS catalog of Developers Italia. 
- -Table of contents: - -- [README](../README.md) -- [Deploy Architecture (Docker & Containers)](deploy.md) -- [Files and folders description](fileAndFolders.md) -- [Elasticsearch details and data mapping](elasticsearch.md) -- [Configuration](fileAndFolders.md) -- [Crawler flow and steps](crawler.md) -- [Jekyll files generation](jekyll.md) - -- [References](references.md) - -### Run crawler in cron job - -Execute every 12 hours `0 */12 * * * make crawler > crawler.log` diff --git a/elasticsearch-searchguard/config/elasticsearch.yml b/elasticsearch-searchguard/config/elasticsearch.yml deleted file mode 100644 index f4367fc8..00000000 --- a/elasticsearch-searchguard/config/elasticsearch.yml +++ /dev/null @@ -1,44 +0,0 @@ -cluster.name: ${CLUSTER_NAME} - -node: - name: ${NODE_NAME} - master: ${NODE_MASTER} - data: ${NODE_DATA} - ingest: ${NODE_INGEST} - -discovery.zen: - minimum_master_nodes: ${MINIMUM_MASTER_NODES} - ping.unicast.hosts: ${HOSTS} -network.host: ${NETWORK_HOST} - -http: - enabled: ${HTTP_ENABLE} - compression: true - cors: - enabled: ${HTTP_CORS_ENABLE} - allow-origin: ${HTTP_CORS_ALLOW_ORIGIN} - -searchguard: - ssl.transport: - enabled: true - enable_openssl_if_available: true - keystore_type: JKS - keystore_filepath: searchguard/ssl/${NODE_NAME}-keystore.jks - keystore_password: ${KS_PWD} - truststore_type: JKS - truststore_filepath: searchguard/ssl/truststore.jks - truststore_password: ${TS_PWD} - enforce_hostname_verification: false - ssl.http: - enabled: ${HTTP_SSL} - clientauth_mode: OPTIONAL - enable_openssl_if_available: true - keystore_type: JKS - keystore_filepath: searchguard/ssl/${NODE_NAME}-keystore.jks - keystore_password: ${KS_PWD} - truststore_type: JKS - truststore_filepath: searchguard/ssl/truststore.jks - truststore_password: ${TS_PWD} - authcz.admin_dn: - - "CN=elastic ,OU=devops, C=COM" - enterprise_modules_enabled: false diff --git a/elasticsearch-searchguard/config/searchguard/sg_action_groups.yml b/elasticsearch-searchguard/config/searchguard/sg_action_groups.yml deleted file mode 100644 index 480f7400..00000000 --- a/elasticsearch-searchguard/config/searchguard/sg_action_groups.yml +++ /dev/null @@ -1,46 +0,0 @@ -# INDICES -READ: - - "indices:data/read*" -WRITE: - - "indices:data/write*" -CRUD: - - READ - - WRITE -CREATE_INDEX: - - "indices:admin/create*" -DELETE_INDEX: - - "indices:admin/delete*" -INDEX_OWNER: - - CREATE_INDEX - - CRUD -INDEX_ALL: - - "indices:*" - -# ELASTICSEARCH ENTRYPOINT -GET_TEMPLATE: - - "indices:admin/template/get" -PUT_TEMPLATE: - - "indices:admin/template/put" -TEMPLATE_OWNER: - - GET_TEMPLATE - - PUT_TEMPLATE -ES_INPUT: - - TEMPLATE_OWNER - - WRITE - - MONITOR - -# MONITORING -MONITOR: - - "cluster:monitor*" - -# SUPER POWERS -ALL: - - "*" - -# CUSTOM -SEARCH: - - "indices:data/read/search*" - - "indices:data/read/msearch*" - - SUGGEST -SUGGEST: - - "indices:data/read/suggest*" diff --git a/elasticsearch-searchguard/config/searchguard/sg_config.yml b/elasticsearch-searchguard/config/searchguard/sg_config.yml deleted file mode 100644 index c2562347..00000000 --- a/elasticsearch-searchguard/config/searchguard/sg_config.yml +++ /dev/null @@ -1,221 +0,0 @@ -# This is the main Search Guard configuration file where authentication -# and authorization is defined. -# -# You need to configure at least one authentication domain in the authc of this file. 
-# An authentication domain is responsible for extracting the user credentials from -# the request and for validating them against an authentication backend like Active Directory for example. -# -# If more than one authentication domain is configured the first one which succeeds wins. -# If all authentication domains fail then the request is unauthenticated. -# In this case an exception is thrown and/or the HTTP status is set to 401. -# -# After authentication authorization (authz) will be applied. There can be zero or more authorizers which collect -# the roles from a given backend for the authenticated user. -# -# Both, authc and auth can be enabled/disabled separately for REST and TRANSPORT layer. Default is true for both. -# http_enabled: true -# transport_enabled: true -# -# 5.x Migration: "enabled: true/false" will also be respected currently but only to provide backward compatibility. -# -# For HTTP it is possible to allow anonymous authentication. If that is the case then the HTTP authenticators try to -# find user credentials in the HTTP request. If credentials are found then the user gets regularly authenticated. -# If none can be found the user will be authenticated as an "anonymous" user. This user has always the username "sg_anonymous" -# and one role named "sg_anonymous_backendrole". -# If you enable anonymous authentication all HTTP authenticators will not challenge. -# -# -# Note: If you define more than one HTTP authenticators make sure to put non-challenging authenticators like "proxy" or "clientcert" -# first and the challenging one last. -# Because it's not possible to challenge a client with two different authentication methods (for example -# Kerberos and Basic) only one can have the challenge flag set to true. You can cope with this situation -# by using pre-authentication, e.g. sending a HTTP Basic authentication header in the request. -# -# Default value of the challenge flag is true. -# -# -# HTTP -# basic (challenging) -# proxy (not challenging, needs xff) -# kerberos (challenging) NOT FREE FOR COMMERCIAL -# clientcert (not challenging, needs https) -# jwt (not challenging) NOT FREE FOR COMMERCIAL -# host (not challenging) #DEPRECATED, will be removed in a future version. 
-# host based authentication is configurable in sg_roles_mapping - -# Authc -# internal -# noop -# ldap NOT FREE FOR COMMERCIAL USE - -# Authz -# ldap NOT FREE FOR COMMERCIAL USE -# noop - -searchguard: - dynamic: - # Set filtered_alias_mode to 'disallow' to forbid more than 2 filtered aliases per index - # Set filtered_alias_mode to 'warn' to allow more than 2 filtered aliases per index but warns about it (default) - # Set filtered_alias_mode to 'nowarn' to allow more than 2 filtered aliases per index silently - #filtered_alias_mode: warn - #kibana: - # Kibana multitenancy - NOT FREE FOR COMMERCIAL USE - # see https://github.com/floragunncom/search-guard-docs/blob/master/multitenancy.md - # To make this work you need to install https://github.com/floragunncom/search-guard-module-kibana-multitenancy/wiki - #multitenancy_enabled: true - #server_username: kibanaserver - #index: '.kibana' - #do_not_fail_on_forbidden: false - http: - anonymous_auth_enabled: true - xff: - enabled: false - internalProxies: '192\.168\.0\.10|192\.168\.0\.11' # regex pattern - #internalProxies: '.*' # trust all internal proxies, regex pattern - remoteIpHeader: 'x-forwarded-for' - proxiesHeader: 'x-forwarded-by' - #trustedProxies: '.*' # trust all external proxies, regex pattern - ###### see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html for regex help - ###### more information about XFF https://en.wikipedia.org/wiki/X-Forwarded-For - ###### and here https://tools.ietf.org/html/rfc7239 - ###### and https://tomcat.apache.org/tomcat-8.0-doc/config/valve.html#Remote_IP_Valve - authc: - kerberos_auth_domain: - http_enabled: false - transport_enabled: false - order: 6 - http_authenticator: - type: kerberos # NOT FREE FOR COMMERCIAL USE - challenge: true - config: - # If true a lot of kerberos/security related debugging output will be logged to standard out - krb_debug: false - # If true then the realm will be stripped from the user name - strip_realm_from_principal: true - authentication_backend: - type: noop - basic_internal_auth_domain: - http_enabled: true - transport_enabled: true - order: 4 - http_authenticator: - type: basic - challenge: true - authentication_backend: - type: intern - proxy_auth_domain: - http_enabled: false - transport_enabled: false - order: 3 - http_authenticator: - type: proxy - challenge: false - config: - user_header: "x-proxy-user" - roles_header: "x-proxy-roles" - authentication_backend: - type: noop - jwt_auth_domain: - http_enabled: false - transport_enabled: false - order: 0 - http_authenticator: - type: jwt - challenge: false - config: - signing_key: "base64 encoded HMAC key or public RSA/ECDSA pem key" - jwt_header: "Authorization" - jwt_url_parameter: null - roles_key: null - subject_key: null - authentication_backend: - type: noop - clientcert_auth_domain: - http_enabled: false - transport_enabled: false - order: 2 - http_authenticator: - type: clientcert - config: - username_attribute: cn #optional, if omitted DN becomes username - challenge: false - authentication_backend: - type: noop - ldap: - http_enabled: false - transport_enabled: false - order: 5 - http_authenticator: - type: basic - challenge: false - authentication_backend: - # LDAP authentication backend (authenticate users against a LDAP or Active Directory) - type: ldap # NOT FREE FOR COMMERCIAL USE - config: - # enable ldaps - enable_ssl: false - # enable start tls, enable_ssl should be false - enable_start_tls: false - # send client certificate - enable_ssl_client_auth: false - # verify ldap 
hostname - verify_hostnames: true - hosts: - - localhost:8389 - bind_dn: null - password: null - userbase: 'ou=people,dc=example,dc=com' - # Filter to search for users (currently in the whole subtree beneath userbase) - # {0} is substituted with the username - usersearch: '(sAMAccountName={0})' - # Use this attribute from the user as username (if not set then DN is used) - username_attribute: null - authz: - roles_from_myldap: - http_enabled: false - transport_enabled: false - authorization_backend: - # LDAP authorization backend (gather roles from a LDAP or Active Directory, you have to configure the above LDAP authentication backend settings too) - type: ldap # NOT FREE FOR COMMERCIAL USE - config: - # enable ldaps - enable_ssl: false - # enable start tls, enable_ssl should be false - enable_start_tls: false - # send client certificate - enable_ssl_client_auth: false - # verify ldap hostname - verify_hostnames: true - hosts: - - localhost:8389 - bind_dn: null - password: null - rolebase: 'ou=groups,dc=example,dc=com' - # Filter to search for roles (currently in the whole subtree beneath rolebase) - # {0} is substituted with the DN of the user - # {1} is substituted with the username - # {2} is substituted with an attribute value from user's directory entry, of the authenticated user. Use userroleattribute to specify the name of the attribute - rolesearch: '(member={0})' - # Specify the name of the attribute which value should be substituted with {2} above - userroleattribute: null - # Roles as an attribute of the user entry - userrolename: disabled - #userrolename: memberOf - # The attribute in a role entry containing the name of that role, Default is "name". - # Can also be "dn" to use the full DN as rolename. - rolename: cn - # Resolve nested roles transitive (roles which are members of other roles and so on ...) - resolve_nested_roles: true - userbase: 'ou=people,dc=example,dc=com' - # Filter to search for users (currently in the whole subtree beneath userbase) - # {0} is substituted with the username - usersearch: '(uid={0})' - # Skip users matching a user name, a wildcard or a regex pattern - #skip_users: - # - 'cn=Michael Jackson,ou*people,o=TEST' - # - '/\S*/' - roles_from_another_ldap: - enabled: false - authorization_backend: - type: ldap # NOT FREE FOR COMMERCIAL USE - #config goes here ... 
\ No newline at end of file diff --git a/elasticsearch-searchguard/config/searchguard/sg_internal_users.yml.example b/elasticsearch-searchguard/config/searchguard/sg_internal_users.yml.example deleted file mode 100644 index 57b31424..00000000 --- a/elasticsearch-searchguard/config/searchguard/sg_internal_users.yml.example +++ /dev/null @@ -1,14 +0,0 @@ -elastic: - hash: - roles: - - admin - -kibana: - hash: - roles: - - kibana_user - -frontend: - hash: - roles: - - search diff --git a/elasticsearch-searchguard/config/searchguard/sg_roles.yml b/elasticsearch-searchguard/config/searchguard/sg_roles.yml deleted file mode 100644 index 6937ab5d..00000000 --- a/elasticsearch-searchguard/config/searchguard/sg_roles.yml +++ /dev/null @@ -1,28 +0,0 @@ -admin: - cluster: - - ALL - indices: - "*": - "*": - - ALL - -kibana_user: - cluster: - - MONITOR - - CRUD - indices: - '?kibana': - '*': - - INDEX_ALL - - READ - -search: - indices: - # jekyll is the alias that includes all the indices we need to query - 'jekyll': - '*': - - SEARCH - 'indicepa*': - '*': - - SEARCH - diff --git a/elasticsearch-searchguard/config/searchguard/sg_roles_mapping.yml b/elasticsearch-searchguard/config/searchguard/sg_roles_mapping.yml deleted file mode 100644 index a7f7cbee..00000000 --- a/elasticsearch-searchguard/config/searchguard/sg_roles_mapping.yml +++ /dev/null @@ -1,18 +0,0 @@ -admin: - backendroles: - - admin - users: - - elastic - -kibana_user: - backendroles: - - kibana_user - users: - - kibana - -search: - backendroles: - - search - - sg_anonymous_backendrole - users: - - frontend diff --git a/elasticsearch-searchguard/scripts/config.sh.dist b/elasticsearch-searchguard/scripts/config.sh.dist deleted file mode 100644 index 0658b37c..00000000 --- a/elasticsearch-searchguard/scripts/config.sh.dist +++ /dev/null @@ -1,2 +0,0 @@ -export BASICAUTH="elastic:elastic" -export ELASTICSEARCH_URL=http://elasticsearch:9200 \ No newline at end of file diff --git a/elasticsearch-searchguard/scripts/createAlias.sh b/elasticsearch-searchguard/scripts/createAlias.sh deleted file mode 100755 index 0eab82b5..00000000 --- a/elasticsearch-searchguard/scripts/createAlias.sh +++ /dev/null @@ -1,30 +0,0 @@ -#!/bin/bash -# -# To create an alias in elasticsearch -# - -source config.sh - -ALIAS=$1 -INDEX=$2 - -if [ ! -n "${ALIAS}" ] ; then - echo -e $RED "You have to pass alias name as first parameter of the script" $Z; - exit 1; -fi -if [ ! -n "${INDEX}" ] ; then - echo -e $RED "You have to pass index name as second parameter of the script" $Z; - exit 1; -fi - -generate_create_msg() { - cat <