Fault injection is a technique for testing the robustness of your application by deliberately introducing faults. It is often used to exercise the error-handling code paths of your services. In a microservice architecture it is especially important to handle failures in service-to-service calls gracefully, because there are "a lot of moving parts" and a failure can occur unexpectedly at any point during communication.
- get to know the types of possible failure injections with Istio
- add failures to the backend service
First, let's reset the application to an initial state and test the environment.
Apply destination rule and deployment:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: calcbackend-rule
  namespace: challengeistio
spec:
  host: calcbackendsvc
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
Configure the VirtualService:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: backend-vs
  namespace: challengeistio
spec:
  hosts:
    - calcbackendsvc
  http:
    # Requests whose User-Agent contains "Mobile" always go to v2
    - match:
        - headers:
            user-agent:
              regex: .*Mobile.*
      route:
        - destination:
            host: calcbackendsvc
            subset: v2
    # All other requests are split 50/50 between v1 and v2
    - route:
        - destination:
            host: calcbackendsvc
            subset: v1
          weight: 50
        - destination:
            host: calcbackendsvc
            subset: v2
          weight: 50
Deploy a new frontend that is capable of showing errors as they occur:
# Note: extensions/v1beta1 Deployments were removed in Kubernetes 1.16.
# On newer clusters, use apps/v1 and add a spec.selector that matches the pod labels.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jscalcfrontend-v2
  namespace: challengeistio
spec:
  replicas: 1
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      labels:
        name: jscalcfrontend
        app: frontend
        version: v2
    spec:
      containers:
        - name: jscalcfrontend
          image: csaocpger/jscalcfrontend:9.0
          ports:
            - containerPort: 80
              name: http
              protocol: TCP
          env:
            - name: "ENDPOINT"
              value: "calcbackendsvc"
            - name: "PORT"
              value: "80"
Now open up a browser window and check that everything works as expected.
Now it's time to inject some faults into the communication inside our service mesh to test the resiliency of our application.
With the following VirtualService definition, 30% of the requests routed to our v1 and v2 services are aborted with HTTP status code 500.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: backend-vs
  namespace: challengeistio
spec:
  hosts:
    - calcbackendsvc
  http:
    - match:
        - headers:
            user-agent:
              regex: .*Mobile.*
      route:
        - destination:
            host: calcbackendsvc
            subset: v2
    - route:
        - destination:
            host: calcbackendsvc
            subset: v1
          weight: 50
        - destination:
            host: calcbackendsvc
            subset: v2
          weight: 50
      # Abort 30% of the requests handled by this route with HTTP 500
      fault:
        abort:
          percent: 30
          httpStatus: 500
Open the browser and watch the errors appear.
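To put a number on the injected failure rate instead of eyeballing it in the browser, a small script can call the backend repeatedly and tally the HTTP status codes. The following is a minimal sketch, not part of the original setup: the URL `http://calcbackendsvc/` is a hypothetical placeholder (the actual path of the backend API is not shown here), the function names are made up for this example, and the script is assumed to run from a pod inside the mesh so that its requests pass through a sidecar proxy.

```python
import urllib.error
import urllib.request
from collections import Counter

# Hypothetical placeholder; replace with a real endpoint of the backend service.
URL = "http://calcbackendsvc/"


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or None on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def error_rate(statuses):
    """Fraction of calls that did not return a 2xx status."""
    if not statuses:
        return 0.0
    failed = sum(1 for s in statuses if s is None or not 200 <= s < 300)
    return failed / len(statuses)


def probe(url, count=100, fetch=fetch_status):
    """Call the URL `count` times and return (Counter of statuses, error rate)."""
    statuses = [fetch(url) for _ in range(count)]
    return Counter(statuses), error_rate(statuses)
```

Calling `probe(URL)` with the abort rule in place should report an error rate that hovers around 30%. The default User-Agent sent by `urllib` does not contain "Mobile", so these requests take the default route, which is the one carrying the fault.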
Now, let's add another common scenario: high service latency. Any cloud-native application also has to deal with services that may experience high response times now and then. Your application should be able to cope with such situations.
Simulate "high latency" with the following definition:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: backend-vs
  namespace: challengeistio
spec:
  hosts:
    - calcbackendsvc
  http:
    - match:
        - headers:
            user-agent:
              regex: .*Mobile.*
      route:
        - destination:
            host: calcbackendsvc
            subset: v2
    - route:
        - destination:
            host: calcbackendsvc
            subset: v1
          weight: 50
        - destination:
            host: calcbackendsvc
            subset: v2
          weight: 50
      # Delay 30% of the requests handled by this route by a fixed 3 seconds
      fault:
        delay:
          percent: 30
          fixedDelay: 3s
Open the browser and see some service calls being delayed by approximately 3 seconds.
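The share of delayed calls can also be estimated from a script by timing each request and counting how many exceed a threshold. This is a rough sketch under stated assumptions: `call` stands for any function that performs one request to the backend (hypothetical here), the helper names are invented for this example, and the 2.5-second threshold is an arbitrary value chosen to sit just below the injected 3-second delay.

```python
import time


def timed(call):
    """Run `call()` once and return its duration in seconds."""
    start = time.perf_counter()
    try:
        call()
    except Exception:
        # A failed call still took time; only the duration matters here.
        pass
    return time.perf_counter() - start


def delayed_fraction(durations, threshold=2.5):
    """Fraction of calls whose duration is at least `threshold` seconds."""
    if not durations:
        return 0.0
    slow = sum(1 for d in durations if d >= threshold)
    return slow / len(durations)


def measure(call, count=50, threshold=2.5):
    """Time `count` calls and return (durations, fraction considered delayed)."""
    durations = [timed(call) for _ in range(count)]
    return durations, delayed_fraction(durations, threshold)
```

With the delay rule applied, the fraction returned by `measure` should be roughly 0.3 for requests that take the default route, provided the backend itself normally answers well below the threshold.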
Of course, you have to handle these kinds of failures in your source code. Developers should always implement proper error handling and use timeout and retry mechanisms when calling internal or external services.
Many libraries in different languages can help you with that. For example, if you are using .NET, there is a NuGet package called Polly (https://github.com/App-vNext/Polly). For Node.js, you can use HTTP clients such as Axios (https://github.com/axios/axios) with Axios-Retry (https://github.com/softonic/axios-retry), or Request (https://github.com/request/request), which have mechanisms for dealing with timeouts and errors. Go also has a very popular library called Go-Resiliency (https://github.com/eapache/go-resiliency).
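To illustrate what such libraries do under the hood, here is a minimal retry helper with exponential backoff, written in Python with only the standard library. It is a sketch and not a replacement for the libraries mentioned: the names `call_with_retry` and `all_attempts_fail_probability` and their parameters are made up for this example, and the per-request timeout is assumed to be enforced by the wrapped operation itself (for instance through the `timeout` argument of `urllib.request.urlopen`).

```python
import time


def call_with_retry(operation, attempts=3, base_delay=0.2, sleep=time.sleep):
    """Call `operation()` until it succeeds or `attempts` calls have failed.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


def all_attempts_fail_probability(failure_rate, attempts):
    """Chance that every attempt fails, assuming independent failures."""
    return failure_rate ** attempts
```

Assuming failures are independent, a 30% abort rate combined with three attempts leaves about a 2.7% chance (0.3 to the power of 3) that a call still fails. Retries are only safe for operations that can be repeated without side effects, so idempotent requests are the natural candidates.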
Reset the VirtualService definition to baseline:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: backend-vs
  namespace: challengeistio
spec:
  hosts:
    - calcbackendsvc
  http:
    - match:
        - headers:
            user-agent:
              regex: .*Mobile.*
      route:
        - destination:
            host: calcbackendsvc
            subset: v2
    - route:
        - destination:
            host: calcbackendsvc
            subset: v1
          weight: 50
        - destination:
            host: calcbackendsvc
            subset: v2
          weight: 50