Add SAI generate debug dump HLD
aviramd committed Nov 4, 2024
1 parent c358fd8 commit e148706
Showing 2 changed files with 199 additions and 55 deletions.
254 changes: 199 additions & 55 deletions doc/SAI_generate_debug_dump/SAI_generate_debug_dump.md
@@ -1,8 +1,8 @@
# SYNCD Optimization for SONiC
# Generate SAI Debug Dump

## Table of Contents

- [SYNCD Optimization for SONiC](#syncd-optimization-for-sonic)
- [Generate SAI Debug Dump](#generate-sai-debug-dump)
- [Table of Contents](#table-of-contents)
- [Revision](#revision)
- [Scope](#scope)
@@ -11,88 +11,230 @@
- [Requirements](#requirements)
- [Architecture Design](#architecture-design)
- [Implementation](#implementation)
- [sonic-utilities](#sonic-utilities)
- [SWSS](#SWSS)
- [SWSS-common](#SWSS-common)
- [SAI-Redis](#SAI-Redis)
- [SAI API](#sai-api)
- [YANG model changes](#yang-model-changes)
- [CLI](#cli)
- [Warmboot and Fastboot Design Impact]
- [Testing Requirements/Design]
- [Unit Test cases]
- [System Test cases]
- [generate_sai_dump bash script](#generate_sai_dump-bash-script)
- [show techsupport](#show-techsupport)
- [gen_sai_dbg_dump.sh](#gen_sai_dbg_dumpsh)
- [DbgGenDump orchestration](#dbggendump-orchestration)
- [SONIC support global API sai_dbg_generate_dump](#sonic-support-global-api-sai_dbg_generate_dump)
- [syncd extended operation](#syncd-extended-operation)
- [YANG model changes](#yang-model-changes)
- [CLI](#cli)
- [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact)
- [Testing Requirements/Design](#testing-requirementsdesign)
- [Unit Test cases](#unit-test-cases)
- [System Test cases](#system-test-cases)

### Revision

| Rev | Date | Author | Change Description |
| :-: | :------: | :-----------------------: | ------------------ |
| 0.1 | 10/15/24 | Aviram Dali (**Marvell**) | Initial Draft |

### Scope

The scope of this document is to design the handling of taking a SAI dump during show techsupport call
This document covers the design for generating a SAI debug dump file on user request, specifically from the `show techsupport` command.

### Terminology

| Term | Definition |
| --------- | --------------------------------------- |
| ASIC | Application Specific Integrated Circuit |
| SYNCD | ASIC Synchronization Service |
| SAI | Switch Abstraction Interface |
| API | Application Programmable Interface |
| SWSS | Switch State Service |


| Term | Definition |
| ----- | --------------------------------------- |
| ASIC | Application Specific Integrated Circuit |
| SYNCD | ASIC Synchronization Service |
| SAI | Switch Abstraction Interface |
| API | Application Programmable Interface |
| SWSS | Switch State Service |
### Overview
A SAI dump file usually includes SDK info and configuration, SAI stats, and captures of lower-layer SAI state such as register values.
Currently, the SAI dump file is generated only during SAI failures, by running a dedicated executable named "saidump", which links against the SAI library and, during initialization, creates a new switch in redundant mode. This new feature allows users to generate a SAI debug dump file using a command such as `show techsupport`, not necessarily during a failure, and the dump file is generated directly from the syncd process.
A SAI dump file usually includes SDK information and configuration, SAI statistics, and a capture of lower-layer SAI state such as register values.

Currently, the SAI dump file is generated only during SAI failures, by running a dedicated executable named "saisdkdump" (which links against the SAI library during initialization and creates a new switch in redundant mode).

This new feature allows users to generate a SAI debug dump file using the `show techsupport` command, not necessarily during a failure.

### Requirements
Each vendor can add, in its vendor-specific part of the `show techsupport` implementation, a simple call to a new API that generates the SAI debug dump file.

+ Add infrastructure to generate a SAI debug dump file upon user request.
+ Generate a SAI debug dump file from the `show techsupport` command.
+ Generate the SAI debug dump file within the context of syncd.
+ Maintain the existing mechanism for generating the SAI debug dump file on failure.

### Architecture Design

1. A user command, such as `show techsupport` triggers the `generate_sai_dump` bash script, which writes the file name to the STATE DB.
1. A user command, such as `show techsupport`, triggers `generate_sai_dump`, which creates a new table in the APPL DB containing the name of the dump file to create.
2. A new orchestration agent, `DbgGenDumpOrch`, is triggered to handle the request.
3. `DbgGenDumpOrch` writes the file name to the ASIC DB and sets a new operation `REDIS_ASIC_STATE_COMMAND_DBG_GEN_DUMP` for syncd.
4. Syncd calls the global SAI API `dbgGenerateDump` to generate the debug dump file, which is saved in syncd's file system.
5. Syncd sends a reply back to `DbgGenDumpOrch`.
6. `DbgGenDumpOrch` analyzes the response.
7. `DbgGenDumpOrch` updates the result in the STATE DB.
7. `DbgGenDumpOrch` updates the result in the APPL DB.
8. The user command retrieves the result.
9. The debug dump file is pulled on success.

The below diagram explains the sequence when a SAI failure happens
![](/images/generate_debug_dump_file.JPG)

The diagram below shows the flow for generating the debug dump file.

### Implementation

#### sonic-utilities
Add a new script to the Debian file system named `gen_sai_dbg_dump_lib.sh`, which includes the `generate_sai_dump` API. This function takes the desired file name as an argument and initiates the generation of a SAI debug dump file by performing the following steps:

- Set the file name in the STATE DB to trigger the dump generation.
- Poll the STATE DB for the result with timeout of 10 seconds.
- Delete the relevant entries from the STATE DB after triggering the dump file.
- Ensure that the generated file exists.
![Architecture Design](images/generate_debug_dump_file.png)


**Show Techsupport**
- Introduced a new generic API, `generate_sai_dbg_dump_file`, in `generate_dump.sh` (invoked by the "show techsupport" command) to create a debug dump file. This change allows each vendor to call this API in their vendor-specific implementation
- After the file is generated, it is moved into the techsupport folder.

#### SWSS
- A new orchestration agent, `DbgGenDumpOrch`, has been introduced, which is triggered by updates in the STATE DB.
- It updates syncd by writing to the ASIC DB and waits for a response. Once received, it writes the result back to the STATE DB, allowing the calling application to retrieve the file.

#### SWSS-common
- Add the new table names.
### Implementation

#### SAI-Redis
- Implemented a new global API, `dbgGenerateDump`, in the `SaiInterface` class, ensuring that all derived classes provide the corresponding implementation, including the vendor SAI class to call the global API `sai_dbg_generate_dump`.
- Added a new syncd operation, `REDIS_ASIC_STATE_COMMAND_DBG_GEN_DUMP`, which invokes the SAI API to generate the debug dump file.
#### generate_sai_dump bash script
A new script, `/usr/local/bin/gen_sai_dbg_dump_lib.sh`, provides the `generate_sai_dump` function, which can be invoked from `show techsupport` or any other command.

```
###############################################################################
# generate_sai_dump
#
# Description:
#  Triggers the generation of a SAI debug dump file through the Redis APPL DB.
#  - Ensures that the `syncd` container is running before initiating the dump.
#  - Waits for the file by polling the APPL DB (with a timeout) for the result.
#  - Removes the table from the DB when done.
#
# Arguments:
#  $1 - Filename for the SAI debug dump file.
#  $2 - Optional timeout for file readiness (default: 10 seconds).
#
# Returns:
#  0 - On success
#  1 - On failure
###############################################################################
generate_sai_dump() {
...
}
```

#### APPL DB
New tables are introduced in the APPL DB. The first one carries the dump request:

```
key = DBG_GEN_DUMP_TABLE:DUMP ; Unique identifier for gen dump file.
;field = value
file_name = STRING ; full path of the dump file to create.
```

Example:
```
redis-cli -n 0 HGETALL "DBG_GEN_DUMP_TABLE:DUMP"
1) "file"
2) "/var/log/sai_dump_file.log"
```

The result of the dump generation is reported in a second table, which the caller polls:
```
key = DBG_GEN_DUMP_STATUS_TABLE:DUMP ; Unique identifier for gen dump file result
;field = value
status = SAI_STATUS ; result status of file dump generation
```

Example:
```
redis-cli -n 0 HGETALL "DBG_GEN_DUMP_STATUS_TABLE:DUMP"
1) "status"
2) "0"
```
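
The script description and the two APPL DB tables can be tied together in a minimal sketch of `generate_sai_dump`. This is an illustration under stated assumptions, not the actual implementation: the helper names `appl_db` and `syncd_running` are hypothetical, the request field is taken to be `file_name` as in the schema, a status of `0` is treated as success as in the example, and the request is written with a plain `HSET` (how the write is signalled to `DbgGenDumpOrch` is not specified in this document).

```bash
#!/usr/bin/env bash
# Sketch only: key and field names follow the APPL DB schema in this document;
# helper names and the one-second poll interval are assumptions.

REQ_KEY="DBG_GEN_DUMP_TABLE:DUMP"
STATUS_KEY="DBG_GEN_DUMP_STATUS_TABLE:DUMP"

# Hypothetical wrapper around the APPL DB (database 0) client.
appl_db() { redis-cli -n 0 "$@"; }

# Hypothetical check that the syncd container is up.
syncd_running() {
    [ "$(docker inspect -f '{{.State.Running}}' syncd 2>/dev/null)" = "true" ]
}

generate_sai_dump() {
    local file_name="$1"
    local timeout="${2:-10}"
    local status=""
    local waited=0

    [ -n "$file_name" ] || return 1
    syncd_running || return 1

    # Drop any stale result, then publish the request.
    appl_db DEL "$STATUS_KEY" >/dev/null
    appl_db HSET "$REQ_KEY" file_name "$file_name" >/dev/null

    # Poll once per second until a status shows up or the timeout expires.
    while [ "$waited" -lt "$timeout" ]; do
        status="$(appl_db HGET "$STATUS_KEY" status)"
        [ -n "$status" ] && break
        sleep 1
        waited=$((waited + 1))
    done

    # Remove both entries whatever the outcome.
    appl_db DEL "$REQ_KEY" "$STATUS_KEY" >/dev/null

    # "0" means success; an empty status means the request timed out.
    [ "$status" = "0" ]
}
```

Keeping the database access behind `appl_db` makes it possible to exercise the polling and cleanup logic without a running Redis instance.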


#### show techsupport
Introduced a new generic API, `generate_sai_dbg_dump_file`, in `generate_dump.sh` (invoked by the `show techsupport` command) to create a debug dump file:

```
# generate_sai_dbg_dump_file
#
# Description:
# This function triggers the generation of a SAI debug dump file and saves the
# dumped file in the show techsupport output directory.
#
# Globals:
#  None
#
# Arguments:
#  $1 - (required) File name (without path). The SAI debug dump is saved under
# this name in the show techsupport output directory.
#
# Returns:
#  0 - On success
#  1 - On failure
###############################################################################
generate_sai_dbg_dump_file(){
...
}
```

Usage:

```
generate_sai_dbg_dump_file "sai_sdk_dump_$(date +"%m_%d_%Y_%I_%M_%p")"
```
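
A minimal sketch of how `generate_sai_dbg_dump_file` might combine the two steps described here (generate the dump inside syncd, then collect it into the techsupport output directory) is given below. It is not the actual implementation: the `/var/log` staging path inside syncd, the optional second argument for the destination directory, and the `pull_from_syncd` helper are all hypothetical, and `generate_sai_dump` is assumed to come from `gen_sai_dbg_dump_lib.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: the staging directory, the optional second argument and the
# pull_from_syncd helper are assumptions made for illustration.

SAI_DUMP_LIB="/usr/local/bin/gen_sai_dbg_dump_lib.sh"
# Load generate_sai_dump when the library is installed.
if [ -f "$SAI_DUMP_LIB" ]; then . "$SAI_DUMP_LIB"; fi

# Hypothetical helper: copy a file out of the syncd container.
pull_from_syncd() { docker cp "syncd:$1" "$2"; }

generate_sai_dbg_dump_file() {
    local name="$1"
    local dest_dir="${2:-.}"              # hypothetical: techsupport output directory
    local syncd_path="/var/log/$name"     # assumed staging path inside syncd

    # The name must be non-empty and must not contain a path component.
    case "$name" in
        ""|*/*) return 1 ;;
    esac

    generate_sai_dump "$syncd_path" || return 1
    pull_from_syncd "$syncd_path" "$dest_dir/$name" || return 1

    # Succeed only if a non-empty file was actually collected.
    [ -s "$dest_dir/$name" ]
}
```

Rejecting names that contain a `/` keeps the contract stated in the function header: the caller supplies a bare file name and the function decides where the file lives.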

#### gen_sai_dbg_dump.sh
A new script, `/usr/local/bin/gen_sai_dbg_dump.sh`, can be invoked from the CLI to generate the dump file directly under the given name, without running `show techsupport`:

```
/usr/local/bin/gen_sai_dbg_dump.sh -f /tmp/my_dump_file.log
```
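
The command-line handling of such a script could look like the following sketch. Only the `-f` option appears in this document; the `-h` option, the `usage` text, and the function name `gen_sai_dbg_dump_main` are hypothetical, and `generate_sai_dump` is again assumed to be provided by `gen_sai_dbg_dump_lib.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: option parsing for a gen_sai_dbg_dump.sh-style wrapper.

SAI_DUMP_LIB="/usr/local/bin/gen_sai_dbg_dump_lib.sh"
# Load generate_sai_dump when the library is installed.
if [ -f "$SAI_DUMP_LIB" ]; then . "$SAI_DUMP_LIB"; fi

usage() { echo "Usage: gen_sai_dbg_dump.sh -f <dump_file>" >&2; }

gen_sai_dbg_dump_main() {
    local OPTIND=1    # reset so the function can be called more than once
    local opt
    local file=""

    # The leading ':' selects silent error reporting; the catch-all branch
    # handles both unknown options and a missing argument to -f.
    while getopts ":f:h" opt; do
        case "$opt" in
            f) file="$OPTARG" ;;
            h) usage; return 0 ;;
            *) usage; return 1 ;;
        esac
    done

    if [ -z "$file" ]; then
        usage
        return 1
    fi

    generate_sai_dump "$file"
}
```

A real script would end by calling `gen_sai_dbg_dump_main "$@"` and exiting with its return code.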
#### DbgGenDump orchestration
- A new orchestration agent, `DbgGenDumpOrch`, has been introduced, which is triggered by updates in the APPL DB.

- It updates syncd by writing to the ASIC DB and waits for a response. Once received, it writes the result back to the APPL DB, allowing the calling application to retrieve the file.

#### ASIC DB
A new table is introduced in the ASIC DB:

```
key = DBG_GEN_DUMP:DUMP ; Unique identifier for gen dump file result
;field = value
file_name = STRING ; full path of the dump file to create.
```

Example:

```
redis-cli -n 1 HGETALL "DBG_GEN_DUMP:DUMP"
1) "DBG_GENERATE_DUMP"
2) "/var/log/sai_dump_file.log"
```

#### SONIC support global API sai_dbg_generate_dump
`sai_dbg_generate_dump` is already supported in SAI. As with other global APIs supported in SONiC, support for it is added to the `SaiInterface` class as `dbgGenerateDump`, and all derived classes must provide the corresponding implementation:

```
    class SaiInterface {
    ...
        // Generate a SAI debug dump and write it to dump_file_name.
        virtual sai_status_t dbgGenerateDump(
                _In_ const char *dump_file_name) = 0;
    ...
    };
```

#### syncd extended operation

As with other global APIs supported in SONiC, a new operation is added to syncd to support SAI debug dump generation:

```
sai_status_t Syncd::processSingleEvent(
        _In_ const swss::KeyOpFieldsValuesTuple &kco)
{
...
    // Dispatch the new debug dump operation to its dedicated handler.
    if (op == REDIS_ASIC_STATE_COMMAND_DBG_GEN_DUMP)
        return processDbgGenerateDump(kco);
...
}
```


```
sai_status_t Syncd::processDbgGenerateDump(
        _In_ const swss::KeyOpFieldsValuesTuple &kco)
{
...
    // Call the SAI dbgGenerateDump API.
    sai_status_t status = m_vendorSai->dbgGenerateDump(file_path);
...
    // Update the ASIC DB with the result.
    m_selectableChannel->set(sai_serialize_status(status), {}, REDIS_ASIC_STATE_COMMAND_DBG_GEN_DUMPRESPONSE);
   
    return status;
}
```

#### SAI API
There are currently no new SAI APIs required for this feature.
@@ -109,7 +251,9 @@ There is no impact on warmboot or fastboot
### Testing Requirements/Design

#### Unit Test cases
Execute the dump script and make sure the dump file exists:
/usr/local/bin/gen_sai_dbg_dump.sh -f /tmp/my_dump_file.log
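
One way to automate this check is sketched below. It assumes, without confirmation from this document, that the generated file can be verified inside the syncd container with `docker exec`; the names `GEN_DUMP_CMD`, `dump_file_exists`, and `test_gen_sai_dbg_dump` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: a pass/fail wrapper around the unit test step in this section.

# Command under test; overridable so the check can run against a stub.
GEN_DUMP_CMD="${GEN_DUMP_CMD:-/usr/local/bin/gen_sai_dbg_dump.sh}"

# Assumption: the dump is written to syncd's file system.
dump_file_exists() { docker exec syncd test -s "$1"; }

test_gen_sai_dbg_dump() {
    local dump_file="${1:-/tmp/my_dump_file.log}"

    if ! "$GEN_DUMP_CMD" -f "$dump_file"; then
        echo "FAIL: dump command returned an error" >&2
        return 1
    fi

    if ! dump_file_exists "$dump_file"; then
        echo "FAIL: $dump_file is missing or empty" >&2
        return 1
    fi

    echo "PASS"
}
```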

#### System Test cases
Verify that the SAI failure dump is collected in the techsupport dump.
Verify that the `show techsupport` dump contains the SAI dump file.
