
class #15

Open
luocmin opened this issue Oct 19, 2020 · 11 comments

Comments

luocmin commented Oct 19, 2020

There's a small error with the class attribute:
[screenshot]
I changed one line of code, replacing args.tc.classes with args.tc.num_uclasses, as shown below:
[screenshot]

luocmin commented Oct 19, 2020

Could you please tell me how to fix this error? And where can I find args.dataset_name?
[screenshot]
Is a use_naive_taxonomy parameter missing here?
[screenshot]

luocmin commented Oct 19, 2020

Author, I'm currently running into this problem and don't know how to solve it. The situation is urgent, thank you!!

luocmin commented Oct 19, 2020

This is how I changed the number of datasets to match my single GPU. I don't know whether this output means training has begun:
[screenshot]

luocmin commented Oct 22, 2020

Author, could you spare a moment? Could you help me with the questions I asked two days ago? Thank you.

johnwlambert (Collaborator) commented

Hi @luocmin , please pull the latest version. Let me know if this doesn't answer your questions:

(1) You're right that tc.classes has been changed to tc.num_uclasses, thanks for catching that. I've corrected the train.py script in my latest commit.

(2) dataset_name is set in Line 497: https://github.com/mseg-dataset/mseg-semantic/blob/training/mseg_semantic/tool/train.py#L497

(3) Please pull in the latest master of mseg-semantic into your branch to see the parameter use_naive_taxonomy: https://github.com/mseg-dataset/mseg-semantic/blob/master/mseg_semantic/utils/transform.py#L54

(4) I didn't catch how you are doing the dataset to GPU mapping, could you explain in more detail here?

If you are limited by GPU RAM, you could also concatenate all of the image IDs into a single dataset and shard it across your available GPUs, or accumulate gradients over multiple forward/backward passes before each optimizer step.
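
For the first option, here is a minimal sketch (not code from this repo; the two TensorDataset objects are toy stand-ins for the per-dataset SemData objects) of merging datasets and sharding the combined index space with a DistributedSampler:

import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy stand-ins for two per-dataset SemData objects.
ds_a = TensorDataset(torch.zeros(100, 3), torch.zeros(100, dtype=torch.long))
ds_b = TensorDataset(torch.ones(80, 3), torch.ones(80, dtype=torch.long))

# Merge them into one index space; DistributedSampler then shards that
# space across ranks, so with 2 GPUs each rank sees half of the data.
merged = ConcatDataset([ds_a, ds_b])
sampler = DistributedSampler(merged, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(merged, batch_size=16, sampler=sampler, drop_last=True)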

luocmin commented Oct 22, 2020

@johnwlambert Hi,
1. What I want to ask about this screenshot is: what does the output look like when training has started successfully?
[screenshot]

2. In addition, I modified some of the settings in the mseg-3m.yaml file. Is this modification correct? As shown below, I changed the dataset list:
[screenshot]
to: dataset: [ade20k-150-relabeled]
and the GPU mapping:
[screenshot]
to: dataset_gpu_mapping: {'ade20k-150-relabeled': [0]}
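
In other words, the modified part of mseg-3m.yaml would look like this (a sketch assuming these two keys sit at the top level of the file; everything else unchanged):

dataset: [ade20k-150-relabeled]
dataset_gpu_mapping: {'ade20k-150-relabeled': [0]}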

luocmin commented Oct 22, 2020

3. If I modify this class, do I need to recompile mseg_semantic?
[screenshot]

luocmin commented Oct 22, 2020

The following problem occurred when training two datasets with two cards (GPUs). What is the cause, and how can I solve it?
[screenshot]

johnwlambert (Collaborator) commented

Hi @luocmin, there is no compilation involved in our repo since all files are pure Python or bash.

Our configuration uses 7 processes, and each process handles one dataset:

train_data = dataset.SemData(split='train', data_root=args.data_root[args.dataset_name], data_list=args.train_list[args.dataset_name], transform=train_transform)

The GPU index (0, 1, 2, ..., 6) is the rank, and we call:

from typing import Dict

def get_rank_to_dataset_map(args) -> Dict[int,str]:
    """
        Obtain a mapping from GPU rank (index) to the name of the dataset residing on this GPU.
        Args:
        -   args
        Returns:
        -   rank_to_dataset_map
    """
    rank_to_dataset_map = {}
    for dataset, gpu_idxs in args.dataset_gpu_mapping.items():
        for gpu_idx in gpu_idxs:
            rank_to_dataset_map[gpu_idx] = dataset
    print('Rank to dataset map: ', rank_to_dataset_map)
    return rank_to_dataset_map

args.dataset_name = rank_to_dataset_map[args.rank]
...
train_data = dataset.SemData(split='train', data_root=args.data_root[args.dataset_name], data_list=args.train_list[args.dataset_name], transform=train_transform)
...
train_loader = torch.utils.data.DataLoader(train_data, batch_size=args.batch_size, shuffle=(train_sampler is None), num_workers=args.workers, pin_memory=True, sampler=train_sampler, drop_last=True)

See here
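
Putting those pieces together, here is a minimal, self-contained sketch (not our actual launcher; the dataset names and nprocs=2 are illustrative) of how the rank selects the dataset in each spawned process:

from typing import Dict, List

import torch.multiprocessing as mp

DATASET_GPU_MAPPING = {
    'ade20k-150-relabeled': [0],
    'coco-panoptic-133-relabeled': [1],
}

def rank_to_dataset(mapping: Dict[str, List[int]]) -> Dict[int, str]:
    """Invert {dataset: [gpu indices]} into {gpu index (rank): dataset}."""
    return {gpu_idx: dataset
            for dataset, gpu_idxs in mapping.items()
            for gpu_idx in gpu_idxs}

def worker(rank: int) -> None:
    # Each process looks up "its" dataset by rank, then builds its own
    # SemData + DataLoader for that dataset alone (omitted here).
    dataset_name = rank_to_dataset(DATASET_GPU_MAPPING)[rank]
    print(f'rank {rank} trains on {dataset_name}')

if __name__ == '__main__':
    mp.spawn(worker, nprocs=2)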

Changing our config by mapping each dataset to the same GPU will mean that only one dataset is trained (the last one to hit line 394, rank_to_dataset_map[gpu_idx] = dataset).

You will need a different strategy to use fewer GPUs. I mentioned a few already: concatenating all image IDs into a single dataset, which could then be sharded across 4 GPUs, or accumulating gradients in place over 2 forward and backward passes and then performing a single gradient update.
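
For the gradient accumulation option, the training-loop pattern looks roughly like this (a generic PyTorch sketch, not code from train.py; the model, loss, and loader below are toy placeholders):

import torch
from torch import nn

# Toy placeholders; in practice these are the real model, criterion, loader.
model = nn.Linear(3, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_loader = [(torch.randn(8, 3), torch.randint(0, 2, (8,))) for _ in range(4)]

accum_steps = 2  # 2 forward/backward passes per optimizer step
optimizer.zero_grad()
for i, (image, target) in enumerate(train_loader):
    loss = criterion(model(image), target)
    (loss / accum_steps).backward()  # scale so accumulated grads average out
    if (i + 1) % accum_steps == 0:
        optimizer.step()   # one update with the 2x effective batch
        optimizer.zero_grad()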

luocmin commented Oct 23, 2020

Thank you, but as a beginner I won't change the dataloader code for now.

luocmin commented Oct 23, 2020

I have been trying to run two datasets on two cards (GPUs), but it still doesn't work. Do I need to use the script you mentioned before to run the command? There is no such script in the repo.
[screenshot]
This is the command I ran on the server, with the resulting error:
[screenshot]
