
class #15

Open
luocmin opened this issue Oct 19, 2020 · 11 comments

Comments

luocmin commented Oct 19, 2020

There's a small error with the class attribute:
[screenshot]
I changed one line of code, replacing args.tc.classes with args.tc.num_uclasses, as shown below:
[screenshot]

luocmin commented Oct 19, 2020

Could you please tell me how to fix this error? And where can I find args.dataset_name?
[screenshot]
Is a use_naive_taxonomy parameter missing here?
[screenshot]

luocmin commented Oct 19, 2020

Author, I'm currently running into this problem and don't know how to solve it. The situation is urgent, thank you!!

luocmin commented Oct 19, 2020

This is how I changed the number of datasets to match my single GPU. I don't know whether this output means training has begun:
[screenshot]

luocmin commented Oct 22, 2020

Author, could you spare a moment? Could you help me with the questions I asked two days ago? Thank you.

johnwlambert (Collaborator) commented

Hi @luocmin , please pull the latest version. Let me know if this doesn't answer your questions:

(1) You're right that tc.classes has been changed to tc.num_uclasses, thanks for catching that. I've corrected the train.py script in my latest commit.

(2) dataset_name is set in Line 497: https://github.com/mseg-dataset/mseg-semantic/blob/training/mseg_semantic/tool/train.py#L497

(3) Please pull in the latest master of mseg-semantic into your branch to see the parameter use_naive_taxonomy: https://github.com/mseg-dataset/mseg-semantic/blob/master/mseg_semantic/utils/transform.py#L54

(4) I didn't catch how you are doing the dataset to GPU mapping, could you explain in more detail here?

If you are limited by GPU RAM, you could also concatenate all of the image IDs into a single dataset and shard it across your available GPUs, or accumulate gradients over multiple forward/backward passes before each optimizer step.
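
For the first option, here is a minimal sketch (not code from this repo; the two TensorDataset objects are toy stand-ins for the per-dataset SemData objects) of merging datasets and sharding the combined index space with a DistributedSampler:

import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy stand-ins for two per-dataset SemData objects.
ds_a = TensorDataset(torch.zeros(100, 3), torch.zeros(100, dtype=torch.long))
ds_b = TensorDataset(torch.ones(80, 3), torch.ones(80, dtype=torch.long))

# Merge them into one index space; DistributedSampler then shards that
# space across ranks, so with 2 GPUs each rank sees half of the data.
merged = ConcatDataset([ds_a, ds_b])
sampler = DistributedSampler(merged, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(merged, batch_size=16, sampler=sampler, drop_last=True)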

luocmin commented Oct 22, 2020

@johnwlambert Hi,
1. What I want to ask about this screenshot is: what does the output look like when training has started successfully?
[screenshot]

2. In addition, I modified some of the settings in the mseg-3m.yaml file. Is this modification correct? As shown below, I changed the dataset list:
[screenshot]
to: dataset: [ade20k-150-relabeled]
and the GPU mapping:
[screenshot]
to: dataset_gpu_mapping: {'ade20k-150-relabeled': [0]}
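
In other words, the modified part of mseg-3m.yaml would look like this (a sketch assuming these two keys sit at the top level of the file; everything else unchanged):

dataset: [ade20k-150-relabeled]
dataset_gpu_mapping: {'ade20k-150-relabeled': [0]}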

luocmin commented Oct 22, 2020

3. If I modify this class, do I need to recompile mseg_semantic?
[screenshot]

luocmin commented Oct 22, 2020

The following problem occurred when training two datasets with two cards (GPUs). What is the cause, and how can I solve it?
[screenshot]

johnwlambert (Collaborator) commented

Hi @luocmin, there is no compilation involved in our repo since all files are pure Python or bash.

Our configuration uses 7 processes, and each process handles one dataset:

train_data = dataset.SemData(split='train', data_root=args.data_root[args.dataset_name], data_list=args.train_list[args.dataset_name], transform=train_transform)

The GPU index (0, 1, 2, ..., 6) is the rank, and we call:

from typing import Dict

def get_rank_to_dataset_map(args) -> Dict[int,str]:
    """
        Obtain a mapping from GPU rank (index) to the name of the dataset residing on this GPU.
        Args:
        -   args
        Returns:
        -   rank_to_dataset_map
    """
    rank_to_dataset_map = {}
    for dataset, gpu_idxs in args.dataset_gpu_mapping.items():
        for gpu_idx in gpu_idxs:
            rank_to_dataset_map[gpu_idx] = dataset
    print('Rank to dataset map: ', rank_to_dataset_map)
    return rank_to_dataset_map

args.dataset_name = rank_to_dataset_map[args.rank]
...
train_data = dataset.SemData(split='train', data_root=args.data_root[args.dataset_name], data_list=args.train_list[args.dataset_name], transform=train_transform)
...
train_loader = torch.utils.data.DataLoader(train_data, batch_size=args.batch_size, shuffle=(train_sampler is None), num_workers=args.workers, pin_memory=True, sampler=train_sampler, drop_last=True)

See here
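
Putting those pieces together, here is a minimal, self-contained sketch (not our actual launcher; the dataset names and nprocs=2 are illustrative) of how the rank selects the dataset in each spawned process:

from typing import Dict, List

import torch.multiprocessing as mp

DATASET_GPU_MAPPING = {
    'ade20k-150-relabeled': [0],
    'coco-panoptic-133-relabeled': [1],
}

def rank_to_dataset(mapping: Dict[str, List[int]]) -> Dict[int, str]:
    """Invert {dataset: [gpu indices]} into {gpu index (rank): dataset}."""
    return {gpu_idx: dataset
            for dataset, gpu_idxs in mapping.items()
            for gpu_idx in gpu_idxs}

def worker(rank: int) -> None:
    # Each process looks up "its" dataset by rank, then builds its own
    # SemData + DataLoader for that dataset alone (omitted here).
    dataset_name = rank_to_dataset(DATASET_GPU_MAPPING)[rank]
    print(f'rank {rank} trains on {dataset_name}')

if __name__ == '__main__':
    mp.spawn(worker, nprocs=2)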

Changing our config by mapping each dataset to the same GPU will mean that only one dataset is trained (the last one to hit line 394, rank_to_dataset_map[gpu_idx] = dataset).

You will need a different strategy to use fewer GPUs. I mentioned a few already: concatenating all image IDs into a single dataset, which could then be sharded across 4 GPUs, or accumulating gradients in place over 2 forward and backward passes and then performing a single gradient update.
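
For the gradient accumulation option, the training-loop pattern looks roughly like this (a generic PyTorch sketch, not code from train.py; the model, loss, and loader below are toy placeholders):

import torch
from torch import nn

# Toy placeholders; in practice these are the real model, criterion, loader.
model = nn.Linear(3, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_loader = [(torch.randn(8, 3), torch.randint(0, 2, (8,))) for _ in range(4)]

accum_steps = 2  # 2 forward/backward passes per optimizer step
optimizer.zero_grad()
for i, (image, target) in enumerate(train_loader):
    loss = criterion(model(image), target)
    (loss / accum_steps).backward()  # scale so accumulated grads average out
    if (i + 1) % accum_steps == 0:
        optimizer.step()   # one update with the 2x effective batch
        optimizer.zero_grad()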

luocmin commented Oct 23, 2020

Thank you, but as a beginner I won't change the dataloader code for now.

luocmin commented Oct 23, 2020

I have been trying to run two datasets on two cards (GPUs), but it still doesn't work. Do I need to use the script you mentioned before to run the command? There is no such script in the repo.
[screenshot]
This is the command I ran on the server, with the resulting error:
[screenshot]
