I am using PyTorch for federated learning experiments. My experiments involve 50 datasets, each with its own model, so I have to run multiple ML model experiments in parallel.
The code for training an ML model is shared here:
def train(dataloader, model, loss_fn, optimizer, device):
    num_batches = len(dataloader)  # total number of observations divided by batch size
    model.train()
    model.to(device)
    total_loss = 0
    for batch, (X, y) in enumerate(dataloader):
        # X is the covariates and y is the pseudo values in the batch
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()

        total_loss += float(loss.item())
    total_loss /= num_batches
    return total_loss

As you can see, the PyTorch tensors X and y and the model are moved to the cuda:0 device. However, this process still takes 100% of the CPU.
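A minimal, self-contained sketch of the kind of check that confirms this placement (a dummy model and random data, not my real training objects) looks like this:

import torch
import torch.nn as nn

# Toy stand-ins for the real model and batch; only meant to confirm device placement.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 1).to(device)
X = torch.randn(32, 10, device=device)
y = torch.randn(32, 1, device=device)

print(next(model.parameters()).device)  # expected: cuda:0
print(X.device, y.device)               # expected: cuda:0 cuda:0

pred = model(X)
print(pred.device)                      # the forward pass also stays on the GPU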
I have already tried setting this configuration:
torch.set_num_threads(2)

Also, on the NVIDIA FLARE side, I have restricted the simulator to 5 threads. Still, the CPU on the server is at 100%.
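My understanding is that torch.set_num_threads() only caps PyTorch's intra-op thread pool. A more complete cap (sketch below; the values are just examples) would also set the OpenMP/MKL environment variables before torch is imported and limit the inter-op pool:

import os

# These must be set before `import torch`, otherwise they have no effect.
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["MKL_NUM_THREADS"] = "2"

import torch

torch.set_num_threads(2)          # intra-op parallelism (CPU kernels)
torch.set_num_interop_threads(2)  # inter-op parallelism between operators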
controller = FederatedAvg(
    patience=args.patience,
    modeldir=args.modeldir,
    reg=args.reg,
    lr=args.lr,
    optimizer=args.optimizer,
    epochs=args.epochs,
    partition=args.partition,
    batch_size=args.batch_size,
    model=args.model,
    device=args.device,
    dataset=args.dataset,
    num_clients=args.n_parties,
    num_rounds=args.comm_round,
    seed=args.init_seed,
    logdir=args.logdir,
    run_id=args.run_id,
    arguments=args,
)

job.simulator_run(ws, n_clients=args.n_parties, threads=5, log_config=os.path.abspath(config_path))

As a result, when I run multiple ML methods concurrently, the 100% CPU usage means only one process makes progress at a time while the other runs wait for that experiment to finish. However, I need to run multiple methods concurrently.
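For context, I start the different methods as separate processes, roughly like the simplified sketch below (run_experiment.py and the method names are placeholders for my actual scripts). In practice each process saturates the server CPU, so they effectively run one after another:

import subprocess

# Placeholders: run_experiment.py and the method list stand in for my actual scripts.
methods = ["fedavg", "fedprox", "scaffold"]

procs = [
    subprocess.Popen(["python", "run_experiment.py", "--model", m])
    for m in methods
]

# Each child process pushes the server CPU to 100%,
# so the runs proceed one at a time instead of in parallel.
for p in procs:
    p.wait()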
What is the best practice to make these experiments run faster? How can I improve this?
