graph-neural-networks, gnn, pytorch, pytorch-geometric, node-classification, deep-learning, machine-learning
Build a Node Classifier with Graph Neural Networks
Implement a simple Graph Neural Network (GNN) for node classification using PyTorch Geometric. You'll train a model to predict the subject of a scientific paper based on its citations within a network.
Intermediate, 30 min, 4 steps
The play
- Install Dependencies & Load Data: Install PyTorch and PyTorch Geometric, then load the built-in Cora dataset, a citation network in which nodes are papers and edges are citations. Each node has a feature vector and a class label (the paper's subject).
- Define the GNN Model: Create the model as a `torch.nn.Module` with two Graph Convolutional Network layers (`GCNConv`). The first layer performs message passing to update the node embeddings and is followed by a ReLU activation and dropout. The second layer maps each embedding to one score per class, which `log_softmax` converts to log-probabilities.
- Set Up Training: Instantiate the model, define an Adam optimizer, and use the negative log-likelihood loss (`F.nll_loss`), which pairs with the model's `log_softmax` output. The training function performs a forward pass over the whole graph, computes the loss on the training nodes only, and updates the model's weights.
- Train and Evaluate the Model: Run the training loop for 200 epochs, reporting test accuracy every 20 epochs. Evaluation switches the model to evaluation mode, which disables dropout, and computes accuracy by comparing the predicted class labels with the true labels for the test nodes.
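To make the message passing in the second step more concrete, here is a minimal pure-Python sketch of the propagation rule a GCN layer is built on: add self-loops, normalize the adjacency matrix symmetrically, aggregate neighbor features, then apply a weight matrix and ReLU. The three-node graph, the feature values, the weights, and the helper name `gcn_layer` are all made up for illustration. `GCNConv` itself works on sparse edge lists, learns its weights, and adds a bias term, so treat this as an approximation of the idea and not as the library's implementation.

```python
import math

def gcn_layer(adj, x, w):
    """One dense GCN-style layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so every node also keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Node degrees in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: divide each entry by sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Message passing: each node takes a weighted sum of neighbor features.
    agg = [[sum(norm[i][k] * x[k][f] for k in range(n))
            for f in range(len(x[0]))]
           for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * w[f][o] for f in range(len(w))))
             for o in range(len(w[0]))]
            for i in range(n)]

# Hypothetical path graph 0 - 1 - 2 with two features per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
w = [[1.0], [-1.0]]  # illustrative fixed weights: 2 inputs -> 1 output

h = gcn_layer(adj, x, w)
print(h)
```

Stacking two such layers, as the model in this article does, lets each node's output depend on nodes up to two hops away.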
Starter code
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv
# 1. Load the Cora dataset (downloaded to /tmp/Cora on first use)
try:
    dataset = Planetoid(root='/tmp/Cora', name='Cora')
except Exception as e:
    print(f"Failed to download dataset. You may need to install additional libraries like 'requests'. Error: {e}")
    raise SystemExit(1)  # stop with a non-zero exit code
data = dataset[0]  # the dataset holds a single graph
# 2. Define the Graph Neural Network model
class GCN(torch.nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        # Two graph convolutions: input features -> 16 hidden units -> class scores
        self.conv1 = GCNConv(num_features, 16)
        self.conv2 = GCNConv(16, num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        # Dropout is only active while the model is in training mode
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        # Log-probabilities over classes, as expected by F.nll_loss
        return F.log_softmax(x, dim=1)
# 3. Setup for training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN(dataset.num_node_features, dataset.num_classes).to(device)
data = data.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
# Training function: one full-batch optimization step
def train():
    model.train()
    optimizer.zero_grad()
    out = model(data)
    # Only the nodes selected by train_mask contribute to the loss
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()
# Evaluation function
@torch.no_grad()
def test():
    model.eval()
    out = model(data)
    pred = out.argmax(dim=1)
    correct = pred[data.test_mask] == data.y[data.test_mask]
    acc = int(correct.sum()) / int(data.test_mask.sum())
    return acc
# 4. Train and evaluate
print("Starting training...")
for epoch in range(1, 201):
    loss = train()
    if epoch % 20 == 0:
        test_acc = test()
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Test Accuracy: {test_acc:.4f}')

final_acc = test()
print(f'\nFinal Test Accuracy: {final_acc:.4f}')
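Both `train()` and `test()` restrict their computation to a subset of nodes through boolean masks. The sketch below reproduces that arithmetic in pure Python on four made-up nodes, assuming the mean reduction that `F.nll_loss` uses by default. The scores, labels, masks, and helper names (`log_softmax`, `masked_nll`, `masked_accuracy`) are illustrative only and are not part of the starter code.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one node's class scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]

def masked_nll(log_probs, labels, mask):
    """Mean negative log-likelihood over the nodes where mask is True."""
    picked = [-log_probs[i][labels[i]] for i in range(len(labels)) if mask[i]]
    return sum(picked) / len(picked)

def masked_accuracy(log_probs, labels, mask):
    """Fraction of masked nodes whose arg-max class equals the label."""
    total = 0
    correct = 0
    for i in range(len(labels)):
        if not mask[i]:
            continue
        total += 1
        predicted = log_probs[i].index(max(log_probs[i]))
        if predicted == labels[i]:
            correct += 1
    return correct / total

# Hypothetical raw scores for 4 nodes and 3 classes.
scores = [[2.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 3.0],
          [1.0, 0.0, 0.0]]
labels = [0, 1, 2, 2]
train_mask = [True, True, False, False]
test_mask = [False, False, True, True]

log_probs = [log_softmax(row) for row in scores]
train_loss = masked_nll(log_probs, labels, train_mask)
test_acc = masked_accuracy(log_probs, labels, test_mask)
print(f"train loss: {train_loss:.4f}, test accuracy: {test_acc:.2f}")
```

Because the masks select disjoint nodes, the test nodes' labels never enter the loss, even though their features and edges take part in every forward pass over the graph.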