
graph-neural-networks, gnn, pytorch, pytorch-geometric, node-classification, deep-learning, machine-learning

Build a Node Classifier with Graph Neural Networks

Implement a simple Graph Neural Network (GNN) for node classification using PyTorch Geometric. You'll train a model to predict the subject of a scientific paper based on its citations within a network.

intermediate | 30 min | 4 steps

The play

Install Dependencies & Load Data
Install PyTorch and PyTorch Geometric (for example, with `pip install torch torch_geometric`). Then load the built-in Cora dataset, a citation network in which nodes are papers and edges are citations. Each node has a bag-of-words feature vector and a class label (the paper's subject).
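To make the data layout concrete, here is a minimal pure-Python sketch of the pieces the loaded graph exposes: one feature row per node, one label per node, edges stored as two parallel lists of source and target indices (the same layout PyTorch Geometric uses for `data.edge_index`), and a boolean mask that selects the training nodes. The toy graph, its values, and the `neighbors` helper are invented for illustration and are not taken from Cora.

```python
# Toy citation graph with 4 papers (hypothetical values, not from Cora).
x = [
    [1, 0, 1],  # paper 0: bag-of-words style features
    [0, 1, 1],  # paper 1
    [1, 1, 0],  # paper 2
    [0, 0, 1],  # paper 3
]
y = [0, 1, 0, 1]  # class label (subject) per paper

# Edges in coordinate format: edge k runs from edge_index[0][k] to edge_index[1][k].
# An undirected citation link is stored once in each direction.
edge_index = [
    [0, 1, 1, 2, 2, 3],  # source nodes
    [1, 0, 2, 1, 3, 2],  # target nodes
]

train_mask = [True, True, False, False]  # only these nodes contribute to the loss


def neighbors(node, edge_index):
    """Return the sorted list of nodes that `node` has an edge to."""
    sources, targets = edge_index
    return sorted(t for s, t in zip(sources, targets) if s == node)


num_nodes = len(x)
num_features = len(x[0])
num_edges = len(edge_index[0])
train_nodes = [i for i, keep in enumerate(train_mask) if keep]

print(num_nodes, num_features, num_edges)
print(neighbors(1, edge_index))
print(train_nodes)
```

In the real dataset the same roles are played by `data.x`, `data.y`, `data.edge_index`, and `data.train_mask`, held as tensors rather than lists.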
Define the GNN Model
Define the model as a `torch.nn.Module` with two Graph Convolutional Network layers (`GCNConv`) from PyTorch Geometric. The first layer performs message passing, aggregating each node's neighborhood into a 16-dimensional embedding, and is followed by a ReLU activation and dropout. The second layer maps those embeddings to one score per class, which `log_softmax` converts to log-probabilities for classification.
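The message-passing step can be sketched without any library. The block below is a simplified pure-Python illustration of the graph-convolution rule (add self-loops, normalize symmetrically by node degree, aggregate neighbor features, then apply a weight matrix). It is not PyTorch Geometric's implementation: it leaves out the bias term and sparse operations, and the function names, the toy graph, and the identity weight matrix are assumptions chosen so the numbers are easy to follow.

```python
import math


def gcn_layer(x, edges, weight):
    """One graph-convolution step: normalized neighbor aggregation, then a linear map.

    x: per-node feature lists; edges: undirected (i, j) pairs;
    weight: list of rows with shape [in_features][out_features].
    """
    n = len(x)
    in_dim = len(x[0])
    out_dim = len(weight[0])

    # Adjacency matrix with self-loops, so each node keeps its own features.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0

    # Symmetric normalization: a_hat[i][j] = adj[i][j] / sqrt(deg[i] * deg[j]).
    deg = [sum(row) for row in adj]
    a_hat = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]

    # Aggregate: each node takes a weighted sum of its neighbors' features.
    agg = [[sum(a_hat[i][k] * x[k][f] for k in range(n)) for f in range(in_dim)]
           for i in range(n)]

    # Transform: multiply the aggregated features by the weight matrix.
    return [[sum(agg[i][f] * weight[f][o] for f in range(in_dim))
             for o in range(out_dim)] for i in range(n)]


def relu(h):
    """Element-wise ReLU over a list of rows."""
    return [[max(0.0, v) for v in row] for row in h]


# Toy path graph 0 - 1 - 2 with 2 features per node (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stands in for a learned weight matrix

hidden = relu(gcn_layer(x, edges, identity))
for node, row in enumerate(hidden):
    print(node, [round(v, 4) for v in row])
```

After one layer, node 0 already carries part of node 1's features. Stacking a second layer lets information travel two hops, which is why a two-layer model can draw on papers that are cited by a paper's direct citations.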
Set Up Training
Instantiate the model, create an Adam optimizer, and use negative log-likelihood loss (`F.nll_loss`), which pairs with the model's `log_softmax` output. Each call to the training function runs a forward pass over the whole graph, computes the loss on the training nodes only (selected by `data.train_mask`), and updates the model's weights.
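To see why negative log-likelihood loss pairs with a `log_softmax` output, the sketch below computes both by hand on a toy example: the loss is the average, over the masked nodes, of the negated log-probability assigned to each node's true class. The helper functions and numbers are illustrative assumptions written in plain Python, not the PyTorch implementations.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over one node's class scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(log_probs, labels, mask):
    """Mean negative log-probability of the true class over the masked nodes."""
    picked = [-log_probs[i][labels[i]] for i in range(len(labels)) if mask[i]]
    return sum(picked) / len(picked)


# Raw class scores for 3 nodes and 2 classes (hypothetical values).
scores = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
labels = [0, 1, 1]
train_mask = [True, True, False]  # node 2 is excluded from the loss

log_probs = [log_softmax(row) for row in scores]
loss = nll_loss(log_probs, labels, train_mask)
print(round(loss, 4))
```

Node 2 is scored by the forward pass but contributes nothing to the loss, mirroring how the training function indexes both the output and the labels with the training mask.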
Train and Evaluate the Model
Run the training loop for 200 epochs, reporting test accuracy every 20 epochs. For evaluation, switch the model to evaluation mode (which disables dropout) and predict a class for every node. Accuracy is the fraction of test nodes (selected by `data.test_mask`) whose predicted label matches the true label.
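The accuracy calculation amounts to an argmax per node followed by a masked comparison. Here is a minimal pure-Python sketch of that logic; the `argmax` and `masked_accuracy` helpers and the toy values are hypothetical and only mirror what the tensor operations in the evaluation function do.

```python
def argmax(row):
    """Index of the largest value in a list of class log-probabilities."""
    return max(range(len(row)), key=lambda c: row[c])


def masked_accuracy(log_probs, labels, mask):
    """Fraction of masked nodes whose predicted class equals the true label."""
    preds = [argmax(row) for row in log_probs]
    hits = [preds[i] == labels[i] for i in range(len(labels)) if mask[i]]
    return sum(hits) / len(hits)


# Log-probabilities for 4 nodes and 2 classes (hypothetical values).
log_probs = [[-0.1, -2.4], [-1.2, -0.4], [-0.3, -1.4], [-0.9, -0.5]]
labels = [0, 1, 1, 1]
test_mask = [False, True, True, True]  # node 0 is not a test node

acc = masked_accuracy(log_probs, labels, test_mask)
print(f"{acc:.4f}")
```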

Starter code

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# 1. Load the Cora dataset
try:
    # The first run downloads the dataset into the root directory.
    dataset = Planetoid(root='/tmp/Cora', name='Cora')
except Exception as e:
    # Stop with a non-zero exit status and show the underlying error.
    raise SystemExit(f"Failed to download or load the Cora dataset. Error: {e}")

data = dataset[0]

# 2. Define the Graph Neural Network model
class GCN(torch.nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.conv1 = GCNConv(num_features, 16)
        self.conv2 = GCNConv(16, num_classes)

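    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)  # aggregate neighbor features into 16-dim embeddings
        x = F.relu(x)
        x = F.dropout(x, p=0.5, training=self.training)  # active only in training mode
        x = self.conv2(x, edge_index)  # one score per class for each node
        return F.log_softmax(x, dim=1)  # log-probabilities, as expected by F.nll_loss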

# 3. Setup for training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN(dataset.num_node_features, dataset.num_classes).to(device)
data = data.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# Training function
def train():
    model.train()
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()

# Evaluation function
@torch.no_grad()
def test():
    model.eval()
    out = model(data)
    pred = out.argmax(dim=1)
    correct = pred[data.test_mask] == data.y[data.test_mask]
    acc = int(correct.sum()) / int(data.test_mask.sum())
    return acc

# 4. Train and evaluate
print("Starting training...")
for epoch in range(1, 201):
    loss = train()
    if epoch % 20 == 0:
        test_acc = test()
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Test Accuracy: {test_acc:.4f}')

final_acc = test()
print(f'\nFinal Test Accuracy: {final_acc:.4f}')