Kernel crash #56

Open

gcalabria opened this issue Nov 10, 2022 · 3 comments

@gcalabria

I have tried to run the vaswani.ipynb notebook on two different machines, but in both cases the kernel crashed at the same point. This is the log I can see before the crash:

vaswani documents:   0%|          | 0/11429 [00:00<?, ?it/s]

[Nov 10, 15:37:24] [0] 		 #> Local args.bsize = 128
[Nov 10, 15:37:24] [0] 		 #> args.index_root = ../content
[Nov 10, 15:37:24] [0] 		 #> self.possible_subset_sizes = [69905]

Some weights of the model checkpoint at bert-base-uncased were not used when initializing ColBERT: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias']
- This IS expected if you are initializing ColBERT from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ColBERT from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ColBERT were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['linear.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

[Nov 10, 15:37:30] #> Loading model checkpoint.
[Nov 10, 15:37:30] #> Loading checkpoint http://www.dcs.gla.ac.uk/~craigm/colbert.dnn.zip

/home/s7949670/.local/lib/python3.8/site-packages/torch/hub.py:452: UserWarning: Falling back to the old format < 1.6. This support will be deprecated in favor of default zipfile format introduced in 1.6. Please redo torch.save() to save it in the new zipfile format.
  warnings.warn('Falling back to the old format < 1.6. This support will be '

[Nov 10, 15:37:46] #> checkpoint['epoch'] = 0
[Nov 10, 15:37:46] #> checkpoint['batch'] = 44500

[Nov 10, 15:37:53] #> Note: Output directory ../content already exists

[Nov 10, 15:37:53] #> Creating directory ../content/colbertindex

vaswani documents: 100%|██████████| 11429/11429 [00:28<00:00, 399.40it/s]

[Nov 10, 15:38:04] [0] 		 #> Completed batch #0 (starting at passage #0) 		Passages/min: 61.2k (overall),  62.0k (this encoding),  12955.9M (this saving)
[Nov 10, 15:38:04] [0] 		 [NOTE] Done with local share.
[Nov 10, 15:38:04] [0] 		 #> Joining saver thread.
[Nov 10, 15:38:04] [0] 		 #> Saved batch #0 to ../content/colbertindex/0.pt 		 Saving Throughput = 3.3M passages per minute.

#> num_embeddings = 581496
[Nov 10, 15:38:04] #> Starting..
[Nov 10, 15:38:04] #> Processing slice #1 of 1 (range 0..1).
[Nov 10, 15:38:04] #> Will write to ../content/colbertindex/ivfpq.100.faiss.
[Nov 10, 15:38:04] #> Loading ../content/colbertindex/0.sample ...
#> Sample has shape (29074, 128)
[Nov 10, 15:38:04] Preparing resources for 1 GPUs.
[Nov 10, 15:38:04] #> Training with the vectors...
[Nov 10, 15:38:04] #> Training now (using 1 GPUs)...
0.4895319938659668
23.038629055023193
0.00026726722717285156
[Nov 10, 15:38:28] Done training!

[Nov 10, 15:38:28] #> Indexing the vectors...
[Nov 10, 15:38:28] #> Loading ('../content/colbertindex/0.pt', None, None) (from queue)...
[Nov 10, 15:38:28] #> Processing a sub_collection with shape (581496, 128)
[Nov 10, 15:38:28] Add data with shape (581496, 128) (offset = 0)..
  IndexIVFPQ size 0 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=1 reserveVecs=33554432

Does anyone have an idea of what the problem might be?

@cmacdonald (Collaborator)

This is almost certainly a FAISS version problem. Make sure you use conda to install it, and try version 1.6.5 or 1.6.3. Conda often doesn't have good matches for particular Python and CUDA versions.
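
For reference, a minimal sketch of the conda commands (the pytorch channel and the exact version pins here are assumptions based on the suggestion above; adjust to your CUDA setup):

    # Hypothetical example: install a GPU build of FAISS from the pytorch channel
    conda install -c pytorch faiss-gpu=1.6.5
    # If no 1.6.5 build matches your Python/CUDA combination, fall back to 1.6.3
    conda install -c pytorch faiss-gpu=1.6.3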

@gcalabria (Author)

Thanks for the quick answer. I will try using conda; I was using pip previously.

@cmacdonald (Collaborator)

Yes, pip is not the solution; see https://github.com/terrierteam/pyterrier_colbert#installation
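
Once installed via conda, a quick sanity check in Python (faiss.__version__ and faiss.get_num_gpus() are standard parts of the faiss package) can confirm the right build is being picked up before re-running the notebook:

    import faiss
    print(faiss.__version__)     # expect 1.6.5 or 1.6.3
    print(faiss.get_num_gpus())  # > 0 means the GPU build is active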
