Explainability

Overview

The SQuARE platform provides behavioral testing via Checklist. Testing is done through unit tests designed by end users or system experts. The two most common test types are the Minimum Functionality Test (MFT) and the INVariance test (INV), illustrated in the table below.

| Minimum Functionality Test (MFT): Taxonomy | INVariance (INV): Robustness |
| --- | --- |
| C: There is a tiny purple box in the room. | C: ...Newcomen designs had a duty of about 7 million, but most were closer to 5 million.... |
| Q: What size is the box? | Q: What was the ideal duty->>udty of a Newcomen engine? |
| Test: Check if the prediction is tiny. | Test: Check whether the prediction changes or not. |

MFTs are designed to measure a capability (e.g., Taxonomy, the capacity to match object properties to categories) by specifying the expected behaviour (e.g., “tiny” in the table above). INV tests similarly target a capability (e.g., robustness to spelling errors in the question); however, the expected behaviour is already known: the answer should remain the same.
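
As a rough illustration of the two test types, the following Python sketch treats a model as a plain function from (context, question) to an answer string. The names `run_mft`, `run_inv`, and `toy_model` are hypothetical and are not part of SQuARE or the Checklist library; the misspelled question is written by hand rather than generated.

```python
from typing import Callable

# Assumption: a "model" is any function mapping (context, question) to an answer.
Model = Callable[[str, str], str]


def run_mft(model: Model, context: str, question: str, expected: str) -> bool:
    """MFT: passes if the prediction equals the expected answer."""
    prediction = model(context, question)
    return prediction.strip().lower() == expected.strip().lower()


def run_inv(model: Model, context: str, question: str, perturbed: str) -> bool:
    """INV: passes if the prediction is unchanged under the perturbation."""
    return model(context, question) == model(context, perturbed)


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for a Skill: returns the first size word in the context."""
    size_words = {"tiny", "small", "big", "huge"}
    for word in context.lower().replace(".", " ").split():
        if word in size_words:
            return word
    return ""


context = "There is a tiny purple box in the room."
mft_passed = run_mft(toy_model, context, "What size is the box?", "tiny")
inv_passed = run_inv(
    toy_model, context, "What size is the box?", "What szie is the box?"
)
print(mft_passed, inv_passed)
```

The MFT needs a hand-specified expected answer, whereas the INV test only compares two predictions of the same model, which is why no gold label is required for it.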

Behavioral testing of skills

Users can choose the Skill they want to investigate from the drop-down menu. The “Show Checklist” button is activated once the predictions from the tests have been saved in a JSON file.

[Figure: skill-comp]

For each test, the success and failure rates of the Skill are displayed. An example visualization for testing the SQuAD Skill is shown in the figure below.

[Figure: checklist]

To analyze or process a Skill’s test performance in more detail, a full JSON report of all test examples can be downloaded using the Download all examples button.
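
A downloaded report could then be summarized offline along the following lines. This is only a sketch: the report schema is not described here, so the field names `test_name` and `success` and the sample records are assumptions made purely for illustration and should be adapted to the actual file.

```python
import json
from collections import defaultdict


def failure_rates(report_json: str) -> dict:
    """Compute the failure rate per test from a JSON list of example records.

    Assumed (hypothetical) record shape: {"test_name": str, "success": bool}.
    """
    examples = json.loads(report_json)
    totals = defaultdict(int)
    failures = defaultdict(int)
    for example in examples:
        name = example["test_name"]
        totals[name] += 1
        if not example["success"]:
            failures[name] += 1
    return {name: failures[name] / totals[name] for name in totals}


# Made-up sample data standing in for a downloaded report.
sample_report = json.dumps([
    {"test_name": "size MFT", "success": True},
    {"test_name": "size MFT", "success": False},
    {"test_name": "typo INV", "success": True},
])

rates = failure_rates(sample_report)
print(rates)
```

For a real report, the string passed to `failure_rates` would be read from the downloaded file instead of built in the script.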

To view the failed test cases in more detail, the user can click the Expand button. This allows the user to quickly identify changes the Skill could not handle.

[Figure: checklist-examples]

Currently supported skills

| Name | Retrieval Model | Datastore | Reader Model | Reader Adapter | Type | Code |
| --- | --- | --- | --- | --- | --- | --- |
| BoolQ BERT Adapter | | | bert-base-uncased | boolq | categorical | code |
| BoolQ RoBERTa Adapter | | | roberta-base | boolq | categorical | code |
| CommonsenseQA BERT Adapter | | | bert-base-uncased | commonsense_qa | multiple-choice | code |
| CommonsenseQA RoBERTa Adapter | | | roberta-base | commonsense_qa | multiple-choice | code |
| CosmosQA BERT | | | bert-base-uncased | cosmos_qa | multiple-choice | code |
| CosmosQA RoBERTa Adapter | | | roberta-base | cosmos_qa | multiple-choice | code |
| DROP BERT Adapter | | | bert-base-uncased | drop | span-extraction | code |
| DROP RoBERTa Adapter | | | roberta-base | drop | span-extraction | code |
| HotpotQA BERT Adapter | | | bert-base-uncased | hotpotqa | span-extraction | code |
| HotpotQA RoBERTa Adapter | | | roberta-base | hotpotqa | span-extraction | code |
| MultiRC BERT Adapter | | | bert-base-uncased | multirc | multiple-choice | code |
| MultiRC RoBERTa Adapter | | | roberta-base | multirc | multiple-choice | code |
| NewsQA BERT Adapter | | | bert-base-uncased | newsqa | span-extraction | code |
| NewsQA RoBERTa Adapter | | | roberta-base | newsqa | span-extraction | code |
| QuAIL BERT Adapter | | | bert-base-uncased | quail | multiple-choice | code |
| QuAIL RoBERTa Adapter | | | roberta-base | quail | multiple-choice | code |
| QuaRTz RoBERTa Adapter | | | roberta-base | quartz | multiple-choice | code |
| Quoref BERT Adapter | | | bert-base-uncased | quoref | span-extraction | code |
| Quoref RoBERTa Adapter | | | roberta-base | quoref | span-extraction | code |
| RACE BERT Adapter | | | bert-base-uncased | race | multiple-choice | code |
| RACE RoBERTa Adapter | | | roberta-base | race | multiple-choice | code |
| SQuAD 1.1 BERT Adapter | | | bert-base-uncased | squad | span-extraction | code |
| SQuAD 1.1 RoBERTa Adapter | | | roberta-base | squad | span-extraction | code |
| SQuAD 2.0 BERT Adapter | | | bert-base-uncased | squad_v2 | span-extraction | code |
| Social-IQA BERT Adapter | | | bert-base-uncased | social_i_qa | multiple-choice | code |
| Social-IQA RoBERTa Adapter | | | roberta-base | social_i_qa | multiple-choice | code |

Check out these skills on the SQuARE platform.