Explainability

Overview

The SQuARE platform provides behavioral testing via CheckList. Behavioral tests are unit tests designed by end users or system experts. The two most common test types are the Minimum Functionality Test (MFT) and the INVariance test (INV), illustrated in the table below.

Minimum Functionality Test (MFT): Taxonomy	INVariance (INV): Robustness
C: There is a tiny purple box in the room.	C: ...Newcomen designs had a duty of about 7 million, but most were closer to 5 million....
Q: What size is the box?	Q: What was the ideal duty → udty of a Newcomen engine?
Test: Check if the prediction is tiny.	Test: Check whether the prediction changes or not.

MFTs measure a capability (e.g., the Taxonomy capability of matching object properties to categories) by specifying the expected behaviour (e.g., “tiny” in the table above). INV tests likewise target a capability (e.g., robustness to spelling errors in the question), but the expected behaviour is known in advance: the answer should remain the same after the perturbation.
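The difference between the two test types can be sketched in a few lines of Python. This is a minimal illustration and not the SQuARE or CheckList implementation: `toy_predict`, `run_mft`, `run_inv`, and `swap_adjacent` are hypothetical names, and the toy predictor merely stands in for a deployed Skill.

```python
SIZE_WORDS = {"tiny", "small", "large", "huge"}


def toy_predict(context: str, question: str) -> str:
    """Hypothetical stand-in for a Skill: return the first size word in the context."""
    for word in context.lower().replace(".", "").split():
        if word in SIZE_WORDS:
            return word
    return ""


def run_mft(predict, context: str, question: str, expected: str) -> bool:
    """MFT: the test passes when the prediction equals the expected answer."""
    return predict(context, question).strip().lower() == expected.lower()


def run_inv(predict, context: str, question: str, perturbed_question: str) -> bool:
    """INV: the test passes when the prediction is unchanged by the perturbation."""
    return predict(context, question) == predict(context, perturbed_question)


def swap_adjacent(word: str, i: int = 0) -> str:
    """Simulate a spelling error by swapping the characters at positions i and i + 1."""
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


context = "There is a tiny purple box in the room."
question = "What size is the box?"

# MFT: the expected answer is written down by the test designer.
mft_passed = run_mft(toy_predict, context, question, expected="tiny")

# INV: no expected answer is needed, only the original and perturbed inputs.
perturbed = question.replace("size", swap_adjacent("size"))
inv_passed = run_inv(toy_predict, context, question, perturbed)
```

Note that the INV check never needs a gold answer, which is why such tests can be generated automatically from unlabeled questions.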

Behavioral testing of skills

Users can choose the Skill they want to investigate from the drop-down menu. The “Show Checklist” button is activated once the predictions from the tests are saved in a JSON file.

[Figure: skill-comp]

For each test, the success and failure rates of the Skill are displayed. An example visualization for the SQuAD Skill is shown in the figure below.

[Figure: checklist]

To analyze or process a Skill’s test performance in more detail, a full JSON report of all test examples can be downloaded using the Download all examples button.
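As a rough sketch of such processing, the snippet below computes a failure rate per test from a report. The report schema used here (a list of objects with `test_name` and `passed` fields) is an assumption made for illustration; the actual structure of the downloaded file may differ, so the field names would need to be adapted.

```python
import json
from collections import defaultdict


def failure_rates(report: list) -> dict:
    """Return {test_name: failure_rate} for a list of test examples.

    Assumes each example is a dict with the hypothetical keys
    'test_name' (str) and 'passed' (bool).
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for example in report:
        name = example["test_name"]
        totals[name] += 1
        if not example["passed"]:
            failures[name] += 1
    return {name: failures[name] / totals[name] for name in totals}


# Inline sample standing in for the downloaded file; with a real report,
# json.load() on the opened file would replace json.loads() here.
sample_report = json.loads("""
[
  {"test_name": "MFT: size", "passed": true},
  {"test_name": "MFT: size", "passed": false},
  {"test_name": "INV: typo", "passed": true}
]
""")

rates = failure_rates(sample_report)
```

Sorting `rates` by value would then surface the tests, and hence the capabilities, where the Skill fails most often.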

To view the failed test cases in more detail, the user can click the Expand button. This allows the user to quickly identify changes the Skill could not handle.

[Figure: checklist-examples]

Currently supported skills

Name	Reader Model	Reader Adapter	Type	Code
BoolQ BERT Adapter	bert-base-uncased	boolq	categorical	code
BoolQ RoBERTa Adapter	roberta-base	boolq	categorical	code
CommonsenseQA BERT Adapter	bert-base-uncased	commonsense_qa	multiple-choice	code
CommonsenseQA RoBERTa Adapter	roberta-base	commonsense_qa	multiple-choice	code
CosmosQA BERT	bert-base-uncased	cosmos_qa	multiple-choice	code
CosmosQA RoBERTa Adapter	roberta-base	cosmos_qa	multiple-choice	code
DROP BERT Adapter	bert-base-uncased	drop	span-extraction	code
DROP RoBERTa Adapter	roberta-base	drop	span-extraction	code
HotpotQA BERT Adapter	bert-base-uncased	hotpotqa	span-extraction	code
HotpotQA RoBERTa Adapter	roberta-base	hotpotqa	span-extraction	code
MultiRC BERT Adapter	bert-base-uncased	multirc	multiple-choice	code
MultiRC RoBERTa Adapter	roberta-base	multirc	multiple-choice	code
NewsQA BERT Adapter	bert-base-uncased	newsqa	span-extraction	code
NewsQA RoBERTa Adapter	roberta-base	newsqa	span-extraction	code
QuAIL BERT Adapter	bert-base-uncased	quail	multiple-choice	code
QuAIL RoBERTa Adapter	roberta-base	quail	multiple-choice	code
QuaRTz RoBERTa Adapter	roberta-base	quartz	multiple-choice	code
Quoref BERT Adapter	bert-base-uncased	quoref	span-extraction	code
Quoref RoBERTa Adapter	roberta-base	quoref	span-extraction	code
RACE BERT Adapter	bert-base-uncased	race	multiple-choice	code
RACE RoBERTa Adapter	roberta-base	race	multiple-choice	code
SQuAD 1.1 BERT Adapter	bert-base-uncased	squad	span-extraction	code
SQuAD 1.1 RoBERTa Adapter	roberta-base	squad	span-extraction	code
SQuAD 2.0 BERT Adapter	bert-base-uncased	squad_v2	span-extraction	code
Social-IQA BERT Adapter	bert-base-uncased	social_i_qa	multiple-choice	code
Social-IQA RoBERTa Adapter	roberta-base	social_i_qa	multiple-choice	code

Check out these skills on the SQuARE platform.
