AMED Data Utilization Platform User Manual

This manual describes the operations of the AMED Data Utilization Platform. The AMED Data Utilization Platform can be accessed from the following URL.
https://prod-www.cannds.amed.go.jp/web/login/init

*To use the Computer Systems on Collaborative Centers and Genotype Imputation features described in this manual, you need to access the platform from a URL other than the one listed above. Please contact us at the following email address for details on the URL for accessing these features.
Email Address: cannds"AT"amed.go.jp (Replace "AT" with "@".)

1.2 Screen Configuration

Each function consists of the following screens. For operating procedures for each screen, see the relevant page.

* The Use of Computer Systems on Collaborative Centers and Genotype Imputation features are not available simply by creating a login account.
(These features become available after your data use has been screened.)
Therefore, once screening is complete, log in and check the Help & Contact screen for information about how to use these features.

1.3 Session Timeout

Session information is discarded after one hour of inactivity following login.
If the session times out, log in again to use the platform.
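
The timeout rule above can be sketched as follows. This is a minimal illustration only, not the platform's implementation: the function name is_session_expired is hypothetical, and treating exactly one hour of inactivity as expired is an assumption.

```python
from datetime import datetime, timedelta

# Inactivity limit stated in this manual: one hour.
SESSION_TIMEOUT = timedelta(hours=1)


def is_session_expired(last_activity: datetime, now: datetime) -> bool:
    """Return True once an hour or more has passed since the last activity."""
    # Assumption: reaching exactly one hour counts as expired.
    return now - last_activity >= SESSION_TIMEOUT


last = datetime(2024, 1, 1, 9, 0, 0)
print(is_session_expired(last, datetime(2024, 1, 1, 9, 59, 59)))  # False
print(is_session_expired(last, datetime(2024, 1, 1, 10, 0, 0)))   # True
```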

1.4 Connecting With the Supercomputer System

The AMED Data Utilization Platform controls whether you can connect to the supercomputer system based on your operating environment.
As shown in the diagram below, connections to the supercomputer system are available only through the collaborative computer systems.*1
When connecting over the Internet, the [Use of Computer Systems on Collaborative Centers] and [Genotype Imputation] buttons or links will be hidden from view and inaccessible. (Attempting to directly access the URL will result in a system error.)
*1 Please ask your system administrator for the URL used to connect through the collaborative computer systems.

2 Login

2.1 Login

The AMED Data Utilization Platform uses two login authentication methods.
One is to log in using the authentication system of the research institution the user belongs to, via an authentication platform called GakuNin (https://www.gakunin.jp/). The other is to log in with SecurID, using the login ID issued by the AMED Data Utilization Platform.
The authentication system used is set when creating your account. Log into your account using the method specified by AMED. (The authentication method is fixed for each user. You cannot log in using a method other than that specified.)
When logging in using SecurID, the system will behave differently depending on whether this is your first time logging in.

■ Logging in via GakuNin
-> See 2.2 Logging in via GakuNin.
■ Logging In With SecurID (First-Time Login)
-> See 2.3 Logging In With SecurID (First-Time Login).
■ Logging In With SecurID (Subsequent Logins)
-> See 2.4 Logging In With SecurID (Subsequent Logins).

2.2 Logging in via GakuNin

2.2.1 Select Your Institution

Select the institution you belong to.
When you select an institution, the login screen for that institution appears.

2.2.2 Authentication by Your Institution

On the login screen of the selected institution, perform the login authentication process.
If the institution does not require two-factor authentication on the AMED Data Utilization Platform, login is successful when this authentication process succeeds, and the Dashboard (TOP) screen appears.
For other institutions, if this authentication process succeeds you will be taken to the Enter Email Address screen or the Enter One-Time Password (OTP) screen.

(The Orthros authentication screen is shown here as an example. The actual screen layout and entry fields will vary depending on the institution selected.)

2.2.3 Enter Email Address

When logging into the AMED Data Utilization Platform for the first time, the Enter Email Address screen may appear.
Depending on the settings of your institution, it may not be displayed.

When this screen appears, please enter the e-mail address you submitted to the AMED Data Utilization Platform office when applying for use.
This screen is displayed only at the first login; it is not displayed for the second and subsequent logins.

Enter your e-mail address and click the Send button. You will be taken to the Enter One-Time Password (OTP) screen.

2.2.4 Enter One-Time Password

After you successfully pass the authentication procedure of an institution that requires two-factor authentication, an email containing the one-time password will be sent to your registered email address. If authentication is successful after entering the one-time password, the Dashboard (TOP) screen will appear.

2.3 Logging In With SecurID (First-Time Login)

When logging in for the first time, you will be asked to register a passcode number.
Enter your login ID and the initial passcode number provided to open the Register Passcode Number screen.
Register your own passcode number.
* Use the passcode number you registered for subsequent login attempts.
If authentication fails on any screen, you will be returned to the Login screen (Enter ID).

2.3.1 Login (Enter ID)

Enter your login ID and click the [LOGIN] button.
If authentication is successful, the Enter Passcode Number screen will appear.

2.3.2 Login (Enter Passcode Number)

After you enter the login ID, authentication with the initial passcode number begins.
If authentication is successful, the Register New Passcode Number screen will appear.

Once the new passcode number is registered, enter the passcode number again.
*Enter the passcode number you registered, not the initial passcode number.
If authentication is successful, the Enter One-Time Password screen will appear.

2.3.3 Login (Register a New Passcode Number)

Enter the passcode number to register, and then click the [REGISTER PERSONAL IDENTIFICATION NUMBER (PIN)] button.
If registration is successful, the Enter Passcode Number screen will appear.
Passcode numbers must satisfy the following requirements to be registered.
・Be 4 to 8 digits long, using half-width numerals.
・Not match the three most recent passcode numbers used.
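
The two requirements above can be expressed as a short check. This is a sketch under stated assumptions: the function name and the recent_passcodes list (oldest first) are hypothetical, and a real system would not keep passcode numbers in plain text as this illustration does.

```python
import re


def is_valid_new_passcode(candidate: str, recent_passcodes: list[str]) -> bool:
    """Check a new passcode number against the two registration rules."""
    # Rule 1: 4 to 8 half-width (ASCII) digits and nothing else.
    if re.fullmatch(r"[0-9]{4,8}", candidate) is None:
        return False
    # Rule 2: must differ from the three most recent passcode numbers.
    return candidate not in recent_passcodes[-3:]


print(is_valid_new_passcode("2468", ["1111", "2222", "3333"]))  # True
print(is_valid_new_passcode("3333", ["1111", "2222", "3333"]))  # False
print(is_valid_new_passcode("１２３４", []))                      # False (full-width digits)
```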

2.3.4 Login (Enter One-Time Password)

An email containing the one-time password will be sent to the registered email address.
If authentication is successful, the Dashboard (TOP) screen will appear.
*If you wish to change your email address, please contact your system administrator.

2.4 Logging In With SecurID (Subsequent Logins)

For subsequent login attempts, enter the passcode number you registered when logging in for the first time.
Complete the authentication process using the one-time password sent to the email address registered with the system.
*If you forget your passcode number, please contact your system administrator to reset your passcode number.

2.4.1 Login (Enter ID)

Enter your login ID and click the [LOGIN] button.
If authentication is successful, the Enter Passcode Number screen will appear.

2.4.2 Login (Enter Passcode Number)

Enter the passcode number registered when first logging in, and then click the [SEND PERSONAL IDENTIFICATION NUMBER (PIN)] button.
If authentication is successful, the Enter One-Time Password screen will appear.

2.4.3 Login (Enter One-Time Password)

Complete the authentication process using the one-time password sent to the email address registered with the system.
Enter the one-time password, and then click the [SEND ONE-TIME PASSWORD(OTP)] button to log in.
If authentication is successful, the Dashboard (TOP) screen will appear.

3 Common Features

3.1 Header and Menu

Each screen features a common header and menu.

3.1.1 Header

You can perform the following actions from the header.
・Display the main page of the Japan Agency for Medical Research and Development website in a separate tab.
・Open and close the menu bar.
・Connect to the integrated supercomputer system (only available with a URL connection through the collaborative computer systems).
・Switch the display language between Japanese and English.
・Display a link to the User Details screen.
・Display the [Logout] button.

3.1.2 Connecting to the Supercomputer

Use this to connect to the supercomputer system.
This option is displayed or hidden based on your connection environment.
・This is displayed when connecting to a URL through the collaborative computer systems.
・This is hidden when connecting to an Internet connection URL.

3.1.3 JA/EN Switch

You can switch the screen display between Japanese and English.
* Note that the language will only be changed for labels and other fixed text. Data registered in Japanese will still appear in Japanese, even if the screen display language is changed to English.

3.1.4 Logout

Click the [Logout] button to log out. Log out after your work is complete.

3.1.5 Menu

You can jump to any screen from the menu.

See below for details on each screen.
・4 Dashboard (TOP)
・5.1 Search Metadata
・6.1 Allele Frequency Data
・6.4 Generate Allele Frequency Data
・6.5 Downloads
・6.6 Data Management
・6.7 Docker Environments
・6.8 Setup Docker Environment
・7.1 Project Information
・8.1 News
・8.2 Help & Contact

3.2 Breadcrumbs List

A [breadcrumbs list] showing page navigation is located at the top of each screen to show which screen is currently displayed. Click a link in the [breadcrumbs list] to jump to the corresponding screen.

List of link destinations

Screen name	Link name	Link destination
Common to all screens	TOP	Dashboard (TOP) screen
Search Metadata screen	Search Metadata	Search Metadata screen
Metadata Search Results screen	Search Metadata	Search Metadata screen
Metadata Details screen (shared data for each)	Search Metadata	Search Metadata screen
Metadata Details screen (shared data for each)	Metadata Search Results	Metadata Search Results screen
Metainformation (Analysis without SAMPLE_REF registration) screen	Search Metadata	Search Metadata screen
Allele Frequency Data screen	Pre-Research	Allele Frequency Data screen
Variant List screen	Pre-Research	Allele Frequency Data screen
Variant List screen	Allele Frequency Data	Allele Frequency Data screen
Variant Details screen	Pre-Research	Data Management screen
Variant Details screen	Allele Frequency Data	Allele Frequency Data screen
Variant Details screen	Variant List	Variant List screen
Generate Allele Frequency Data screen	Pre-Research	Allele Frequency Data screen
Downloads screen	Pre-Research	Allele Frequency Data screen
Data Management screen	Pre-Research	Allele Frequency Data screen
Docker Environments screen	Pre-Research	Allele Frequency Data screen
Setup Docker Environment screen	Pre-Research	Allele Frequency Data screen
About Workflow screen	Genotype Imputation	About Workflow screen
Job Configuration screen	Genotype Imputation	About Workflow screen
Job List screen	Genotype Imputation	About Workflow screen
Project List screen	Project Information	Project List screen
Project Details screen	Project Information	Project List screen
Project Details screen	Project List	Project List screen
News screen	News & Help	News screen
Help & Contact screen	News & Help	News screen

3.3 Sort and Pager

The Metadata Search Results list and Project List use a sort function and pager function.

3.4 User Details

Click the [User Details] button in the header to jump to the User Details screen. On the User Details screen, you can view details on the logged-in user and upload a profile image. The uploaded image will appear in the header. The following restrictions apply when uploading files.

Item name	Restriction
File size	2 MB or less
File extensions	Only .jpeg, .jpg, and .png file extensions accepted
Vertical size of file	560 px
Horizontal size of file	420 px

The registered profile image will appear in the header.
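
As an illustration, the file size and extension restrictions above could be checked before uploading. This is a sketch, not the platform's validation logic: the function name is hypothetical, "2 MB" is read as 2 x 1024 x 1024 bytes, and extensions are compared case-insensitively, both of which are assumptions. The pixel dimensions are not checked here because the table does not state whether they are exact or maximum values.

```python
from pathlib import Path

MAX_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as 2 MiB
ALLOWED_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def check_profile_image(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Assumption: the extension check ignores letter case.
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("extension must be .jpeg, .jpg, or .png")
    if size_bytes > MAX_BYTES:
        problems.append("file size must be 2 MB or less")
    return problems


print(check_profile_image("portrait.png", 500_000))    # []
print(check_profile_image("portrait.gif", 3_000_000))  # two problems reported
```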

4 Dashboard (TOP)

This screen appears when you first log in, and when you click the [TOP] link on the menu.
[1] Area for emergency news

-> See 4.1 Emergency News.

[2] Area for information on news of planned shutdowns and other news

-> See 4.2 News of Planned Shutdowns and Other News.

[3] Area for displaying Metadata Summary Information

-> See 4.3 Metadata Summary Information.

[4] Area for displaying the Saved Search Conditions List

-> See 4.4 Saved Search Conditions List.

[5] Area for displaying the Project List

-> See 4.5 Project List.

4.1 Emergency News

This is used by AMED to issue urgent news to users, such as in the event of a system failure.

4.2 News of Planned Shutdowns and Other News

This is used by AMED to issue news to users on planned service shutdowns for system maintenance, and other issues.
Content displayed is the same as that for the Emergency News display area described in 8.1 News.

4.3 Metadata Summary Information

This displays aggregate graphs of the number of registered Metadata entries, "Diseases Top 10", "Gender", "Region", and "Age", and aggregate tables by Metadata subject, and by disease name.
Click [Total number of registered Studies] to display a list of subjects as a pop-up.
Click on an element in the graph or the aggregate table link to perform a Metadata Search using the selected element as a Search Condition.

Aggregate table by disease name

Subject list modal

Subject details modal

4.4 Saved Search Conditions List

This displays a list of saved Metadata Search Conditions.
If no saved Search Conditions are found, a [No relevant data found.] message will appear.
* For more information on saving Metadata Search Conditions, see 5.1.5 Search Using Saved Search Conditions.

4.5 Project List

This displays a list of research projects that the user is involved with.
The content shown is the same as that described in 7.1 Project List.

5 Metadata

5.1 Search Metadata

Enter Search Conditions to search for Metadata.
The following search methods are available.
・Searching by entering search keywords.
・Searching previously searched Metadata from saved Search Conditions.
・Searching previously searched Metadata from Search Condition History.
・Only displaying analysis data not tied to a sample.

Data Provider Institution

When detailed conditions are displayed

5.1.1 Search Target Metadata

Metadata handled by the system include the following five types of data according to the JGA data model.

Item name	Summary	Details
Study	Research information	Data in top-level objects, including study content, research expenses, and publication information. After the data is provided, it is shared publicly to provide an overview of research studies.
Sample	Information per sample	Sample ≒ Individual. Phenotype data (gender, age, etc.) and anonymous donor IDs.
Experiment	Experiment analysis information	Experiment procedures, questionnaires, library information, sequencers, etc. A single sample is linked to multiple data objects.
Data	Genome data information	Stores (raw) data files (fastq, bam, array data) on individuals.
Analysis	Data analysis information	Stores analysis data of multiple data or sample types. Example: Charts summarizing variant data (vcf) and phenotype data.

5.1.2 Keyword Search (Synonyms)

Data is extracted if any search target item contains the keyword entered in the top box.
Data is excluded from extraction if any search target item contains the keyword entered in the bottom box.
Example: When searching for the keyword “cancer” by entering the word in the top box, data containing the word “cancer” in any search target, such as the title or abstract of a study, will be retrieved.
When entering the keyword “cancer” into the bottom box, data containing the word “cancer” in any search target, such as the title or abstract of a study, will not be retrieved.
* For details on search target items in a keyword search, see 9.1 Search Target Items in a Keyword Search.
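
The behavior of the top (include) and bottom (exclude) boxes can be pictured with the following sketch. It is not the platform's search engine: the function name is hypothetical, and case-insensitive substring matching against a single text field is an assumption made for illustration.

```python
def matches_keywords(text: str, include: str = "", exclude: str = "") -> bool:
    """Return True if text passes the top-box and bottom-box keywords."""
    lowered = text.lower()
    # Top box: the keyword must appear somewhere in the text.
    if include and include.lower() not in lowered:
        return False
    # Bottom box: the keyword must not appear anywhere in the text.
    if exclude and exclude.lower() in lowered:
        return False
    return True


titles = ["Lung cancer cohort", "Diabetes registry", "Cancer and diabetes study"]
print([t for t in titles if matches_keywords(t, include="cancer")])
# ['Lung cancer cohort', 'Cancer and diabetes study']
print([t for t in titles if matches_keywords(t, include="cancer", exclude="diabetes")])
# ['Lung cancer cohort']
```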

If the [Search including synonyms] check box is selected, synonyms of the keyword entered will also be retrieved in a search.
Example: When entering the keyword “Alzheimer’s disease,” data corresponding to synonym keywords such as “Alzheimer’s” and “Alzheimer dementia” will also be retrieved.
If you do not wish to include synonyms in the search results, uncheck the [Search including synonyms] check box before performing your search.
Example: Entering the keyword “Alzheimer’s disease” and unchecking the [Search including synonyms] check box before performing the search will limit search results to data matching the keyword “Alzheimer’s disease,” without retrieving data containing keywords such as “Alzheimer’s” and “Alzheimer dementia.”
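
The effect of the [Search including synonyms] check box can be sketched as a keyword expansion step. The synonym table below is hypothetical and contains only the example from this section; the platform's actual synonym dictionary is not described in this manual.

```python
# Hypothetical synonym table, limited to the example in this section.
SYNONYMS = {
    "alzheimer's disease": ["Alzheimer's", "Alzheimer dementia"],
}


def expand_keyword(keyword: str, include_synonyms: bool = True) -> list[str]:
    """Return the list of terms that would be searched for a keyword."""
    terms = [keyword]
    if include_synonyms:
        # Look up synonyms case-insensitively; unknown keywords add nothing.
        terms.extend(SYNONYMS.get(keyword.lower(), []))
    return terms


print(expand_keyword("Alzheimer's disease"))
# ["Alzheimer's disease", "Alzheimer's", 'Alzheimer dementia']
print(expand_keyword("Alzheimer's disease", include_synonyms=False))
# ["Alzheimer's disease"]
```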

5.1.3 Keyword Suggestions

A list of disease names that partially match the keyword entered in the keyword field will appear as suggested keywords.

5.1.4 Check the Number of Search Results

Click the [CHECK THE NUMBER OF SEARCH RESULTS] button to check the number of Metadata entries matching the Search Conditions entered.

5.1.5 Search Using Saved Search Conditions

You can perform a search using previously saved Search Conditions.
[Number of Search Results] shows the number of matches retrieved when the search was previously saved. If Metadata was added or deleted since the search was performed, the number of matches shown may not match the current number of matches in search results.
Click the [DOWNLOAD THE SEARCH CONDITIONS] button to download the file in JSON format.
Click the [DELETE] button to delete saved Search Conditions.
* For more information on saving Search Conditions, see 5.2.1 Save Search Conditions

5.1.6 Search Condition History Search

You can perform searches using previously entered Search Conditions (five most recent Search Conditions).
Number of Search Results shows the number of matches retrieved when the search was previously performed. If Metadata was added or deleted since the search was performed, the number of matches shown may not match the current number of matches in search results.
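
Keeping only the five most recent Search Conditions behaves like a fixed-length queue. The sketch below illustrates that idea; the function names and the dictionary layout of a Search Condition are hypothetical.

```python
from collections import deque


def new_history() -> deque:
    """Create an empty history that holds at most five entries."""
    return deque(maxlen=5)


def record_search(history: deque, conditions: dict) -> None:
    """Add the newest search at the front; the oldest entry drops off the end."""
    history.appendleft(conditions)


history = new_history()
for n in range(1, 8):
    record_search(history, {"keyword": f"query {n}"})
print([c["keyword"] for c in history])
# ['query 7', 'query 6', 'query 5', 'query 4', 'query 3']
```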

5.1.7 Searching Analysis Data Not Tied to a Sample

This displays a list of data without a “SAMPLE_REF” entry in Analysis data.

5.2 Metadata Search Results

The Search Metadata screen displays a list of samples tied to extracted data.
Example: When searching for the keyword “cancer,” samples tied to matching data will appear when the keyword appears in the title, abstract, or other part of a study.

5.2.1 Save Search Conditions

Click the [SAVE THE SEARCH CONDITIONS] button to save Search Conditions that you use repeatedly, such as keywords that are frequently searched for. Up to 10 Search Conditions can be saved. The saved Search Conditions appear on the Dashboard screen and the Search Metadata screen, and can be used to perform searches with the same Search Conditions.

5.2.2 Expand Synonyms

When performing a search with the [Search including synonyms] check box selected as described in 5.1.2 Keyword Search (Synonyms), you can check which synonyms were used.
You can also select or deselect individual synonyms and search again.
Example: When entering the keyword “Alzheimer’s disease,” keywords such as “Alzheimer’s” and “Alzheimer dementia” will also be used as synonyms. To exclude “Alzheimer dementia” from the search results retrieved, uncheck the “Alzheimer dementia” check box and perform the search again to only retrieve data for “Alzheimer’s disease” and “Alzheimer’s.”

The screenshot below shows an expanded view.

5.2.3 Display Graph

This displays a graph that aggregates the Search Results.

You can click on an element in the graph to refine Search Results.
The Search Condition will be overwritten when the same condition is clicked in the graph.
(For check boxes, a search will be performed as though all conditions other than the selected condition have not been checked)
Example: When clicking Kanto in the Region graph, Kanto will be the only Region Search Condition selected, and a search will be performed with all check boxes other than Kanto unchecked.
Click the [Re-display of initial search results] button to redisplay the initial results before Search Results were refined.
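
The overwrite behavior described above can be sketched as follows. The data layout (a dictionary mapping each condition name to its set of checked values) and the function name are assumptions made for illustration only.

```python
def refine_by_graph_click(
    conditions: dict[str, set[str]], field: str, value: str
) -> dict[str, set[str]]:
    """Return new conditions in which only the clicked value is checked for field."""
    # Copy so that the initial conditions stay available for re-display.
    refined = {name: set(values) for name, values in conditions.items()}
    # Clicking a graph element overwrites the condition for that field.
    refined[field] = {value}
    return refined


initial = {"Region": {"Kanto", "Kansai", "Kyushu"}, "Gender": {"Female"}}
refined = refine_by_graph_click(initial, "Region", "Kanto")
print(refined["Region"])       # {'Kanto'}
print(len(initial["Region"]))  # 3 (initial conditions are unchanged)
```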

5.2.4 Download Metadata Search

You can download Metadata Search Results.
Click the [GENERATE DOWNLOAD DATA] button to begin generating download data for the Metadata Search Results currently displayed. A message indicating that the data generation request has been accepted will appear on the Search Metadata screen.
* Only one file can be generated at a time. The [GENERATE DOWNLOAD DATA] button will appear inactive and cannot be clicked while a file is being generated (or has already been generated). If you wish to download another set of Search Results, either cancel the data generation process if it is in progress, or delete the previously generated data on the Search Metadata screen.

Files are generated in the order the system receives generate file requests. Once the file is generated, it can be downloaded from the Search Metadata screen.
* You can confirm the status of download files being generated by clicking the [Click here to view the status] link on the Search Metadata screen.

If the download file generation has not started, you can click the [CANCEL] button to cancel the generate download data process.
* Once the system starts generating a download file, the [CANCEL] button will be hidden from view, and you will not be able to cancel the process.

Once the download file has finished generating, a task complete message will appear. Click the [DOWNLOAD] button to download the file. If the file is no longer needed, click the [DELETE] button to delete the file.
* If you want to generate a new download file, delete the previous file first, and then generate the new file.
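
The life cycle of a download file described in this section can be summarized as a small state machine. The state names and the "start" and "finish" actions (which stand for steps performed by the system rather than the user) are hypothetical labels chosen for this sketch.

```python
from enum import Enum


class DownloadState(Enum):
    NONE = "no file"
    QUEUED = "waiting to be generated"
    GENERATING = "being generated"
    READY = "generated"


# Actions permitted in each state, and the state each action leads to.
TRANSITIONS = {
    DownloadState.NONE: {"generate": DownloadState.QUEUED},
    DownloadState.QUEUED: {"cancel": DownloadState.NONE, "start": DownloadState.GENERATING},
    DownloadState.GENERATING: {"finish": DownloadState.READY},
    DownloadState.READY: {"download": DownloadState.READY, "delete": DownloadState.NONE},
}


def apply_action(state: DownloadState, action: str) -> DownloadState:
    """Return the next state, or raise ValueError if the action is not allowed."""
    next_state = TRANSITIONS[state].get(action)
    if next_state is None:
        raise ValueError(f"'{action}' is not available while the file is {state.value}")
    return next_state


state = apply_action(DownloadState.NONE, "generate")  # request accepted
state = apply_action(state, "start")                  # system begins generating
state = apply_action(state, "finish")                 # file is ready
print(state)  # DownloadState.READY
```

In this model, "cancel" is accepted only while the request is still waiting, and "generate" is accepted only when no file exists, which mirrors the notes above.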

5.3 Show Metadata Details - Sample

The Metadata Search Results screen displays sample data tied to the data selected.
*Sample is a general term used to refer to data containing anonymized information about a subject used for research purposes.

5.4 Show Metadata Details - Experiment

The Metadata Search Results screen displays experiment data tied to the data selected.
*Experiment is a general term used to refer to data containing information on experiment procedures, questionnaires, library information, sequencers, and other experiment information used in research.

5.5 Show Metadata Details - Study

The Metadata Search Results screen displays study data tied to the data selected.
*Study is a general term used to refer to research information, including the content of studies, research expenses, and publication information.

5.6 Show Metadata Details - Data

The Metadata Search Results screen displays Data information tied to the data selected.
*Data is a general term used to refer to data containing raw data information on experiment results for a specific experiment.

5.7 Show Metadata Details - Analysis

The Metadata Search Results screen displays analysis data tied to the data selected.
*Analysis is a general term used to refer to data containing analysis result information, including experiment result analysis and sample information analysis.

5.8 Show Metadata Details - Analysis (No Association)

Here you can check a list of analysis data entries not tied to a particular sample. This screen can be reached by clicking the [Click here for Analysis data not associated with Sample (without SAMPLE_REF registration)] link on the Search Metadata screen.

6 Pre-Research

6.1 Allele Frequency Data

A list of registered allele frequency data will be displayed. You can confirm the execution status of jobs registered on the Create Allele Frequency Data screen.
Additionally, you can open the Variant List screen to confirm details of created allele frequency data.
You can also delete allele frequency data that is no longer needed.

6.1.1 View Sample Sets

The list contains allele frequency data that comes pre-registered as presets, and allele frequency data registered by the user.
・ Presets appear with the name “CANNDs_23k.” Due to the large volume of data, the Variant List screen can take a while to display.
・ Files are automatically deleted 30 days after they are generated. Following this, they will no longer appear on the Variant List screen. However, file conditions can still be confirmed.

6.1.2 Confirm Execution Status

The Status column in the list updates according to the processing status on the supercomputer. When the status changes to “Registered,” the Variant List screen can be displayed.

6.1.3 Using Allele Frequency Data

You can confirm the extraction conditions at the time of registration, and delete any unnecessary data using [Data Delete] or [Delete From List].
・ [Confirm]: This displays the extraction conditions. Click the [Modify conditions and register data] button on the View Conditions screen to jump to the Generate Allele Frequency Data screen with those conditions already set, so that you can generate allele frequency data again.
・ [Data Delete]: This deletes the allele frequency data. The extraction conditions and other information about the allele frequency data are still shown, but you will not be able to jump to the Variant List screen.
・ [Delete From List]: This deletes all allele frequency data and removes the allele frequency data from the list.

6.1.4 Preset Data

Click the [List] button for the preset data to display the following dialog box.
Due to the extremely large volume of preset data, you must specify conditions to refine the scope of data retrieved before opening the Variant List.
After specifying conditions, click either [OPEN IN CURRENT TAB] or [OPEN IN A NEW TAB] to display the Variant List screen.

Enter three or more characters of the gene name to display suggestions.

You can select multiple options from the drop-down list.
Each Search Condition is applied using the AND condition. Multiple values selected from the drop-down list are searched using the OR condition.
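
The combination rule above (AND between conditions, OR among the values selected for one condition) can be sketched as follows. The field names "gene" and "consequence" and the sample records are hypothetical.

```python
def matches_conditions(variant: dict[str, str], conditions: dict[str, set[str]]) -> bool:
    """AND across conditions; OR among the values selected for each condition."""
    return all(
        variant.get(field) in allowed          # OR: any selected value matches
        for field, allowed in conditions.items()
        if allowed                             # an empty selection is ignored
    )


variants = [
    {"gene": "ALDH2", "consequence": "missense"},
    {"gene": "ALDH2", "consequence": "synonymous"},
    {"gene": "BRCA1", "consequence": "missense"},
]
conditions = {"gene": {"ALDH2"}, "consequence": {"missense", "stop_gained"}}
print([v for v in variants if matches_conditions(v, conditions)])
# [{'gene': 'ALDH2', 'consequence': 'missense'}]
```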

6.2 Variant List

This shows a list of variants for the allele frequency data selected on the Allele Frequency Data screen.
・If a variant does not exist in a biobank, the Alt allele frequency will display "N/A".
・If a variant exists in a biobank, but the number of its samples is too small to reach the minimum number needed for frequency information, the Alt allele frequency will display "N/A(※)".

6.2.1 Variant Filter

On the Variant List screen, you can apply filters to narrow down and sort data.
You can specify multiple filters; when you do, they are combined using the AND condition to refine the results.
Filters can be applied by either specifying filters by keyword, selecting filters from the drop-down list, or by using expressions.
To specify filters by keyword, either enter the filter in the Keyword field and press the ENTER key, or click the filtering icon.

For items specified by keyword, enter three or more characters of the gene name to display suggestions.

You can select multiple options from the drop-down list.
Once selected, click the filtering icon to filter the data.
Each filter is applied using the AND condition. Multiple values selected from the drop-down list are filtered using the OR condition.

To apply filters using expressions, select the numerical symbol from the drop-down list, and then input the value in the Keyword field.
Press the ENTER key, or click the filtering icon in the Keyword field to filter variants by an expression of the “selected numerical symbol” + “input value”.
For example, if “≧” is specified as the numerical symbol, and “0.0001” is specified in the Keyword field, the expression becomes “≧0.0001”, and variants with an allele frequency of 0.0001 or greater will be retrieved.
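
An expression filter of this kind can be sketched by mapping each symbol to a comparison function. Only “≧” appears in this manual; the other symbols in the table, the field name "alt_af", and the sample values are assumptions made for illustration.

```python
import operator

# Only "≧" is taken from this manual; the remaining symbols are assumed.
OPERATORS = {
    "≧": operator.ge,
    "≦": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def filter_by_expression(variants: list[dict], symbol: str, value: float,
                         field: str = "alt_af") -> list[dict]:
    """Keep variants whose field satisfies 'symbol value', e.g. '≧ 0.0001'."""
    compare = OPERATORS[symbol]
    return [v for v in variants if compare(v[field], value)]


variants = [
    {"id": "v1", "alt_af": 0.00005},
    {"id": "v2", "alt_af": 0.0001},
    {"id": "v3", "alt_af": 0.02},
]
print([v["id"] for v in filter_by_expression(variants, "≧", 0.0001)])
# ['v2', 'v3']
```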

By default, results are sorted in ascending order by Region. Click the filter item header to sort in ascending order for the item.
Click items sorted in ascending order again to sort them in descending order. Clicking the item once more will revert to sorting the item in ascending order again.

6.3 Variant Details

This shows details of variants selected on the Variant List screen.

The preset data provides allele frequencies stratified by age group, gender, residence or birthplace, and disease name. (However, allele frequency data is not provided where there are fewer than 100 samples.)
Under Frequency at the bottom of the screen, click the “Disease,” “Gender,” “Age,” “Region,” or “Genetic Ancestry Group” button to show or hide subsets other than Overall.

When Disease results are shown, you can filter results by a partial match of the disease name.

6.4 Generate Allele Frequency Data

Register allele frequency data with the specified conditions to create the data on the supercomputer.
Fill in the fields marked [1] to [4] in the screenshot below and confirm the details provided in [5]. Next, click the [ADD] button to display the Name dialog box.
Register a dataset name of your choosing in the dialog box to send a processing request to the supercomputer.
Up to three allele frequency datasets can be created at once. You cannot create a new allele frequency dataset while a dataset is being processed. A maximum of 10 allele frequency datasets can be registered. To register additional datasets, delete an allele frequency dataset that is no longer required.
For details on how to confirm the status of supercomputer processes, and how to view the extraction results, see 6.1 Allele Frequency Data.

[1] Specify the data provider for the data being created. You must specify a data provider.
　As allele frequency data is created for each data provider, data files will be created for the specified data provider.
[2] Specify the diseases to be included in data to be created in Disease Names To Be Included, and diseases to be excluded in data to be created in Disease Names To Be Excluded.
　If attributes are not set in [3], disease names to be included and disease names to be excluded must be specified.
For details on how to specify diseases, see 6.4.1 Disease Setting
[3] Specify the attributes of data to be created. If the diseases are not set in [2], attribute settings must be specified.
[4] Specify the areas of data to be created. The area setting is required.
For details on how to specify an area, see 6.4.2 Area Setting
[5] When specifying the conditions for data to be created, the number of matching samples appears at the bottom of the screen. Allele frequency data cannot be created for data providers specified in [1] with fewer than 10 samples. Change the conditions set so that there are 10 or more samples.
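
The requirements in [1] to [5] can be summarized as a validation step. This is a sketch only: the class and field names are hypothetical, and reading the rule in [2] and [3] as "at least one disease name or at least one attribute must be set" is an assumption.

```python
from dataclasses import dataclass, field

MIN_SAMPLES = 10  # minimum number of matching samples stated in [5]


@dataclass
class Conditions:
    data_provider: str = ""
    include_diseases: list[str] = field(default_factory=list)
    exclude_diseases: list[str] = field(default_factory=list)
    attributes: dict[str, str] = field(default_factory=dict)
    area: str = ""


def validate(conditions: Conditions, matching_samples: int) -> list[str]:
    """Return a list of problems; an empty list means the request can be registered."""
    errors = []
    if not conditions.data_provider:
        errors.append("data provider is required")
    # Assumption: either a disease setting or an attribute setting is enough.
    has_disease = bool(conditions.include_diseases or conditions.exclude_diseases)
    if not (has_disease or conditions.attributes):
        errors.append("specify diseases or attributes")
    if not conditions.area:
        errors.append("area is required")
    if matching_samples < MIN_SAMPLES:
        errors.append("10 or more matching samples are required")
    return errors


request = Conditions(data_provider="Provider A", attributes={"gender": "female"},
                     area="22:111770161")
print(validate(request, matching_samples=8))
# ['10 or more matching samples are required']
```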

6.4.1 Disease Setting

Search and specify diseases either by [SEARCH BY KEYWORDS] or [SEARCH BY DISEASE CLASSIFICATION].
[1] Search by Keywords
Use the same synonym search method described in 5.1.2 Keyword Search (Synonyms) to search for disease names and specify diseases.

[2] Search by Disease Classification
Select the category of disease you wish to specify from those available, and then specify the disease.

6.4.2 Area Setting

Specify one of the following from [1] to [4] below as the range for data creation.
[1] Gene specification: Specify the gene name.
[2] refSNP specification: Specify the dbSNP rsID.
[3] Position specification: Specify chrom + half-width colon + position (for example, 22:111770161).
[4] Region specification: Specify chrom + half-width colon + position(start) + “-” + position(end) (for example, 10:71510986-71617219).
　* When using Region specification, set start-end to within 100 kbp.
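
The Position and Region formats can be checked with a short parser such as the one below. This is a sketch under stated assumptions: chromosome names are taken to be plain letters and digits, the region length is measured as end minus start, and the coordinates in the usage lines are illustrative. Gene and refSNP specifications are not handled.

```python
import re

MAX_REGION_BP = 100_000  # "within 100 kbp"

# Assumption: chromosome names consist of ASCII letters and digits.
POSITION_RE = re.compile(r"(?P<chrom>[0-9A-Za-z]+):(?P<pos>[0-9]+)")
REGION_RE = re.compile(r"(?P<chrom>[0-9A-Za-z]+):(?P<start>[0-9]+)-(?P<end>[0-9]+)")


def parse_area(text: str) -> tuple:
    """Parse 'chrom:pos' or 'chrom:start-end'; raise ValueError otherwise."""
    region = REGION_RE.fullmatch(text)
    if region:
        start, end = int(region["start"]), int(region["end"])
        if start > end:
            raise ValueError("start must not exceed end")
        # Assumption: the 100 kbp limit applies to end - start.
        if end - start > MAX_REGION_BP:
            raise ValueError("region must be within 100 kbp")
        return ("region", region["chrom"], start, end)
    position = POSITION_RE.fullmatch(text)
    if position:
        return ("position", position["chrom"], int(position["pos"]))
    raise ValueError(f"not a position or region: {text!r}")


print(parse_area("22:111770161"))          # ('position', '22', 111770161)
print(parse_area("10:71510986-71600000"))  # ('region', '10', 71510986, 71600000)
```

A full-width colon or a region longer than the limit raises ValueError in this sketch.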

Only values registered to CANNDs as preset data can be specified.
As allele frequency data is created by extracting variants matching the specified conditions from the preset data, values not included in the preset data cannot be specified.

6.5 Downloads

Certain Sites-Only VCF files used in preset allele frequency data can be downloaded.
You must agree to the terms and conditions to download these files.
Click the “Terms of Use” link and read through its contents.

To agree to its contents, select the “I have read and understand the Terms of Use.” check box.
Once this check box is selected, the files will be available to download.

6.6 Data Management

You can upload files for analysis in the Docker environment, and download files created in the Docker environment.
[1] Copy Allele Frequency Data as User File (VCF)... Copy the data file created on the Create Allele Frequency Data screen into storage where it can be viewed from the Docker environment. Select the check box next to files you want to copy, and then click the [COPY] button.
[2] User Files... Files stored in storage where they can be viewed from the Docker environment are displayed in a list. Click the [Download] button to download the file. Any file that is no longer needed can be deleted by clicking the [Delete] button.
[3] Upload User File... Select the data file you wish to store in storage where it can be viewed from the Docker environment, and then click the [UPLOAD] button.
[4] Docker Images ... Uploaded Docker image files are displayed in a list. Any image that is no longer need can be deleted by clicking the [Delete] button.
[5] Upload Docker Image... Select the Docker image file that you want to execute on the Create Docker Environment screen, and then click the [UPLOAD] button.

6.6.1 Copy Allele Frequency Data as User File (VCF)

Copy the data (VCF file) created in Create Allele Frequency Data into storage where it can be viewed from the Docker environment.
“Copy Allele Frequency Data as User File (VCF)” displays the allele frequency data that has been created. Select the check box next to the file you wish to register, and then click the [COPY] button.

Files are copied for each data provider specified when creating the allele frequency data. After copying is complete, the copied file name will be displayed as a message.

The copied files appear in “User Files”, and can subsequently be referenced from storage in the Docker environment.
You can also download allele frequency data you have created by copying it in the same way, and then downloading it from “User Files”.

To confirm details of allele frequency data, click the [List] button to jump to the Variant List screen for the corresponding data, and then confirm the data contents.
To confirm the conditions used when creating allele frequency data, click the [Confirm] button to view the conditions set when creating the allele frequency data.

6.6.2 User Files

Files stored in storage where they can be viewed from the Docker environment are displayed in a list.
You can download or delete the files displayed in the list.
・ Data files are deleted automatically if they are not used for a 90-day period.

Hidden attribute files (files whose names begin with a dot) are normally not shown in the list.
Select “Show hidden attribute files” to include them in the list.

6.6.3 Upload User File

You can upload up to 10 GB of files to storage where they can be viewed from the Docker environment.

The following restrictions apply when uploading files.

Item name	Restriction
File size	2 GB or less
Total file size	Up to 10 GB of total disk space
File names	File names must contain only the following characters:
	・ Half-width alphanumeric characters ［0-9］［a-z］［A-Z］
	・ Half-width exclamation marks ［!］
	・ Half-width hyphens ［-］
	・ Half-width underscores ［_］
	・ Half-width periods ［.］ (Half-width periods cannot be used at the start of the file name)
	・ Half-width at signs ［@］
	・ Half-width equal signs ［=］
	・ Half-width commas ［,］
	・ Half-width carets ［^］
	・ Half-width sharp marks ［#］
	・ Half-width round parentheses ［()］
	・ Half-width curly brackets ［{}］
	・ Half-width square brackets ［[]］
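
A file name can be checked against these character rules before uploading. The following is a minimal shell sketch under that reading of the table. The function name is_valid_upload_name is hypothetical, and the platform may apply additional checks that are not described here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-upload check of a file name against the allowed characters.

is_valid_upload_name() {
  local LC_ALL=C   # keep a-z and A-Z limited to ASCII letters
  local name=$1
  # "]" is placed first and "-" last so both are literal inside the bracket expression
  local allowed_re='^[][0-9A-Za-z!_.@=,^#(){}-]+$'

  # Reject names that start with a period, then require allowed characters only
  [[ $name != .* ]] && [[ $name =~ $allowed_re ]]
}

for name in "sample_01.vcf.gz" "my data.vcf" ".hidden.txt"; do
  if is_valid_upload_name "$name"; then
    echo "OK:       $name"
  else
    echo "rejected: $name"
  fi
done
```

In this sketch the second name is rejected because it contains a space, and the third because it begins with a period.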

When uploading files, the following screen will open in a new window.
Do not close or refresh this screen until the upload is complete.

When the file has finished uploading, the screen will update as shown below.
Click the [CLOSE] button to close the screen.

6.6.4 Docker Images

This displays a list of user-uploaded Docker images.
You can download or delete the files displayed in the list.
* Docker images are deleted automatically if they are not used for a 90-day period.

6.6.5 Upload Docker Image

You can upload up to five user-created Docker images.
Uploaded image files will appear in the “Docker Images List”, and can be executed from the Create Docker Environment screen.
・ Uploaded image files are deleted automatically if they are not used for a 90-day period.
・ User-created Docker image files may not work properly if they do not follow the rules for creating Docker image files set for the AMED Data Utilization Platform. For more information on the rules on creating image files, see 9.2 Procedure for Using User-Created Docker Image Files.

The following restrictions apply when uploading files.

Item name	Restriction
File size	2 GB or less
File extensions	.tar and .tar.gz only
File names	File names must contain only the following characters:
	・ Half-width alphanumeric characters ［0-9］［a-z］［A-Z］
	・ Half-width exclamation marks ［!］
	・ Half-width hyphens ［-］
	・ Half-width underscores ［_］
	・ Half-width periods ［.］ (Half-width periods cannot be used at the start of the file name)
	・ Half-width at signs ［@］
	・ Half-width equal signs ［=］
	・ Half-width commas ［,］
	・ Half-width carets ［^］
	・ Half-width sharp marks ［#］
	・ Half-width round parentheses ［()］
	・ Half-width curly brackets ［{}］
	・ Half-width square brackets ［[]］
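
The extension and size restrictions can be checked locally before an upload is attempted. The sketch below is an illustration only: the function name check_image_file is hypothetical, and "2 GB" is assumed to mean 2 × 1024³ bytes, which this manual does not specify.

```shell
#!/usr/bin/env bash
# Hypothetical local check of a Docker image file before uploading.

MAX_IMAGE_BYTES=$((2 * 1024 * 1024 * 1024))   # assumption: "2 GB" = 2 * 1024^3 bytes

check_image_file() {
  local path=$1

  # Only .tar and .tar.gz are accepted
  case "$path" in
    *.tar|*.tar.gz) ;;
    *) echo "extension must be .tar or .tar.gz"; return 1 ;;
  esac

  [ -f "$path" ] || { echo "file not found"; return 1; }

  # wc -c reports the file size in bytes
  local size
  size=$(wc -c < "$path")
  if (( size > MAX_IMAGE_BYTES )); then
    echo "file exceeds 2 GB"
    return 1
  fi
  echo "ok"
}

# Example call with a hypothetical file name:
#   check_image_file my-analysis-image.tar.gz
```

The file name itself is subject to the same character rules as user files.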

When uploading files, the following screen will open in a new window.
Do not close or refresh this screen until the upload is complete.

When the file has finished uploading, the screen will update as shown below.
Click the [CLOSE] button to close the screen.

6.7 Docker Environments

This displays a list of Docker environments registered on the Setup Docker Environment screen.
Use this to view the startup status of Docker environments, access the Jupyter Notebook, and restart Docker environments that have completed tasks for execution.
You can also delete Docker environments that are no longer needed.

6.7.1 Using Docker Environments

・ When a template image is started, the [Access Notebook] button appears. Click this button to open the running Jupyter Notebook in another tab and work in the Docker environment.
・ If the Docker environment is running, the [STOP] button will appear. Click this button to stop the Docker environment.
・ If the Docker environment is stopped, the [START] button will appear. Click this button to restart the Docker environment.
・ If the Docker environment is stopped, the [Delete] button will appear. Click this button to delete the Docker environment.
・ In the Container Log column, click the [Confirm] button to view the container log when the Docker environment is executed.
　Logs can only be viewed when the Docker environment is stopped.

6.7.2 View Analysis Environments

To check the results of analysis performed in the Docker environment, either output a log, or save the analysis results as a file in storage.
For more information about storage destinations, see “9.2.2 Uploading and Downloading Data for Analysis”.
You can download files stored in storage from the Data Management screen (see “6.6 Data Management”).

6.7.3 Notes When Using Jupyter Notebook

For points to note when using Jupyter Notebook, see 9.3 Procedure for Using Template Docker Image Files.

6.8 Setup Docker Environment

Specify a Docker image and start the Docker environment.
Fill in fields [1] through [6] described below, and then click the [START] button to start the Docker image in the Docker environment.
* A total of up to 4 vCPUs and 28 GB of memory can be used at the same time. When running multiple Docker environments at the same time, take care to avoid exceeding these limits.
For information on how to view the startup status of the Docker environment and how to access the Notebook while it is running, see 6.7 Docker Environments.

[1] Enter the name of the Docker environment.
[2] Specify a template image or a user-created image as the Docker Execution Image.
・ Template: Image used to start Jupyter Notebook. You can access the started Notebook from your browser.
・ User-created: A user-created image registered on the Data Management screen (see 6.6.5 Upload Docker Image)
[3] When using a user-created image, enter the user name used in the user-created image.
For the Container User Name, enter a user name other than “root,” “sys,” or “adm.”
[4] Select the number of CPUs (1 to 4) used in the Docker environment.
[5] Select the amount of memory (1 GB to 28 GB) used in the Docker environment.
[6] Select the Auto Stop Time (10 minutes to 300 minutes) for the Docker environment.
The Docker environment will stop automatically, even if it is in the middle of processing tasks, once this amount of time passes from when the Docker environment is started.
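
Because the 4 vCPU and 28 GB limits apply to all Docker environments running at the same time, it can help to total the planned settings before starting another environment. The following is a minimal sketch of that arithmetic. The function name fits_budget and the CPU:MEMORY argument format are hypothetical, and the sketch does not query the platform.

```shell
#!/usr/bin/env bash
# Hypothetical check that a new Docker environment fits within the shared limits.

MAX_TOTAL_VCPU=4
MAX_TOTAL_MEM_GB=28

# Usage: fits_budget NEW_VCPU NEW_MEM_GB [RUNNING_VCPU:RUNNING_MEM_GB ...]
fits_budget() {
  local total_cpu=$1 total_mem=$2
  shift 2

  # Add the settings of each environment that is already running
  local env
  for env in "$@"; do
    total_cpu=$(( total_cpu + ${env%%:*} ))
    total_mem=$(( total_mem + ${env##*:} ))
  done

  (( total_cpu <= MAX_TOTAL_VCPU && total_mem <= MAX_TOTAL_MEM_GB ))
}

# New environment: 2 vCPU, 8 GB. Already running: 1 vCPU/4 GB and 1 vCPU/16 GB.
if fits_budget 2 8 1:4 1:16; then
  echo "within limits"
else
  echo "over limits"
fi
```

In this example the totals are exactly 4 vCPUs and 28 GB, so the new environment still fits.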

7 Project Information

In this system, “project” refers to “research projects.”

7.1 Project List

This displays a list of research projects that the user is involved with.
Research project information is registered by the system administrator.

7.2 Project Details

You can confirm detailed information on research projects selected on the Project List screen.
This displays project information, biobanks available for use on a project, and researchers participating in a project.

8 News & Help

8.1 News

Use this to check important news, and other news.

8.2 Help & Contact

This displays manuals and contact information.
Click each button to view the corresponding manual in a separate tab.

9 Supplement

9.1 Search Target Items in a Keyword Search

Items subject to keyword searches on the Search Metadata screen are summarized below.

Category	Item
Study	Title
	Abstract
	Attributes/Value
	Grants/Title
	Grants/Agency
	Grants/Abbr
	Publications/Reference
	Study Type/Study Type
Experiment	Title
	Design Description
Sample	Title
	Sample Group Type
	Description
	Disease Name
	icd_code
	Attributes/Value
Analysis	Title
	Description
	Attributes/Value

9.2 Procedure for Using User-Created Docker Image Files

This chapter describes the procedure for creating Docker images. It covers building a Docker image in the user’s environment by creating a Dockerfile, which describes the build procedure for the Docker image, and a script (entrypoint.sh) that is executed when the Docker container starts. To create the Docker image described in this Manual, Docker must be installed in the user’s environment. Note that setting up Docker in the user’s environment falls outside the scope of this Manual.

While the AMED Data Utilization Platform allows users to upload and run a Docker image, there are restrictions on external connections to and from the executed Docker container.
As such, users cannot directly access a container started from a user-created Docker image. Install the required packages and configure any other processing when creating the Docker image (see 9.2.1 Creating the Docker Image).

9.2.1 Creating the Docker Image

[1] Create the Dockerfile
The Dockerfile is used to specify the base image, install the required packages, set environment variables, and describe the files to add, the commands to execute, etc.
The Dockerfile primarily uses the following commands. Take note of the following when creating the Dockerfile.

FROM
Specifies the base image for the Docker image. As such, the Dockerfile must start with the FROM command.
The AMED Data Utilization Platform only runs Linux-based Docker images. Windows-based Docker images cannot be used.

　[Format]
　FROM <image>[:<tag>]

USER
Sets the user executing subsequent RUN and ENTRYPOINT commands.
The AMED Data Utilization Platform does not allow the use of root, sys, or adm in the Docker container. As such, root, sys, and adm cannot be specified as the ENTRYPOINT or CMD user when performing analysis.

　[Format]
　USER <user>[:<group>]

COPY
Copies the files and directories specified as the copy source to the copy destination in the Docker image. The copy source must be located in the build context, that is, the set of files and directories used to build the Docker image. In this Manual, the build context is the current directory in which the build command is executed.

　[Format]
　COPY <copy source> <copy destination>

RUN
This describes package installation and configuration commands executed when building the Docker image.

　[Format]
　RUN <command>

The uploaded Docker image automatically executes analysis processes using scripts, etc. in the Docker container and outputs the analysis results to an accessible storage area. Users can retrieve the analysis results file via the AMED Data Utilization Platform application.

Use RUN commands in the Dockerfile to install Mountpoint for Amazon S3 so that the storage area can be accessed from the Docker container and files can be uploaded. The procedure for installing Mountpoint for Amazon S3 varies depending on the Linux distribution in use. For more information, see the Amazon Simple Storage Service (S3) User Guide.

Example RUN command for installing Mountpoint for Amazon S3 (using Ubuntu)

　# Install mount-s3
　USER root
　RUN apt-get update
　RUN apt-get install -y wget
　RUN wget https://s3.amazonaws.com/mountpoint-s3-release/latest/x86_64/mount-s3.deb
　RUN apt-get install -y ./mount-s3.deb
　RUN rm -rf ./mount-s3.deb

ENTRYPOINT
Specify the command or script to execute when starting the Docker container.
Only one ENTRYPOINT command takes effect in a Dockerfile (if several are written, only the last one is used). To run analysis processes automatically, use the ENTRYPOINT command or the CMD command. The procedure for specifying an analysis process in the ENTRYPOINT command is described in “[2] Creating a script to be executed when starting the Docker (entrypoint.sh)” in this Manual.

　[Format]
　ENTRYPOINT ["<executable>", "<parameter 1>", "<parameter 2>"]
　Or
　ENTRYPOINT <command> <parameter 1> <parameter 2>

Specify the script in the ENTRYPOINT command (entrypoint.sh)

　ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]

CMD
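Executes the specified command when the Docker container starts. Note that if ENTRYPOINT is also described in the Dockerfile, the CMD value is interpreted as arguments to ENTRYPOINT.

　[Format]
　CMD ["<executable>", "<parameter 1>", "<parameter 2>"]
　Or
　CMD <command> <parameter 1> <parameter 2>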

The following example shows the entire Dockerfile.

　FROM ubuntu

　# Install mount-s3
　USER root
　RUN apt-get update
　RUN apt-get install -y wget
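　RUN wget https://s3.amazonaws.com/mountpoint-s3-release/latest/x86_64/mount-s3.deb
　RUN apt-get install -y ./mount-s3.deb
　RUN rm -rf ./mount-s3.deb

　# Add the startup script to the image and make it executable
　COPY entrypoint.sh /usr/local/bin/entrypoint.sh
　RUN chmod +x /usr/local/bin/entrypoint.sh

　# Run entrypoint.sh when the container starts
　ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]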

[2] Creating a script to be executed when starting the Docker (entrypoint.sh)
In the description provided in this chapter, entrypoint.sh is specified in the ENTRYPOINT command in the Dockerfile, and entrypoint.sh describes the analysis process executed in the Docker container.

・ How to use data files for analysis
In the AMED Data Utilization Platform, the storage area is mounted after the Docker container starts up.
Uploaded data cannot be used or saved until this mounting process is completed.
To prevent the analysis from starting before the mounting process is complete, make sure to set a standby time (60 seconds) before the analysis process begins.

Specify standby time for mounting process (in entrypoint.sh)

　# Wait for the mounting process (60 seconds is a provisional value)
　sleep 60s

When the mounting process is complete, the data file uploaded from the Data Management screen in the AMED Data Utilization Platform is placed in the following storage location in the Docker container. (*)

Data file storage location

　/home/[User (set when starting the Docker environment)]/s3/[Name of the uploaded data file]

Because updating and saving data files directly in the mounted user storage area (/home/[User (set when starting the Docker environment)]/s3/) has not been tested, make sure to copy data files into the Docker container before use.

Example command for copying data files into the container before use

　cp /home/[User (set when starting the Docker environment)]/s3/[Name of the data file in use] [Container directory]

You can retrieve and save files output in the analysis process by storing them in the user storage area (/home/[User (set when starting the Docker environment)]/s3/). (*)
If the Docker environment is stopped (the Docker container is stopped) without storing output files in the user storage area, the data will be lost and cannot be retrieved when starting the Docker environment up again. Add a command to the analysis process script that stores the files you wish to save in the user storage area.

Example of command for storing data files

　cp [Container directory]/[Save file name] /home/[User (set when starting the Docker environment)]/s3/

* For more information on the procedure for uploading and downloading data from the AMED Data Utilization Platform, see “6.6 Data Management”.

・ Preventing the Docker container from being restarted unintentionally
When running an analysis process, the Docker container will stop when the main process is finished.
When the Docker container stops on its own rather than through a stop command issued from the AMED Data Utilization Platform application, the container may be restarted automatically to keep it running.
If this happens, the results of the completed analysis process may be overwritten. To prevent the container from restarting unintentionally, add a command to entrypoint.sh or the Dockerfile that keeps the Docker container process from finishing after the analysis process, and always terminate the Docker container from the application.
An example of such a command in entrypoint.sh is provided below.
For more information on the procedure for terminating the Docker container, see “6.7 Docker Environments”.

・ Example of a command added to entrypoint.sh to prevent the Docker container from restarting

　# End analysis process

　# Keep a process running so that the container does not exit
　tail -F /dev/null
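
The steps described in this section (standby time, copying input into the container, running the analysis, storing the result, and holding the process) can be combined as in the following sketch. It is an illustration under stated assumptions, not a script supplied by the platform: the user name analyst, the paths /tmp/work and input.vcf, and the placeholder run_analysis function are all hypothetical. The final calls are left as comments so that the functions can be tried out without blocking the shell.

```shell
#!/usr/bin/env bash
# Sketch of an entrypoint.sh following the steps in this section.
# CONTAINER_USER, WORK_DIR, INPUT_FILE, and run_analysis are hypothetical.

CONTAINER_USER="${CONTAINER_USER:-analyst}"      # user name set when starting the Docker environment
S3_DIR="${S3_DIR:-/home/${CONTAINER_USER}/s3}"   # mounted user storage area
WORK_DIR="${WORK_DIR:-/tmp/work}"                # working directory inside the container
INPUT_FILE="${INPUT_FILE:-input.vcf}"            # file uploaded on the Data Management screen
MOUNT_WAIT="${MOUNT_WAIT:-60}"                   # standby time in seconds for the mounting process

# Placeholder analysis: writes the number of lines in the input file.
run_analysis() {
  wc -l < "$1" > "$2"
}

main() {
  sleep "$MOUNT_WAIT"                                   # wait for the storage area to be mounted
  mkdir -p "$WORK_DIR" || return 1
  cp "$S3_DIR/$INPUT_FILE" "$WORK_DIR/" || return 1     # copy input into the container before use
  run_analysis "$WORK_DIR/$INPUT_FILE" "$WORK_DIR/result.txt" || return 1
  cp "$WORK_DIR/result.txt" "$S3_DIR/"                  # store the result in the user storage area
}

# Keeps a process running so the container does not exit and get restarted.
hold_container() {
  tail -F /dev/null
}

# In an actual entrypoint.sh, the script would end with:
#   main
#   hold_container
```

Calling hold_container even when main fails keeps the container from exiting, so the failure can be investigated in the container log after the container is stopped from the application.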

The standard output and standard error of processes executed in the container after it starts are recorded in the container log. View the container log to check the status of container operations. For more information on the procedure for viewing the container log, see “6.7 Docker Environments”.

9.2.2 Uploading and Downloading Data for Analysis

This chapter describes how to upload and download data for analysis when using an uploaded Docker image.

[1] Uploading data for analysis
When a Docker environment is started from a Docker image uploaded by the user, external connections to and from the Docker container are not possible. As such, users cannot directly access the running Docker container. When preparing the Docker image in advance, the data files to be used and output must therefore be specified in scripts, etc.
Upload data files to use from the Data Management screen.

The data file uploaded from the Data Management screen in the AMED Data Utilization Platform is placed in the following storage location, as viewed from the Docker container.

・Data file storage location

　/home/[User (set when starting the Docker environment)]/s3/[Name of the uploaded data file]

[2] Downloading data for analysis
You can retrieve and save files output in the analysis process by storing them in user storage (/home/[User (set when starting the Docker environment)]/s3/). If the Docker environment is stopped without storing output files in user storage, the data will be lost and cannot be retrieved when starting the Docker environment up again. In advance, add commands that store the files you want to save in user storage to the scripts that run automatically.

・Example of command for storing data files

　cp [Container directory]/[Save file name] /home/[User (set when starting the Docker environment)]/s3/

Files saved in the Docker environment will appear in the User Files on the Data Management screen.

9.3 Procedure for Using Template Docker Image Files

Use a template Docker image to display and use the Jupyter Notebook screen in the analysis environment started in 6.8 Setup Docker Environment.
On the Docker Environments screen, click the [Access Notebook] button for the Docker environment you want to open in the web browser.

9.3.1 Uploading and Downloading Data for Analysis

This chapter describes how to upload and download data for analysis when using a template image for analysis that comes prepared with the AMED Data Utilization Platform.

[1] Uploading data to use from the Data Management screen

[2] Accessing running Docker environments

[3] Displaying the Jupyter Notebook screen

[4] Confirming upload files

[5] Moving upload files
Data updates and saves are not guaranteed to work in the s3 folder. Make sure to move files to the container for use.

[6] Saving analysis results
To retrieve analysis results output on the container, store the results in the s3 folder to allow the results to be downloaded from the AMED Data Utilization Platform application.
The total size of files that can be saved to the s3 folder is 10 GB. When exceeding 10 GB, new files cannot be stored unless existing files are deleted.
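
Before saving a large result, the current usage of the s3 folder can be compared with the 10 GB limit. The following is a rough sketch that could be run from a Notebook terminal or a script. The function names are hypothetical, "10 GB" is assumed to mean 10 × 1024² KiB, and du reports allocated disk blocks, which on the mounted s3 folder may differ from the size the platform counts.

```shell
#!/usr/bin/env bash
# Hypothetical check of remaining room in the s3 folder.

S3_LIMIT_KB=$((10 * 1024 * 1024))   # assumption: 10 GB expressed in KiB

# Prints the size of the given folder in KiB, as reported by du.
s3_usage_kb() {
  du -sk "$1" | cut -f1
}

# Succeeds when a new file of NEW_KB KiB would still fit under the limit.
# Usage: fits_in_s3 S3_DIR NEW_KB
fits_in_s3() {
  local dir=$1 new_kb=$2
  local used
  used=$(s3_usage_kb "$dir")
  (( used + new_kb <= S3_LIMIT_KB ))
}

# Example call, assuming the s3 folder is at ~/s3 and the result is about 500 MB:
#   fits_in_s3 "$HOME/s3" $((500 * 1024)) || echo "delete existing files first"
```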

Files saved in the Docker environment will appear in the Data File List on the Data Management screen.

The following restrictions apply to the names of directories and files to be created.

Item name	Restriction
Directory names, file names	Only the following characters may be used:
	・ Half-width alphanumeric characters ［0-9］［a-z］［A-Z］
	・ Half-width exclamation marks ［!］
	・ Half-width hyphens ［-］
	・ Half-width underscores ［_］
	・ Half-width periods ［.］ (Half-width periods cannot be used at the start of the file name)
	・ Half-width at signs ［@］
	・ Half-width equal signs ［=］
	・ Half-width commas ［,］
	・ Half-width carets ［^］
	・ Half-width sharp marks ［#］
	・ Half-width round parentheses ［()］
	・ Half-width curly brackets ［{}］
	・ Half-width square brackets ［[]］

9.3.2 Stopping a Docker Environment

[1] Confirm running Docker environments
From the top menu of the AMED Data Utilization Platform, select [Pre-Research], and then [Docker Environments].

[2] Stopping a Docker Environment
Select [STOP] for the running Docker environment that is no longer in use.
Stopping the container will delete all data for analysis that has not been stored. If you have data that you need to save, see 9.3.1 Uploading and Downloading Data for Analysis.

10 FAQ

No	Question	Answer
1	What if I forget my passcode number?	Ask your system administrator to reset your passcode number, and then register a new passcode number.
2	How do I change the email address that one-time passwords are sent to?	As this requires updates to the user information, ask your system administrator to change the email address.
3	What do I do if the supercomputer connection link does not appear on the screen?	Only connections through collaborative computer systems allow users to connect to the supercomputer. Check whether you are using the URL for connecting via the collaborative computer systems.
4	What do I do if old Search Condition histories do not appear on the Search Metadata screen?	The Search Condition History only shows the five most recent history entries. Older search histories will not appear, so you will need to re-enter the Search Conditions to perform the search again.
5	What if the number of research projects I participate in increases?	As this requires updates to the user information, ask your system administrator to add research projects.