M7350v1_en_gpl

This commit is contained in:
T
2024-09-09 08:52:07 +00:00
commit f9cc65cfda
65988 changed files with 26357421 additions and 0 deletions

View File

@ -0,0 +1,26 @@
00-INDEX
- this file.
3270.ChangeLog
- ChangeLog for the UTS Global 3270-support patch (outdated).
3270.txt
- how to use the IBM 3270 display system support.
cds.txt
- s390 common device support (common I/O layer).
CommonIO
- common I/O layer command line parameters, procfs and debugfs entries
config3270.sh
- example configuration for 3270 devices.
DASD
- information on the DASD disk device driver.
Debugging390.txt
- hints for debugging on s390 systems.
driver-model.txt
- information on s390 devices and the driver model.
monreader.txt
- information on accessing the z/VM monitor stream from Linux.
s390dbf.txt
- information on using the s390 debug feature.
TAPE
- information on the driver for channel-attached tapes.
zfcpdump
- information on the s390 SCSI dump tool.

View File

@ -0,0 +1,44 @@
ChangeLog for the UTS Global 3270-support patch
Sep 2002: Get bootup colors right on 3270 console
* In tubttybld.c, substantially revise ESC processing so that
ESC sequences (especially coloring ones) and the strings
they affect work as right as 3270 can get them. Also, set
screen height to omit the two rows used for input area, in
tty3270_open() in tubtty.c.
Sep 2002: Dynamically get 3270 input buffer
* Oversize 3270 screen widths may exceed GEOM_MAXINPLEN columns,
so get input-area buffer dynamically when sizing the device in
tubmakemin() in tuball.c (if it's the console) or tty3270_open()
in tubtty.c (if needed). Change tubp->tty_input to be a
pointer rather than an array, in tubio.h.
Sep 2002: Fix tubfs kmalloc()s
* Do read and write lengths correctly in fs3270_read()
and fs3270_write(), whilst never asking kmalloc()
for more than 0x800 bytes. Affects tubfs.c and tubio.h.
Sep 2002: Recognize 3270 control unit type 3174
* Recognize control-unit type 0x3174 as well as 0x327?.
The IBM 2047 device emulates a 3174 control unit.
Modularize control-unit recognition in tuball.c by
adding and invoking new tub3270_is_ours().
Apr 2002: Fix 3270 console reboot loop
* (Belated log entry) Fixed reboot loop if 3270 console,
in tubtty.c:ttu3270_bh().
Feb 6, 2001:
* This changelog is new
* tub3270 now supports 3270 console:
Specify y for CONFIG_3270 and y for CONFIG_3270_CONSOLE.
Support for 3215 will not appear if 3270 console support
is chosen.
NOTE: The default is 3270 console support, NOT 3215.
* the components are remodularized: added source modules are
tubttybld.c and tubttyscl.c, for screen-building code and
scroll-timeout code.
* tub3270 source for this (2.4.0) version is #ifdeffed to
build with both 2.4.0 and 2.2.16.2.
* color support and minimal other ESC-sequence support is added.

View File

@ -0,0 +1,271 @@
IBM 3270 Display System support
This file describes the driver that supports local channel attachment
of IBM 3270 devices. It consists of three sections:
* Introduction
* Installation
* Operation
INTRODUCTION.
This paper describes installing and operating 3270 devices under
Linux/390. A 3270 device is a block-mode rows-and-columns terminal of
which I'm sure hundreds of millions were sold by IBM and clonemakers
twenty and thirty years ago.
You may have 3270s in-house and not know it. If you're using the
VM-ESA operating system, define a 3270 to your virtual machine by using
the command "DEF GRAF <hex-address>" This paper presumes you will be
defining four 3270s with the CP/CMS commands
DEF GRAF 620
DEF GRAF 621
DEF GRAF 622
DEF GRAF 623
Your network connection from VM-ESA allows you to use x3270, tn3270, or
another 3270 emulator, started from an xterm window on your PC or
workstation. With the DEF GRAF command, an application such as xterm,
and this Linux-390 3270 driver, you have another way of talking to your
Linux box.
This paper covers installation of the driver and operation of a
dialed-in x3270.
INSTALLATION.
You install the driver by installing a patch, doing a kernel build, and
running the configuration script (config3270.sh, in this directory).
WARNING: If you are using 3270 console support, you must rerun the
configuration script every time you change the console's address (perhaps
by using the condev= parameter in silo's /boot/parmfile). More precisely,
you should rerun the configuration script every time your set of 3270s,
including the console 3270, changes subchannel identifier relative to
one another. ReIPL as soon as possible after running the configuration
script and the resulting /tmp/mkdev3270.
If you have chosen to make tub3270 a module, you add a line to a
configuration file under /etc/modprobe.d/. If you are working on a VM
virtual machine, you can use DEF GRAF to define virtual 3270 devices.
You may generate both 3270 and 3215 console support, or one or the
other, or neither. If you generate both, the console type under VM is
not changed. Use #CP Q TERM to see what the current console type is.
Use #CP TERM CONMODE 3270 to change it to 3270. If you generate only
3270 console support, then the driver automatically converts your console
at boot time to a 3270 if it is a 3215.
In brief, these are the steps:
1. Install the tub3270 patch
2. (If a module) add a line to a file in /etc/modprobe.d/*.conf
3. (If VM) define devices with DEF GRAF
4. Reboot
5. Configure
To test that everything works, assuming VM and x3270,
1. Bring up an x3270 window.
2. Use the DIAL command in that window.
3. You should immediately see a Linux login screen.
Here are the installation steps in detail:
1. The 3270 driver is a part of the official Linux kernel
source. Build a tree with the kernel source and any necessary
patches. Then do
make oldconfig
(If you wish to disable 3215 console support, edit
.config; change CONFIG_TN3215's value to "n";
and rerun "make oldconfig".)
make image
make modules
make modules_install
2. (Perform this step only if you have configured tub3270 as a
module.) Add a line to a file /etc/modprobe.d/*.conf to automatically
load the driver when it's needed. With this line added, you will see
login prompts appear on your 3270s as soon as boot is complete (or
with emulated 3270s, as soon as you dial into your vm guest using the
command "DIAL <vmguestname>"). Since the line-mode major number is
227, the line to add should be:
alias char-major-227 tub3270
3. Define graphic devices to your vm guest machine, if you
haven't already. Define them before you reboot (reipl):
DEFINE GRAF 620
DEFINE GRAF 621
DEFINE GRAF 622
DEFINE GRAF 623
4. Reboot. The reboot process scans hardware devices, including
3270s, and this enables the tub3270 driver once loaded to respond
correctly to the configuration requests of the next step. If
you have chosen 3270 console support, your console now behaves
as a 3270, not a 3215.
5. Run the 3270 configuration script config3270. It is
distributed in this same directory, Documentation/s390, as
config3270.sh. Inspect the output script it produces,
/tmp/mkdev3270, and then run that script. This will create the
necessary character special device files and make the necessary
changes to /etc/inittab.
Then notify /sbin/init that /etc/inittab has changed, by issuing
the telinit command with the q operand:
cd Documentation/s390
sh config3270.sh
sh /tmp/mkdev3270
telinit q
This should be sufficient for your first time. If your 3270
configuration has changed and you're reusing config3270, you
should follow these steps:
Change 3270 configuration
Reboot
Run config3270 and /tmp/mkdev3270
Reboot
Here are the testing steps in detail:
1. Bring up an x3270 window, or use an actual hardware 3278 or
3279, or use the 3270 emulator of your choice. You would be
running the emulator on your PC or workstation. You would use
the command, for example,
x3270 vm-esa-domain-name &
if you wanted a 3278 Model 4 with 43 rows of 80 columns, the
default model number. The driver does not take advantage of
extended attributes.
The screen you should now see contains a VM logo with input
lines near the bottom. Use TAB to move to the bottom line,
probably labeled "COMMAND ===>".
2. Use the DIAL command instead of the LOGIN command to connect
to one of the virtual 3270s you defined with the DEF GRAF
commands:
dial my-vm-guest-name
3. You should immediately see a login prompt from your
Linux-390 operating system. If that does not happen, you would
see instead the line "DIALED TO my-vm-guest-name 0620".
To troubleshoot: do these things.
A. Is the driver loaded? Use the lsmod command (no operands)
to find out. Probably it isn't. Try loading it manually, with
the command "insmod tub3270". Does that command give error
messages? Ha! There's your problem.
B. Is the /etc/inittab file modified as in installation step 3
above? Use the grep command to find out; for instance, issue
"grep 3270 /etc/inittab". Nothing found? There's your
problem!
C. Are the device special files created, as in installation
step 2 above? Use the ls -l command to find out; for instance,
issue "ls -l /dev/3270/tty620". The output should start with the
letter "c" meaning character device and should contain "227, 1"
just to the left of the device name. No such file? no "c"?
Wrong major number? Wrong minor number? There's your
problem!
D. Do you get the message
"HCPDIA047E my-vm-guest-name 0620 does not exist"?
If so, you must issue the command "DEF GRAF 620" from your VM
3215 console and then reboot the system.
OPERATION.
The driver defines three areas on the 3270 screen: the log area, the
input area, and the status area.
The log area takes up all but the bottom two lines of the screen. The
driver writes terminal output to it, starting at the top line and going
down. When it fills, the status area changes from "Linux Running" to
"Linux More...". After a scrolling timeout of (default) 5 sec, the
screen clears and more output is written, from the top down.
The input area extends from the beginning of the second-to-last screen
line to the start of the status area. You type commands in this area
and hit ENTER to execute them.
The status area initializes to "Linux Running" to give you a warm
fuzzy feeling. When the log area fills up and output awaits, it
changes to "Linux More...". At this time you can do several things or
nothing. If you do nothing, the screen will clear in (default) 5 sec
and more output will appear. You may hit ENTER with nothing typed in
the input area to toggle between "Linux More..." and "Linux Holding",
which indicates no scrolling will occur. (If you hit ENTER with "Linux
Running" and nothing typed, the application receives a newline.)
You may change the scrolling timeout value. For example, the following
command line:
echo scrolltime=60 > /proc/tty/driver/tty3270
changes the scrolling timeout value to 60 sec. Set scrolltime to 0 if
you wish to prevent scrolling entirely.
Other things you may do when the log area fills up are: hit PA2 to
clear the log area and write more output to it, or hit CLEAR to clear
the log area and the input area and write more output to the log area.
Some of the Program Function (PF) and Program Attention (PA) keys are
preassigned special functions. The ones that are not yield an alarm
when pressed.
PA1 causes a SIGINT to the currently running application. You may do
the same thing from the input area, by typing "^C" and hitting ENTER.
PA2 causes the log area to be cleared. If output awaits, it is then
written to the log area.
PF3 causes an EOF to be received as input by the application. You may
cause an EOF also by typing "^D" and hitting ENTER.
No PF key is preassigned to cause a job suspension, but you may cause a
job suspension by typing "^Z" and hitting ENTER. You may wish to
assign this function to a PF key. To make PF7 cause job suspension,
execute the command:
echo pf7=^z > /proc/tty/driver/tty3270
If the input you type does not end with the two characters "^n", the
driver appends a newline character and sends it to the tty driver;
otherwise the driver strips the "^n" and does not append a newline.
The IBM 3215 driver behaves similarly.
Pf10 causes the most recent command to be retrieved from the tube's
command stack (default depth 20) and displayed in the input area. You
may hit PF10 again for the next-most-recent command, and so on. A
command is entered into the stack only when the input area is not made
invisible (such as for password entry) and it is not identical to the
current top entry. PF10 rotates backward through the command stack;
PF11 rotates forward. You may assign the backward function to any PF
key (or PA key, for that matter), say, PA3, with the command:
echo -e pa3=\\033k > /proc/tty/driver/tty3270
This assigns the string ESC-k to PA3. Similarly, the string ESC-j
performs the forward function. (Rationale: In bash with vi-mode line
editing, ESC-k and ESC-j retrieve backward and forward history.
Suggestions welcome.)
Is a stack size of twenty commands not to your liking? Change it on
the fly. To change to saving the last 100 commands, execute the
command:
echo recallsize=100 > /proc/tty/driver/tty3270
Have a command you issue frequently? Assign it to a PF or PA key! Use
the command
echo pf24="mkdir foobar; cd foobar" > /proc/tty/driver/tty3270
to execute the commands mkdir foobar and cd foobar immediately when you
hit PF24. Want to see the command line first, before you execute it?
Use the -n option of the echo command:
echo -n pf24="mkdir foo; cd foo" > /proc/tty/driver/tty3270
Happy testing! I welcome any and all comments about this document, the
driver, etc etc.
Dick Hitt <rbh00@utsglobal.com>

View File

@ -0,0 +1,123 @@
S/390 common I/O-Layer - command line parameters, procfs and debugfs entries
============================================================================
Command line parameters
-----------------------
* ccw_timeout_log
Enable logging of debug information in case of ccw device timeouts.
* cio_ignore = {all} |
{<device> | <range of devices>} |
{!<device> | !<range of devices>}
The given devices will be ignored by the common I/O-layer; no detection
and device sensing will be done on any of those devices. The subchannel to
which the device in question is attached will be treated as if no device was
attached.
An ignored device can be un-ignored later; see the "/proc entries"-section for
details.
The devices must be given either as bus ids (0.x.abcd) or as hexadecimal
device numbers (0xabcd or abcd, for 2.4 backward compatibility). If you
give a device number 0xabcd, it will be interpreted as 0.0.abcd.
You can use the 'all' keyword to ignore all devices.
The '!' operator will cause the I/O-layer to _not_ ignore a device.
The command line is parsed from left to right.
For example,
cio_ignore=0.0.0023-0.0.0042,0.0.4711
will ignore all devices ranging from 0.0.0023 to 0.0.0042 and the device
0.0.4711, if detected.
As another example,
cio_ignore=all,!0.0.4711,!0.0.fd00-0.0.fd02
will ignore all devices but 0.0.4711, 0.0.fd00, 0.0.fd01, 0.0.fd02.
By default, no devices are ignored.
/proc entries
-------------
* /proc/cio_ignore
Lists the ranges of devices (by bus id) which are ignored by common I/O.
You can un-ignore certain or all devices by piping to /proc/cio_ignore.
"free all" will un-ignore all ignored devices,
"free <device range>, <device range>, ..." will un-ignore the specified
devices.
For example, if devices 0.0.0023 to 0.0.0042 and 0.0.4711 are ignored,
- echo free 0.0.0030-0.0.0032 > /proc/cio_ignore
will un-ignore devices 0.0.0030 to 0.0.0032 and will leave devices 0.0.0023
to 0.0.002f, 0.0.0033 to 0.0.0042 and 0.0.4711 ignored;
- echo free 0.0.0041 > /proc/cio_ignore will furthermore un-ignore device
0.0.0041;
- echo free all > /proc/cio_ignore will un-ignore all remaining ignored
devices.
When a device is un-ignored, device recognition and sensing is performed and
the device driver will be notified if possible, so the device will become
available to the system. Note that un-ignoring is performed asynchronously.
You can also add ranges of devices to be ignored by piping to
/proc/cio_ignore; "add <device range>, <device range>, ..." will ignore the
specified devices.
Note: While already known devices can be added to the list of devices to be
ignored, there will be no effect on then. However, if such a device
disappears and then reappears, it will then be ignored. To make
known devices go away, you need the "purge" command (see below).
For example,
"echo add 0.0.a000-0.0.accc, 0.0.af00-0.0.afff > /proc/cio_ignore"
will add 0.0.a000-0.0.accc and 0.0.af00-0.0.afff to the list of ignored
devices.
You can remove already known but now ignored devices via
"echo purge > /proc/cio_ignore"
All devices ignored but still registered and not online (= not in use)
will be deregistered and thus removed from the system.
The devices can be specified either by bus id (0.x.abcd) or, for 2.4 backward
compatibility, by the device number in hexadecimal (0xabcd or abcd). Device
numbers given as 0xabcd will be interpreted as 0.0.abcd.
* /proc/cio_settle
A write request to this file is blocked until all queued cio actions are
handled. This will allow userspace to wait for pending work affecting
device availability after changing cio_ignore or the hardware configuration.
* For some of the information present in the /proc filesystem in 2.4 (namely,
/proc/subchannels and /proc/chpids), see driver-model.txt.
Information formerly in /proc/irq_count is now in /proc/interrupts.
debugfs entries
---------------
* /sys/kernel/debug/s390dbf/cio_*/ (S/390 debug feature)
Some views generated by the debug feature to hold various debug outputs.
- /sys/kernel/debug/s390dbf/cio_crw/sprintf
Messages from the processing of pending channel report words (machine check
handling).
- /sys/kernel/debug/s390dbf/cio_msg/sprintf
Various debug messages from the common I/O-layer.
- /sys/kernel/debug/s390dbf/cio_trace/hex_ascii
Logs the calling of functions in the common I/O-layer and, if applicable,
which subchannel they were called for, as well as dumps of some data
structures (like irb in an error case).
The level of logging can be changed to be more or less verbose by piping to
/sys/kernel/debug/s390dbf/cio_*/level a number between 0 and 6; see the
documentation on the S/390 debug feature (Documentation/s390/s390dbf.txt)
for details.

View File

@ -0,0 +1,73 @@
DASD device driver
S/390's disk devices (DASDs) are managed by Linux via the DASD device
driver. It is valid for all types of DASDs and represents them to
Linux as block devices, namely "dd". Currently the DASD driver uses a
single major number (254) and 4 minor numbers per volume (1 for the
physical volume and 3 for partitions). With respect to partitions see
below. Thus you may have up to 64 DASD devices in your system.
The kernel parameter 'dasd=from-to,...' may be issued arbitrary times
in the kernel's parameter line or not at all. The 'from' and 'to'
parameters are to be given in hexadecimal notation without a leading
0x.
If you supply kernel parameters the different instances are processed
in order of appearance and a minor number is reserved for any device
covered by the supplied range up to 64 volumes. Additional DASDs are
ignored. If you do not supply the 'dasd=' kernel parameter at all, the
DASD driver registers all supported DASDs of your system to a minor
number in ascending order of the subchannel number.
The driver currently supports ECKD-devices and there are stubs for
support of the FBA and CKD architectures. For the FBA architecture
only some smart data structures are missing to make the support
complete.
We performed our testing on 3380 and 3390 type disks of different
sizes, under VM and on the bare hardware (LPAR), using internal disks
of the multiprise as well as a RAMAC virtual array. Disks exported by
an Enterprise Storage Server (Seascape) should work fine as well.
We currently implement one partition per volume, which is the whole
volume, skipping the first blocks up to the volume label. These are
reserved for IPL records and IBM's volume label to assure
accessibility of the DASD from other OSs. In a later stage we will
provide support of partitions, maybe VTOC oriented or using a kind of
partition table in the label record.
USAGE
-Low-level format (?CKD only)
For using an ECKD-DASD as a Linux harddisk you have to low-level
format the tracks by issuing the BLKDASDFORMAT-ioctl on that
device. This will erase any data on that volume including IBM volume
labels, VTOCs etc. The ioctl may take a 'struct format_data *' or
'NULL' as an argument.
typedef struct {
int start_unit;
int stop_unit;
int blksize;
} format_data_t;
When a NULL argument is passed to the BLKDASDFORMAT ioctl the whole
disk is formatted to a blocksize of 1024 bytes. Otherwise start_unit
and stop_unit are the first and last track to be formatted. If
stop_unit is -1 it implies that the DASD is formatted from start_unit
up to the last track. blksize can be any power of two between 512 and
4096. We recommend no blksize lower than 1024 because the ext2fs uses
1kB blocks anyway and you gain approx. 50% of capacity increasing your
blksize from 512 byte to 1kB.
-Make a filesystem
Then you can mk??fs the filesystem of your choice on that volume or
partition. For reasons of sanity you should build your filesystem on
the partition /dev/dd?1 instead of the whole volume. You only lose 3kB
but may be sure that you can reuse your data after introduction of a
real partition table.
BUGS:
- Performance sometimes is rather low because we don't fully exploit clustering
TODO-List:
- Add IBM'S Disk layout to genhd
- Enhance driver to use more than one major number
- Enable usage as a module
- Support Cache fast write and DASD fast write (ECKD)

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,472 @@
Linux for S/390 and zSeries
Common Device Support (CDS)
Device Driver I/O Support Routines
Authors : Ingo Adlung
Cornelia Huck
Copyright, IBM Corp. 1999-2002
Introduction
This document describes the common device support routines for Linux/390.
Different than other hardware architectures, ESA/390 has defined a unified
I/O access method. This gives relief to the device drivers as they don't
have to deal with different bus types, polling versus interrupt
processing, shared versus non-shared interrupt processing, DMA versus port
I/O (PIO), and other hardware features more. However, this implies that
either every single device driver needs to implement the hardware I/O
attachment functionality itself, or the operating system provides for a
unified method to access the hardware, providing all the functionality that
every single device driver would have to provide itself.
The document does not intend to explain the ESA/390 hardware architecture in
every detail.This information can be obtained from the ESA/390 Principles of
Operation manual (IBM Form. No. SA22-7201).
In order to build common device support for ESA/390 I/O interfaces, a
functional layer was introduced that provides generic I/O access methods to
the hardware.
The common device support layer comprises the I/O support routines defined
below. Some of them implement common Linux device driver interfaces, while
some of them are ESA/390 platform specific.
Note:
In order to write a driver for S/390, you also need to look into the interface
described in Documentation/s390/driver-model.txt.
Note for porting drivers from 2.4:
The major changes are:
* The functions use a ccw_device instead of an irq (subchannel).
* All drivers must define a ccw_driver (see driver-model.txt) and the associated
functions.
* request_irq() and free_irq() are no longer done by the driver.
* The oper_handler is (kindof) replaced by the probe() and set_online() functions
of the ccw_driver.
* The not_oper_handler is (kindof) replaced by the remove() and set_offline()
functions of the ccw_driver.
* The channel device layer is gone.
* The interrupt handlers must be adapted to use a ccw_device as argument.
Moreover, they don't return a devstat, but an irb.
* Before initiating an io, the options must be set via ccw_device_set_options().
* Instead of calling read_dev_chars()/read_conf_data(), the driver issues
the channel program and handles the interrupt itself.
ccw_device_get_ciw()
get commands from extended sense data.
ccw_device_start()
ccw_device_start_timeout()
ccw_device_start_key()
ccw_device_start_key_timeout()
initiate an I/O request.
ccw_device_resume()
resume channel program execution.
ccw_device_halt()
terminate the current I/O request processed on the device.
do_IRQ()
generic interrupt routine. This function is called by the interrupt entry
routine whenever an I/O interrupt is presented to the system. The do_IRQ()
routine determines the interrupt status and calls the device specific
interrupt handler according to the rules (flags) defined during I/O request
initiation with do_IO().
The next chapters describe the functions other than do_IRQ() in more details.
The do_IRQ() interface is not described, as it is called from the Linux/390
first level interrupt handler only and does not comprise a device driver
callable interface. Instead, the functional description of do_IO() also
describes the input to the device specific interrupt handler.
Note: All explanations apply also to the 64 bit architecture s390x.
Common Device Support (CDS) for Linux/390 Device Drivers
General Information
The following chapters describe the I/O related interface routines the
Linux/390 common device support (CDS) provides to allow for device specific
driver implementations on the IBM ESA/390 hardware platform. Those interfaces
intend to provide the functionality required by every device driver
implementation to allow to drive a specific hardware device on the ESA/390
platform. Some of the interface routines are specific to Linux/390 and some
of them can be found on other Linux platforms implementations too.
Miscellaneous function prototypes, data declarations, and macro definitions
can be found in the architecture specific C header file
linux/arch/s390/include/asm/irq.h.
Overview of CDS interface concepts
Different to other hardware platforms, the ESA/390 architecture doesn't define
interrupt lines managed by a specific interrupt controller and bus systems
that may or may not allow for shared interrupts, DMA processing, etc.. Instead,
the ESA/390 architecture has implemented a so called channel subsystem, that
provides a unified view of the devices physically attached to the systems.
Though the ESA/390 hardware platform knows about a huge variety of different
peripheral attachments like disk devices (aka. DASDs), tapes, communication
controllers, etc. they can all be accessed by a well defined access method and
they are presenting I/O completion a unified way : I/O interruptions. Every
single device is uniquely identified to the system by a so called subchannel,
where the ESA/390 architecture allows for 64k devices be attached.
Linux, however, was first built on the Intel PC architecture, with its two
cascaded 8259 programmable interrupt controllers (PICs), that allow for a
maximum of 15 different interrupt lines. All devices attached to such a system
share those 15 interrupt levels. Devices attached to the ISA bus system must
not share interrupt levels (aka. IRQs), as the ISA bus bases on edge triggered
interrupts. MCA, EISA, PCI and other bus systems base on level triggered
interrupts, and therewith allow for shared IRQs. However, if multiple devices
present their hardware status by the same (shared) IRQ, the operating system
has to call every single device driver registered on this IRQ in order to
determine the device driver owning the device that raised the interrupt.
Up to kernel 2.4, Linux/390 used to provide interfaces via the IRQ (subchannel).
For internal use of the common I/O layer, these are still there. However,
device drivers should use the new calling interface via the ccw_device only.
During its startup the Linux/390 system checks for peripheral devices. Each
of those devices is uniquely defined by a so called subchannel by the ESA/390
channel subsystem. While the subchannel numbers are system generated, each
subchannel also takes a user defined attribute, the so called device number.
Both subchannel number and device number cannot exceed 65535. During sysfs
initialisation, the information about control unit type and device types that
imply specific I/O commands (channel command words - CCWs) in order to operate
the device are gathered. Device drivers can retrieve this set of hardware
information during their initialization step to recognize the devices they
support using the information saved in the struct ccw_device given to them.
This methods implies that Linux/390 doesn't require to probe for free (not
armed) interrupt request lines (IRQs) to drive its devices with. Where
applicable, the device drivers can use issue the READ DEVICE CHARACTERISTICS
ccw to retrieve device characteristics in its online routine.
In order to allow for easy I/O initiation the CDS layer provides a
ccw_device_start() interface that takes a device specific channel program (one
or more CCWs) as input sets up the required architecture specific control blocks
and initiates an I/O request on behalf of the device driver. The
ccw_device_start() routine allows to specify whether it expects the CDS layer
to notify the device driver for every interrupt it observes, or with final status
only. See ccw_device_start() for more details. A device driver must never issue
ESA/390 I/O commands itself, but must use the Linux/390 CDS interfaces instead.
For long running I/O request to be canceled, the CDS layer provides the
ccw_device_halt() function. Some devices require to initially issue a HALT
SUBCHANNEL (HSCH) command without having pending I/O requests. This function is
also covered by ccw_device_halt().
get_ciw() - get command information word
This call enables a device driver to get information about supported commands
from the extended SenseID data.
struct ciw *
ccw_device_get_ciw(struct ccw_device *cdev, __u32 cmd);
cdev - The ccw_device for which the command is to be retrieved.
cmd - The command type to be retrieved.
ccw_device_get_ciw() returns:
NULL - No extended data available, invalid device or command not found.
!NULL - The command requested.
ccw_device_start() - Initiate I/O Request
The ccw_device_start() routines is the I/O request front-end processor. All
device driver I/O requests must be issued using this routine. A device driver
must not issue ESA/390 I/O commands itself. Instead the ccw_device_start()
routine provides all interfaces required to drive arbitrary devices.
This description also covers the status information passed to the device
driver's interrupt handler as this is related to the rules (flags) defined
with the associated I/O request when calling ccw_device_start().
int ccw_device_start(struct ccw_device *cdev,
struct ccw1 *cpa,
unsigned long intparm,
__u8 lpm,
unsigned long flags);
int ccw_device_start_timeout(struct ccw_device *cdev,
struct ccw1 *cpa,
unsigned long intparm,
__u8 lpm,
unsigned long flags,
int expires);
int ccw_device_start_key(struct ccw_device *cdev,
struct ccw1 *cpa,
unsigned long intparm,
__u8 lpm,
__u8 key,
unsigned long flags);
int ccw_device_start_key_timeout(struct ccw_device *cdev,
struct ccw1 *cpa,
unsigned long intparm,
__u8 lpm,
__u8 key,
unsigned long flags,
int expires);
cdev : ccw_device the I/O is destined for
cpa : logical start address of channel program
user_intparm : user specific interrupt information; will be presented
back to the device driver's interrupt handler. Allows a
device driver to associate the interrupt with a
particular I/O request.
lpm : defines the channel path to be used for a specific I/O
request. A value of 0 will make cio use the opm.
key : the storage key to use for the I/O (useful for operating on a
storage with a storage key != default key)
flag : defines the action to be performed for I/O processing
expires : timeout value in jiffies. The common I/O layer will terminate
the running program after this and call the interrupt handler
with ERR_PTR(-ETIMEDOUT) as irb.
Possible flag values are :
DOIO_ALLOW_SUSPEND - channel program may become suspended
DOIO_DENY_PREFETCH - don't allow for CCW prefetch; usually
this implies the channel program might
become modified
DOIO_SUPPRESS_INTER - don't call the handler on intermediate status
The cpa parameter points to the first format 1 CCW of a channel program :
struct ccw1 {
__u8 cmd_code;/* command code */
__u8 flags; /* flags, like IDA addressing, etc. */
__u16 count; /* byte count */
__u32 cda; /* data address */
} __attribute__ ((packed,aligned(8)));
with the following CCW flags values defined :
CCW_FLAG_DC - data chaining
CCW_FLAG_CC - command chaining
CCW_FLAG_SLI - suppress incorrect length
CCW_FLAG_SKIP - skip
CCW_FLAG_PCI - PCI
CCW_FLAG_IDA - indirect addressing
CCW_FLAG_SUSPEND - suspend
Via ccw_device_set_options(), the device driver may specify the following
options for the device:
DOIO_EARLY_NOTIFICATION - allow for early interrupt notification
DOIO_REPORT_ALL - report all interrupt conditions
The ccw_device_start() function returns :
0 - successful completion or request successfully initiated
-EBUSY - The device is currently processing a previous I/O request, or there is
a status pending at the device.
-ENODEV - cdev is invalid, the device is not operational or the ccw_device is
not online.
When the I/O request completes, the CDS first level interrupt handler will
accumulate the status in a struct irb and then call the device interrupt handler.
The intparm field will contain the value the device driver has associated with a
particular I/O request. If a pending device status was recognized,
intparm will be set to 0 (zero). This may happen during I/O initiation or delayed
by an alert status notification. In any case this status is not related to the
current (last) I/O request. In case of a delayed status notification no special
interrupt will be presented to indicate I/O completion as the I/O request was
never started, even though ccw_device_start() returned with successful completion.
The irb may contain an error value, and the device driver should check for this
first:
-ETIMEDOUT: the common I/O layer terminated the request after the specified
timeout value
-EIO: the common I/O layer terminated the request due to an error state
If the concurrent sense flag in the extended status word (esw) in the irb is
set, the field erw.scnt in the esw describes the number of device specific
sense bytes available in the extended control word irb->scsw.ecw[]. No device
sensing by the device driver itself is required.
The device interrupt handler can use the following definitions to investigate
the primary unit check source coded in sense byte 0 :
SNS0_CMD_REJECT 0x80
SNS0_INTERVENTION_REQ 0x40
SNS0_BUS_OUT_CHECK 0x20
SNS0_EQUIPMENT_CHECK 0x10
SNS0_DATA_CHECK 0x08
SNS0_OVERRUN 0x04
SNS0_INCOMPL_DOMAIN 0x01
Depending on the device status, multiple of those values may be set together.
Please refer to the device specific documentation for details.
The irb->scsw.cstat field provides the (accumulated) subchannel status :
SCHN_STAT_PCI - program controlled interrupt
SCHN_STAT_INCORR_LEN - incorrect length
SCHN_STAT_PROG_CHECK - program check
SCHN_STAT_PROT_CHECK - protection check
SCHN_STAT_CHN_DATA_CHK - channel data check
SCHN_STAT_CHN_CTRL_CHK - channel control check
SCHN_STAT_INTF_CTRL_CHK - interface control check
SCHN_STAT_CHAIN_CHECK - chaining check
The irb->scsw.dstat field provides the (accumulated) device status :
DEV_STAT_ATTENTION - attention
DEV_STAT_STAT_MOD - status modifier
DEV_STAT_CU_END - control unit end
DEV_STAT_BUSY - busy
DEV_STAT_CHN_END - channel end
DEV_STAT_DEV_END - device end
DEV_STAT_UNIT_CHECK - unit check
DEV_STAT_UNIT_EXCEP - unit exception
Please see the ESA/390 Principles of Operation manual for details on the
individual flag meanings.
Usage Notes :
ccw_device_start() must be called disabled and with the ccw device lock held.
The device driver is allowed to issue the next ccw_device_start() call from
within its interrupt handler already. It is not required to schedule a
bottom-half, unless a non deterministically long running error recovery procedure
or similar needs to be scheduled. During I/O processing the Linux/390 generic
I/O device driver support has already obtained the IRQ lock, i.e. the handler
must not try to obtain it again when calling ccw_device_start() or we end in a
deadlock situation!
If a device driver relies on an I/O request to be completed prior to start the
next it can reduce I/O processing overhead by chaining a NoOp I/O command
CCW_CMD_NOOP to the end of the submitted CCW chain. This will force Channel-End
and Device-End status to be presented together, with a single interrupt.
However, this should be used with care as it implies the channel will remain
busy, not being able to process I/O requests for other devices on the same
channel. Therefore e.g. read commands should never use this technique, as the
result will be presented by a single interrupt anyway.
In order to minimize I/O overhead, a device driver should use the
DOIO_REPORT_ALL only if the device can report intermediate interrupt
information prior to device-end the device driver urgently relies on. In this
case all I/O interruptions are presented to the device driver until final
status is recognized.
If a device is able to recover from asynchronously presented I/O errors, it can
perform overlapping I/O using the DOIO_EARLY_NOTIFICATION flag. While some
devices always report channel-end and device-end together, with a single
interrupt, others present primary status (channel-end) when the channel is
ready for the next I/O request and secondary status (device-end) when the data
transmission has been completed at the device.
Above flag allows to exploit this feature, e.g. for communication devices that
can handle lost data on the network to allow for enhanced I/O processing.
Unless the channel subsystem at any time presents a secondary status interrupt,
exploiting this feature will cause only primary status interrupts to be
presented to the device driver while overlapping I/O is performed. When a
secondary status without error (alert status) is presented, this indicates
successful completion for all overlapping ccw_device_start() requests that have
been issued since the last secondary (final) status.
Channel programs that intend to set the suspend flag on a channel command word
(CCW) must start the I/O operation with the DOIO_ALLOW_SUSPEND option or the
suspend flag will cause a channel program check. At the time the channel program
becomes suspended an intermediate interrupt will be generated by the channel
subsystem.
ccw_device_resume() - Resume Channel Program Execution
If a device driver chooses to suspend the current channel program execution by
setting the CCW suspend flag on a particular CCW, the channel program execution
is suspended. In order to resume channel program execution the CIO layer
provides the ccw_device_resume() routine.
int ccw_device_resume(struct ccw_device *cdev);
cdev - ccw_device the resume operation is requested for
The ccw_device_resume() function returns:
0 - suspended channel program is resumed
-EBUSY - status pending
-ENODEV - cdev invalid or not-operational subchannel
-EINVAL - resume function not applicable
-ENOTCONN - there is no I/O request pending for completion
Usage Notes:
Please have a look at the ccw_device_start() usage notes for more details on
suspended channel programs.
ccw_device_halt() - Halt I/O Request Processing
Sometimes a device driver might need a possibility to stop the processing of
a long-running channel program or the device might require to initially issue
a halt subchannel (HSCH) I/O command. For those purposes the ccw_device_halt()
command is provided.
ccw_device_halt() must be called disabled and with the ccw device lock held.
int ccw_device_halt(struct ccw_device *cdev,
unsigned long intparm);
cdev : ccw_device the halt operation is requested for
intparm : interruption parameter; value is only used if no I/O
is outstanding, otherwise the intparm associated with
the I/O request is returned
The ccw_device_halt() function returns :
0 - request successfully initiated
-EBUSY - the device is currently busy, or status pending.
-ENODEV - cdev invalid.
-EINVAL - The device is not operational or the ccw device is not online.
Usage Notes :
A device driver may write a never-ending channel program by writing a channel
program that at its end loops back to its beginning by means of a transfer in
channel (TIC) command (CCW_CMD_TIC). Usually this is performed by network
device drivers by setting the PCI CCW flag (CCW_FLAG_PCI). Once this CCW is
executed a program controlled interrupt (PCI) is generated. The device driver
can then perform an appropriate action. Prior to interrupt of an outstanding
read to a network device (with or without PCI flag) a ccw_device_halt()
is required to end the pending operation.
ccw_device_clear() - Terminage I/O Request Processing
In order to terminate all I/O processing at the subchannel, the clear subchannel
(CSCH) command is used. It can be issued via ccw_device_clear().
ccw_device_clear() must be called disabled and with the ccw device lock held.
int ccw_device_clear(struct ccw_device *cdev, unsigned long intparm);
cdev: ccw_device the clear operation is requested for
intparm: interruption parameter (see ccw_device_halt())
The ccw_device_clear() function returns:
0 - request successfully initiated
-ENODEV - cdev invalid
-EINVAL - The device is not operational or the ccw device is not online.
Miscellaneous Support Routines
This chapter describes various routines to be used in a Linux/390 device
driver programming environment.
get_ccwdev_lock()
Get the address of the device specific lock. This is then used in
spin_lock() / spin_unlock() calls.
__u8 ccw_device_get_path_mask(struct ccw_device *cdev);
Get the mask of the path currently available for cdev.

View File

@ -0,0 +1,76 @@
#!/bin/sh
#
# config3270 -- Autoconfigure /dev/3270/* and /etc/inittab
#
# Usage:
# config3270
#
# Output:
# /tmp/mkdev3270
#
# Operation:
# 1. Run this script
# 2. Run the script it produces: /tmp/mkdev3270
# 3. Issue "telinit q" or reboot, as appropriate.
#
P=/proc/tty/driver/tty3270
ROOT=
D=$ROOT/dev
SUBD=3270
TTY=$SUBD/tty
TUB=$SUBD/tub
SCR=$ROOT/tmp/mkdev3270
SCRTMP=$SCR.a
GETTYLINE=:2345:respawn:/sbin/mingetty
INITTAB=$ROOT/etc/inittab
NINITTAB=$ROOT/etc/NEWinittab
OINITTAB=$ROOT/etc/OLDinittab
ADDNOTE=\\"# Additional mingettys for the 3270/tty* driver, tub3270 ---\\"
if ! ls $P > /dev/null 2>&1; then
modprobe tub3270 > /dev/null 2>&1
fi
ls $P > /dev/null 2>&1 || exit 1
# Initialize two files, one for /dev/3270 commands and one
# to replace the /etc/inittab file (old one saved in OLDinittab)
echo "#!/bin/sh" > $SCR || exit 1
echo " " >> $SCR
echo "# Script built by /sbin/config3270" >> $SCR
if [ ! -d /dev/dasd ]; then
echo rm -rf "$D/$SUBD/*" >> $SCR
fi
echo "grep -v $TTY $INITTAB > $NINITTAB" > $SCRTMP || exit 1
echo "echo $ADDNOTE >> $NINITTAB" >> $SCRTMP
if [ ! -d /dev/dasd ]; then
echo mkdir -p $D/$SUBD >> $SCR
fi
# Now query the tub3270 driver for 3270 device information
# and add appropriate mknod and mingetty lines to our files
echo what=config > $P
while read devno maj min;do
if [ $min = 0 ]; then
fsmaj=$maj
if [ ! -d /dev/dasd ]; then
echo mknod $D/$TUB c $fsmaj 0 >> $SCR
echo chmod 666 $D/$TUB >> $SCR
fi
elif [ $maj = CONSOLE ]; then
if [ ! -d /dev/dasd ]; then
echo mknod $D/$TUB$devno c $fsmaj $min >> $SCR
fi
else
if [ ! -d /dev/dasd ]; then
echo mknod $D/$TTY$devno c $maj $min >>$SCR
echo mknod $D/$TUB$devno c $fsmaj $min >> $SCR
fi
echo "echo t$min$GETTYLINE $TTY$devno >> $NINITTAB" >> $SCRTMP
fi
done < $P
echo mv $INITTAB $OINITTAB >> $SCRTMP || exit 1
echo mv $NINITTAB $INITTAB >> $SCRTMP
cat $SCRTMP >> $SCR
rm $SCRTMP
exit 0

View File

@ -0,0 +1,287 @@
S/390 driver model interfaces
-----------------------------
1. CCW devices
--------------
All devices which can be addressed by means of ccws are called 'CCW devices' -
even if they aren't actually driven by ccws.
All ccw devices are accessed via a subchannel, this is reflected in the
structures under devices/:
devices/
- system/
- css0/
- 0.0.0000/0.0.0815/
- 0.0.0001/0.0.4711/
- 0.0.0002/
- 0.1.0000/0.1.1234/
...
- defunct/
In this example, device 0815 is accessed via subchannel 0 in subchannel set 0,
device 4711 via subchannel 1 in subchannel set 0, and subchannel 2 is a non-I/O
subchannel. Device 1234 is accessed via subchannel 0 in subchannel set 1.
The subchannel named 'defunct' does not represent any real subchannel on the
system; it is a pseudo subchannel where disconnected ccw devices are moved to
if they are displaced by another ccw device becoming operational on their
former subchannel. The ccw devices will be moved again to a proper subchannel
if they become operational again on that subchannel.
You should address a ccw device via its bus id (e.g. 0.0.4711); the device can
be found under bus/ccw/devices/.
All ccw devices export some data via sysfs.
cutype: The control unit type / model.
devtype: The device type / model, if applicable.
availability: Can be 'good' or 'boxed'; 'no path' or 'no device' for
disconnected devices.
online: An interface to set the device online and offline.
In the special case of the device being disconnected (see the
notify function under 1.2), piping 0 to online will forcibly delete
the device.
The device drivers can add entries to export per-device data and interfaces.
There is also some data exported on a per-subchannel basis (see under
bus/css/devices/):
chpids: Via which chpids the device is connected.
pimpampom: The path installed, path available and path operational masks.
There also might be additional data, for example for block devices.
1.1 Bringing up a ccw device
----------------------------
This is done in several steps.
a. Each driver can provide one or more parameter interfaces where parameters can
be specified. These interfaces are also in the driver's responsibility.
b. After a. has been performed, if necessary, the device is finally brought up
via the 'online' interface.
1.2 Writing a driver for ccw devices
------------------------------------
The basic struct ccw_device and struct ccw_driver data structures can be found
under include/asm/ccwdev.h.
struct ccw_device {
spinlock_t *ccwlock;
struct ccw_device_private *private;
struct ccw_device_id id;
struct ccw_driver *drv;
struct device dev;
int online;
void (*handler) (struct ccw_device *dev, unsigned long intparm,
struct irb *irb);
};
struct ccw_driver {
struct module *owner;
struct ccw_device_id *ids;
int (*probe) (struct ccw_device *);
int (*remove) (struct ccw_device *);
int (*set_online) (struct ccw_device *);
int (*set_offline) (struct ccw_device *);
int (*notify) (struct ccw_device *, int);
struct device_driver driver;
char *name;
};
The 'private' field contains data needed for internal i/o operation only, and
is not available to the device driver.
Each driver should declare in a MODULE_DEVICE_TABLE into which CU types/models
and/or device types/models it is interested. This information can later be found
in the struct ccw_device_id fields:
struct ccw_device_id {
__u16 match_flags;
__u16 cu_type;
__u16 dev_type;
__u8 cu_model;
__u8 dev_model;
unsigned long driver_info;
};
The functions in ccw_driver should be used in the following way:
probe: This function is called by the device layer for each device the driver
is interested in. The driver should only allocate private structures
to put in dev->driver_data and create attributes (if needed). Also,
the interrupt handler (see below) should be set here.
int (*probe) (struct ccw_device *cdev);
Parameters: cdev - the device to be probed.
remove: This function is called by the device layer upon removal of the driver,
the device or the module. The driver should perform cleanups here.
int (*remove) (struct ccw_device *cdev);
Parameters: cdev - the device to be removed.
set_online: This function is called by the common I/O layer when the device is
activated via the 'online' attribute. The driver should finally
setup and activate the device here.
int (*set_online) (struct ccw_device *);
Parameters: cdev - the device to be activated. The common layer has
verified that the device is not already online.
set_offline: This function is called by the common I/O layer when the device is
de-activated via the 'online' attribute. The driver should shut
down the device, but not de-allocate its private data.
int (*set_offline) (struct ccw_device *);
Parameters: cdev - the device to be deactivated. The common layer has
verified that the device is online.
notify: This function is called by the common I/O layer for some state changes
of the device.
Signalled to the driver are:
* In online state, device detached (CIO_GONE) or last path gone
(CIO_NO_PATH). The driver must return !0 to keep the device; for
return code 0, the device will be deleted as usual (also when no
notify function is registered). If the driver wants to keep the
device, it is moved into disconnected state.
* In disconnected state, device operational again (CIO_OPER). The
common I/O layer performs some sanity checks on device number and
Device / CU to be reasonably sure if it is still the same device.
If not, the old device is removed and a new one registered. By the
return code of the notify function the device driver signals if it
wants the device back: !0 for keeping, 0 to make the device being
removed and re-registered.
int (*notify) (struct ccw_device *, int);
Parameters: cdev - the device whose state changed.
event - the event that happened. This can be one of CIO_GONE,
CIO_NO_PATH or CIO_OPER.
The handler field of the struct ccw_device is meant to be set to the interrupt
handler for the device. In order to accommodate drivers which use several
distinct handlers (e.g. multi subchannel devices), this is a member of ccw_device
instead of ccw_driver.
The handler is registered with the common layer during set_online() processing
before the driver is called, and is deregistered during set_offline() after the
driver has been called. Also, after registering / before deregistering, path
grouping resp. disbanding of the path group (if applicable) are performed.
void (*handler) (struct ccw_device *dev, unsigned long intparm, struct irb *irb);
Parameters: dev - the device the handler is called for
intparm - the intparm which allows the device driver to identify
the i/o the interrupt is associated with, or to recognize
the interrupt as unsolicited.
irb - interruption response block which contains the accumulated
status.
The device driver is called from the common ccw_device layer and can retrieve
information about the interrupt from the irb parameter.
1.3 ccwgroup devices
--------------------
The ccwgroup mechanism is designed to handle devices consisting of multiple ccw
devices, like lcs or ctc.
The ccw driver provides a 'group' attribute. Piping bus ids of ccw devices to
this attributes creates a ccwgroup device consisting of these ccw devices (if
possible). This ccwgroup device can be set online or offline just like a normal
ccw device.
Each ccwgroup device also provides an 'ungroup' attribute to destroy the device
again (only when offline). This is a generic ccwgroup mechanism (the driver does
not need to implement anything beyond normal removal routines).
A ccw device which is a member of a ccwgroup device carries a pointer to the
ccwgroup device in the driver_data of its device struct. This field must not be
touched by the driver - it should use the ccwgroup device's driver_data for its
private data.
To implement a ccwgroup driver, please refer to include/asm/ccwgroup.h. Keep in
mind that most drivers will need to implement both a ccwgroup and a ccw
driver.
2. Channel paths
-----------------
Channel paths show up, like subchannels, under the channel subsystem root (css0)
and are called 'chp0.<chpid>'. They have no driver and do not belong to any bus.
Please note, that unlike /proc/chpids in 2.4, the channel path objects reflect
only the logical state and not the physical state, since we cannot track the
latter consistently due to lacking machine support (we don't need to be aware
of it anyway).
status - Can be 'online' or 'offline'.
Piping 'on' or 'off' sets the chpid logically online/offline.
Piping 'on' to an online chpid triggers path reprobing for all devices
the chpid connects to. This can be used to force the kernel to re-use
a channel path the user knows to be online, but the machine hasn't
created a machine check for.
type - The physical type of the channel path.
shared - Whether the channel path is shared.
cmg - The channel measurement group.
3. System devices
-----------------
3.1 xpram
---------
xpram shows up under devices/system/ as 'xpram'.
3.2 cpus
--------
For each cpu, a directory is created under devices/system/cpu/. Each cpu has an
attribute 'online' which can be 0 or 1.
4. Other devices
----------------
4.1 Netiucv
-----------
The netiucv driver creates an attribute 'connection' under
bus/iucv/drivers/netiucv. Piping to this attribute creates a new netiucv
connection to the specified host.
Netiucv connections show up under devices/iucv/ as "netiucv<ifnum>". The interface
number is assigned sequentially to the connections defined via the 'connection'
attribute.
user - shows the connection partner.
buffer - maximum buffer size.
Pipe to it to change buffer size.

View File

@ -0,0 +1,125 @@
*** BIG FAT WARNING ***
The kvm module is currently in EXPERIMENTAL state for s390. This means that
the interface to the module is not yet considered to remain stable. Thus, be
prepared that we keep breaking your userspace application and guest
compatibility over and over again until we feel happy with the result. Make sure
your guest kernel, your host kernel, and your userspace launcher are in a
consistent state.
This Documentation describes the unique ioctl calls to /dev/kvm, the resulting
kvm-vm file descriptors, and the kvm-vcpu file descriptors that differ from x86.
1. ioctl calls to /dev/kvm
KVM does support the following ioctls on s390 that are common with other
architectures and do behave the same:
KVM_GET_API_VERSION
KVM_CREATE_VM (*) see note
KVM_CHECK_EXTENSION
KVM_GET_VCPU_MMAP_SIZE
Notes:
* KVM_CREATE_VM may fail on s390, if the calling process has multiple
threads and has not called KVM_S390_ENABLE_SIE before.
In addition, on s390 the following architecture specific ioctls are supported:
ioctl: KVM_S390_ENABLE_SIE
args: none
see also: include/linux/kvm.h
This call causes the kernel to switch on PGSTE in the user page table. This
operation is needed in order to run a virtual machine, and it requires the
calling process to be single-threaded. Note that the first call to KVM_CREATE_VM
will implicitly try to switch on PGSTE if the user process has not called
KVM_S390_ENABLE_SIE before. User processes that want to launch multiple threads
before creating a virtual machine have to call KVM_S390_ENABLE_SIE, or will
observe an error calling KVM_CREATE_VM. Switching on PGSTE is a one-time
operation, is not reversible, and will persist over the entire lifetime of
the calling process. It does not have any user-visible effect other than a small
performance penalty.
2. ioctl calls to the kvm-vm file descriptor
KVM does support the following ioctls on s390 that are common with other
architectures and do behave the same:
KVM_CREATE_VCPU
KVM_SET_USER_MEMORY_REGION (*) see note
KVM_GET_DIRTY_LOG (**) see note
Notes:
* kvm does only allow exactly one memory slot on s390, which has to start
at guest absolute address zero and at a user address that is aligned on any
page boundary. This hardware "limitation" allows us to have a few unique
optimizations. The memory slot doesn't have to be filled
with memory actually, it may contain sparse holes. That said, with different
user memory layout this does still allow a large flexibility when
doing the guest memory setup.
** KVM_GET_DIRTY_LOG doesn't work properly yet. The user will receive an empty
log. This ioctl call is only needed for guest migration, and we intend to
implement this one in the future.
In addition, on s390 the following architecture specific ioctls for the kvm-vm
file descriptor are supported:
ioctl: KVM_S390_INTERRUPT
args: struct kvm_s390_interrupt *
see also: include/linux/kvm.h
This ioctl is used to submit a floating interrupt for a virtual machine.
Floating interrupts may be delivered to any virtual cpu in the configuration.
Only some interrupt types defined in include/linux/kvm.h make sense when
submitted as floating interrupts. The following interrupts are not considered
to be useful as floating interrupts, and a call to inject them will result in
-EINVAL error code: program interrupts and interprocessor signals. Valid
floating interrupts are:
KVM_S390_INT_VIRTIO
KVM_S390_INT_SERVICE
3. ioctl calls to the kvm-vcpu file descriptor
KVM does support the following ioctls on s390 that are common with other
architectures and do behave the same:
KVM_RUN
KVM_GET_REGS
KVM_SET_REGS
KVM_GET_SREGS
KVM_SET_SREGS
KVM_GET_FPU
KVM_SET_FPU
In addition, on s390 the following architecture specific ioctls for the
kvm-vcpu file descriptor are supported:
ioctl: KVM_S390_INTERRUPT
args: struct kvm_s390_interrupt *
see also: include/linux/kvm.h
This ioctl is used to submit an interrupt for a specific virtual cpu.
Only some interrupt types defined in include/linux/kvm.h make sense when
submitted for a specific cpu. The following interrupts are not considered
to be useful, and a call to inject them will result in -EINVAL error code:
service processor calls and virtio interrupts. Valid interrupt types are:
KVM_S390_PROGRAM_INT
KVM_S390_SIGP_STOP
KVM_S390_RESTART
KVM_S390_SIGP_SET_PREFIX
KVM_S390_INT_EMERGENCY
ioctl: KVM_S390_STORE_STATUS
args: unsigned long
see also: include/linux/kvm.h
This ioctl stores the state of the cpu at the guest real address given as
argument, unless one of the following values defined in include/linux/kvm.h
is given as argument:
KVM_S390_STORE_STATUS_NOADDR - the CPU stores its status to the save area in
absolute lowcore as defined by the principles of operation
KVM_S390_STORE_STATUS_PREFIXED - the CPU stores its status to the save area in
its prefix page just like the dump tool that comes with zipl. This is useful
to create a system dump for use with lkcdutils or crash.
ioctl: KVM_S390_SET_INITIAL_PSW
args: struct kvm_s390_psw *
see also: include/linux/kvm.h
This ioctl can be used to set the processor status word (psw) of a stopped cpu
prior to running it with KVM_RUN. Note that this call is not required to modify
the psw during sie intercepts that fall back to userspace because struct kvm_run
does contain the psw, and this value is evaluated during reentry of KVM_RUN
after the intercept exit was recognized.
ioctl: KVM_S390_INITIAL_RESET
args: none
see also: include/linux/kvm.h
This ioctl can be used to perform an initial cpu reset as defined by the
principles of operation. The target cpu has to be in stopped state.

View File

@ -0,0 +1,197 @@
Date : 2004-Nov-26
Author: Gerald Schaefer (geraldsc@de.ibm.com)
Linux API for read access to z/VM Monitor Records
=================================================
Description
===========
This item delivers a new Linux API in the form of a misc char device that is
useable from user space and allows read access to the z/VM Monitor Records
collected by the *MONITOR System Service of z/VM.
User Requirements
=================
The z/VM guest on which you want to access this API needs to be configured in
order to allow IUCV connections to the *MONITOR service, i.e. it needs the
IUCV *MONITOR statement in its user entry. If the monitor DCSS to be used is
restricted (likely), you also need the NAMESAVE <DCSS NAME> statement.
This item will use the IUCV device driver to access the z/VM services, so you
need a kernel with IUCV support. You also need z/VM version 4.4 or 5.1.
There are two options for being able to load the monitor DCSS (examples assume
that the monitor DCSS begins at 144 MB and ends at 152 MB). You can query the
location of the monitor DCSS with the Class E privileged CP command Q NSS MAP
(the values BEGPAG and ENDPAG are given in units of 4K pages).
See also "CP Command and Utility Reference" (SC24-6081-00) for more information
on the DEF STOR and Q NSS MAP commands, as well as "Saved Segments Planning
and Administration" (SC24-6116-00) for more information on DCSSes.
1st option:
-----------
You can use the CP command DEF STOR CONFIG to define a "memory hole" in your
guest virtual storage around the address range of the DCSS.
Example: DEF STOR CONFIG 0.140M 200M.200M
This defines two blocks of storage, the first is 140MB in size an begins at
address 0MB, the second is 200MB in size and begins at address 200MB,
resulting in a total storage of 340MB. Note that the first block should
always start at 0 and be at least 64MB in size.
2nd option:
-----------
Your guest virtual storage has to end below the starting address of the DCSS
and you have to specify the "mem=" kernel parameter in your parmfile with a
value greater than the ending address of the DCSS.
Example: DEF STOR 140M
This defines 140MB storage size for your guest, the parameter "mem=160M" is
added to the parmfile.
User Interface
==============
The char device is implemented as a kernel module named "monreader",
which can be loaded via the modprobe command, or it can be compiled into the
kernel instead. There is one optional module (or kernel) parameter, "mondcss",
to specify the name of the monitor DCSS. If the module is compiled into the
kernel, the kernel parameter "monreader.mondcss=<DCSS NAME>" can be specified
in the parmfile.
The default name for the DCSS is "MONDCSS" if none is specified. In case that
there are other users already connected to the *MONITOR service (e.g.
Performance Toolkit), the monitor DCSS is already defined and you have to use
the same DCSS. The CP command Q MONITOR (Class E privileged) shows the name
of the monitor DCSS, if already defined, and the users connected to the
*MONITOR service.
Refer to the "z/VM Performance" book (SC24-6109-00) on how to create a monitor
DCSS if your z/VM doesn't have one already, you need Class E privileges to
define and save a DCSS.
Example:
--------
modprobe monreader mondcss=MYDCSS
This loads the module and sets the DCSS name to "MYDCSS".
NOTE:
-----
This API provides no interface to control the *MONITOR service, e.g. specify
which data should be collected. This can be done by the CP command MONITOR
(Class E privileged), see "CP Command and Utility Reference".
Device nodes with udev:
-----------------------
After loading the module, a char device will be created along with the device
node /<udev directory>/monreader.
Device nodes without udev:
--------------------------
If your distribution does not support udev, a device node will not be created
automatically and you have to create it manually after loading the module.
Therefore you need to know the major and minor numbers of the device. These
numbers can be found in /sys/class/misc/monreader/dev.
Typing cat /sys/class/misc/monreader/dev will give an output of the form
<major>:<minor>. The device node can be created via the mknod command, enter
mknod <name> c <major> <minor>, where <name> is the name of the device node
to be created.
Example:
--------
# modprobe monreader
# cat /sys/class/misc/monreader/dev
10:63
# mknod /dev/monreader c 10 63
This loads the module with the default monitor DCSS (MONDCSS) and creates a
device node.
File operations:
----------------
The following file operations are supported: open, release, read, poll.
There are two alternative methods for reading: either non-blocking read in
conjunction with polling, or blocking read without polling. IOCTLs are not
supported.
Read:
-----
Reading from the device provides a 12 Byte monitor control element (MCE),
followed by a set of one or more contiguous monitor records (similar to the
output of the CMS utility MONWRITE without the 4K control blocks). The MCE
contains information on the type of the following record set (sample/event
data), the monitor domains contained within it and the start and end address
of the record set in the monitor DCSS. The start and end address can be used
to determine the size of the record set, the end address is the address of the
last byte of data. The start address is needed to handle "end-of-frame" records
correctly (domain 1, record 13), i.e. it can be used to determine the record
start offset relative to a 4K page (frame) boundary.
See "Appendix A: *MONITOR" in the "z/VM Performance" document for a description
of the monitor control element layout. The layout of the monitor records can
be found here (z/VM 5.1): http://www.vm.ibm.com/pubs/mon510/index.html
The layout of the data stream provided by the monreader device is as follows:
...
<0 byte read>
<first MCE> \
<first set of records> |
... |- data set
<last MCE> |
<last set of records> /
<0 byte read>
...
There may be more than one combination of MCE and corresponding record set
within one data set and the end of each data set is indicated by a successful
read with a return value of 0 (0 byte read).
Any received data must be considered invalid until a complete set was
read successfully, including the closing 0 byte read. Therefore you should
always read the complete set into a buffer before processing the data.
The maximum size of a data set can be as large as the size of the
monitor DCSS, so design the buffer adequately or use dynamic memory allocation.
The size of the monitor DCSS will be printed into syslog after loading the
module. You can also use the (Class E privileged) CP command Q NSS MAP to
list all available segments and information about them.
As with most char devices, error conditions are indicated by returning a
negative value for the number of bytes read. In this case, the errno variable
indicates the error condition:
EIO: reply failed, read data is invalid and the application
should discard the data read since the last successful read with 0 size.
EFAULT: copy_to_user failed, read data is invalid and the application should
discard the data read since the last successful read with 0 size.
EAGAIN: occurs on a non-blocking read if there is no data available at the
moment. There is no data missing or corrupted, just try again or rather
use polling for non-blocking reads.
EOVERFLOW: message limit reached, the data read since the last successful
read with 0 size is valid but subsequent records may be missing.
In the last case (EOVERFLOW) there may be missing data, in the first two cases
(EIO, EFAULT) there will be missing data. It's up to the application if it will
continue reading subsequent data or rather exit.
Open:
-----
Only one user is allowed to open the char device. If it is already in use, the
open function will fail (return a negative value) and set errno to EBUSY.
The open function may also fail if an IUCV connection to the *MONITOR service
cannot be established. In this case errno will be set to EIO and an error
message with an IPUSER SEVER code will be printed into syslog. The IPUSER SEVER
codes are described in the "z/VM Performance" book, Appendix A.
NOTE:
-----
As soon as the device is opened, incoming messages will be accepted and they
will account for the message limit, i.e. opening the device without reading
from it will provoke the "message limit reached" error (EOVERFLOW error code)
eventually.

View File

@ -0,0 +1,656 @@
S390 Debug Feature
==================
files: arch/s390/kernel/debug.c
arch/s390/include/asm/debug.h
Description:
------------
The goal of this feature is to provide a kernel debug logging API
where log records can be stored efficiently in memory, where each component
(e.g. device drivers) can have one separate debug log.
One purpose of this is to inspect the debug logs after a production system crash
in order to analyze the reason for the crash.
If the system still runs but only a subcomponent which uses dbf fails,
it is possible to look at the debug logs on a live system via the Linux
debugfs filesystem.
The debug feature may also very useful for kernel and driver development.
Design:
-------
Kernel components (e.g. device drivers) can register themselves at the debug
feature with the function call debug_register(). This function initializes a
debug log for the caller. For each debug log exists a number of debug areas
where exactly one is active at one time. Each debug area consists of contiguous
pages in memory. In the debug areas there are stored debug entries (log records)
which are written by event- and exception-calls.
An event-call writes the specified debug entry to the active debug
area and updates the log pointer for the active area. If the end
of the active debug area is reached, a wrap around is done (ring buffer)
and the next debug entry will be written at the beginning of the active
debug area.
An exception-call writes the specified debug entry to the log and
switches to the next debug area. This is done in order to be sure
that the records which describe the origin of the exception are not
overwritten when a wrap around for the current area occurs.
The debug areas themselves are also ordered in form of a ring buffer.
When an exception is thrown in the last debug area, the following debug
entries are then written again in the very first area.
There are three versions for the event- and exception-calls: One for
logging raw data, one for text and one for numbers.
Each debug entry contains the following data:
- Timestamp
- Cpu-Number of calling task
- Level of debug entry (0...6)
- Return Address to caller
- Flag, if entry is an exception or not
The debug logs can be inspected in a live system through entries in
the debugfs-filesystem. Under the toplevel directory "s390dbf" there is
a directory for each registered component, which is named like the
corresponding component. The debugfs normally should be mounted to
/sys/kernel/debug therefore the debug feature can be accessed under
/sys/kernel/debug/s390dbf.
The content of the directories are files which represent different views
to the debug log. Each component can decide which views should be
used through registering them with the function debug_register_view().
Predefined views for hex/ascii, sprintf and raw binary data are provided.
It is also possible to define other views. The content of
a view can be inspected simply by reading the corresponding debugfs file.
All debug logs have an actual debug level (range from 0 to 6).
The default level is 3. Event and Exception functions have a 'level'
parameter. Only debug entries with a level that is lower or equal
than the actual level are written to the log. This means, when
writing events, high priority log entries should have a low level
value whereas low priority entries should have a high one.
The actual debug level can be changed with the help of the debugfs-filesystem
through writing a number string "x" to the 'level' debugfs file which is
provided for every debug log. Debugging can be switched off completely
by using "-" on the 'level' debugfs file.
Example:
> echo "-" > /sys/kernel/debug/s390dbf/dasd/level
It is also possible to deactivate the debug feature globally for every
debug log. You can change the behavior using 2 sysctl parameters in
/proc/sys/s390dbf:
There are currently 2 possible triggers, which stop the debug feature
globally. The first possibility is to use the "debug_active" sysctl. If
set to 1 the debug feature is running. If "debug_active" is set to 0 the
debug feature is turned off.
The second trigger which stops the debug feature is a kernel oops.
That prevents the debug feature from overwriting debug information that
happened before the oops. After an oops you can reactivate the debug feature
by piping 1 to /proc/sys/s390dbf/debug_active. Nevertheless, its not
suggested to use an oopsed kernel in a production environment.
If you want to disallow the deactivation of the debug feature, you can use
the "debug_stoppable" sysctl. If you set "debug_stoppable" to 0 the debug
feature cannot be stopped. If the debug feature is already stopped, it
will stay deactivated.
Kernel Interfaces:
------------------
----------------------------------------------------------------------------
debug_info_t *debug_register(char *name, int pages, int nr_areas,
int buf_size);
Parameter: name: Name of debug log (e.g. used for debugfs entry)
pages: number of pages, which will be allocated per area
nr_areas: number of debug areas
buf_size: size of data area in each debug entry
Return Value: Handle for generated debug area
NULL if register failed
Description: Allocates memory for a debug log
Must not be called within an interrupt handler
----------------------------------------------------------------------------
debug_info_t *debug_register_mode(char *name, int pages, int nr_areas,
int buf_size, mode_t mode, uid_t uid,
gid_t gid);
Parameter: name: Name of debug log (e.g. used for debugfs entry)
pages: Number of pages, which will be allocated per area
nr_areas: Number of debug areas
buf_size: Size of data area in each debug entry
mode: File mode for debugfs files. E.g. S_IRWXUGO
uid: User ID for debugfs files. Currently only 0 is
supported.
gid: Group ID for debugfs files. Currently only 0 is
supported.
Return Value: Handle for generated debug area
NULL if register failed
Description: Allocates memory for a debug log
Must not be called within an interrupt handler
---------------------------------------------------------------------------
void debug_unregister (debug_info_t * id);
Parameter: id: handle for debug log
Return Value: none
Description: frees memory for a debug log
Must not be called within an interrupt handler
---------------------------------------------------------------------------
void debug_set_level (debug_info_t * id, int new_level);
Parameter: id: handle for debug log
new_level: new debug level
Return Value: none
Description: Sets new actual debug level if new_level is valid.
---------------------------------------------------------------------------
void debug_stop_all(void);
Parameter: none
Return Value: none
Description: stops the debug feature if stopping is allowed. Currently
used in case of a kernel oops.
---------------------------------------------------------------------------
debug_entry_t* debug_event (debug_info_t* id, int level, void* data,
int length);
Parameter: id: handle for debug log
level: debug level
data: pointer to data for debug entry
length: length of data in bytes
Return Value: Address of written debug entry
Description: writes debug entry to active debug area (if level <= actual
debug level)
---------------------------------------------------------------------------
debug_entry_t* debug_int_event (debug_info_t * id, int level,
unsigned int data);
debug_entry_t* debug_long_event(debug_info_t * id, int level,
unsigned long data);
Parameter: id: handle for debug log
level: debug level
data: integer value for debug entry
Return Value: Address of written debug entry
Description: writes debug entry to active debug area (if level <= actual
debug level)
---------------------------------------------------------------------------
debug_entry_t* debug_text_event (debug_info_t * id, int level,
const char* data);
Parameter: id: handle for debug log
level: debug level
data: string for debug entry
Return Value: Address of written debug entry
Description: writes debug entry in ascii format to active debug area
(if level <= actual debug level)
---------------------------------------------------------------------------
debug_entry_t* debug_sprintf_event (debug_info_t * id, int level,
char* string,...);
Parameter: id: handle for debug log
level: debug level
string: format string for debug entry
...: varargs used as in sprintf()
Return Value: Address of written debug entry
Description: writes debug entry with format string and varargs (longs) to
active debug area (if level $<=$ actual debug level).
floats and long long datatypes cannot be used as varargs.
---------------------------------------------------------------------------
debug_entry_t* debug_exception (debug_info_t* id, int level, void* data,
int length);
Parameter: id: handle for debug log
level: debug level
data: pointer to data for debug entry
length: length of data in bytes
Return Value: Address of written debug entry
Description: writes debug entry to active debug area (if level <= actual
debug level) and switches to next debug area
---------------------------------------------------------------------------
debug_entry_t* debug_int_exception (debug_info_t * id, int level,
unsigned int data);
debug_entry_t* debug_long_exception(debug_info_t * id, int level,
unsigned long data);
Parameter: id: handle for debug log
level: debug level
data: integer value for debug entry
Return Value: Address of written debug entry
Description: writes debug entry to active debug area (if level <= actual
debug level) and switches to next debug area
---------------------------------------------------------------------------
debug_entry_t* debug_text_exception (debug_info_t * id, int level,
const char* data);
Parameter: id: handle for debug log
level: debug level
data: string for debug entry
Return Value: Address of written debug entry
Description: writes debug entry in ascii format to active debug area
(if level <= actual debug level) and switches to next debug
area
---------------------------------------------------------------------------
debug_entry_t* debug_sprintf_exception (debug_info_t * id, int level,
char* string,...);
Parameter: id: handle for debug log
level: debug level
string: format string for debug entry
...: varargs used as in sprintf()
Return Value: Address of written debug entry
Description: writes debug entry with format string and varargs (longs) to
active debug area (if level $<=$ actual debug level) and
switches to next debug area.
floats and long long datatypes cannot be used as varargs.
---------------------------------------------------------------------------
int debug_register_view (debug_info_t * id, struct debug_view *view);
Parameter: id: handle for debug log
view: pointer to debug view struct
Return Value: 0 : ok
< 0: Error
Description: registers new debug view and creates debugfs dir entry
---------------------------------------------------------------------------
int debug_unregister_view (debug_info_t * id, struct debug_view *view);
Parameter: id: handle for debug log
view: pointer to debug view struct
Return Value: 0 : ok
< 0: Error
Description: unregisters debug view and removes debugfs dir entry
Predefined views:
-----------------
extern struct debug_view debug_hex_ascii_view;
extern struct debug_view debug_raw_view;
extern struct debug_view debug_sprintf_view;
Examples
--------
/*
* hex_ascii- + raw-view Example
*/
#include <linux/init.h>
#include <asm/debug.h>
static debug_info_t* debug_info;
static int init(void)
{
/* register 4 debug areas with one page each and 4 byte data field */
debug_info = debug_register ("test", 1, 4, 4 );
debug_register_view(debug_info,&debug_hex_ascii_view);
debug_register_view(debug_info,&debug_raw_view);
debug_text_event(debug_info, 4 , "one ");
debug_int_exception(debug_info, 4, 4711);
debug_event(debug_info, 3, &debug_info, 4);
return 0;
}
static void cleanup(void)
{
debug_unregister (debug_info);
}
module_init(init);
module_exit(cleanup);
---------------------------------------------------------------------------
/*
* sprintf-view Example
*/
#include <linux/init.h>
#include <asm/debug.h>
static debug_info_t* debug_info;
static int init(void)
{
/* register 4 debug areas with one page each and data field for */
/* format string pointer + 2 varargs (= 3 * sizeof(long)) */
debug_info = debug_register ("test", 1, 4, sizeof(long) * 3);
debug_register_view(debug_info,&debug_sprintf_view);
debug_sprintf_event(debug_info, 2 , "first event in %s:%i\n",__FILE__,__LINE__);
debug_sprintf_exception(debug_info, 1, "pointer to debug info: %p\n",&debug_info);
return 0;
}
static void cleanup(void)
{
debug_unregister (debug_info);
}
module_init(init);
module_exit(cleanup);
Debugfs Interface
----------------
Views to the debug logs can be investigated through reading the corresponding
debugfs-files:
Example:
> ls /sys/kernel/debug/s390dbf/dasd
flush hex_ascii level pages raw
> cat /sys/kernel/debug/s390dbf/dasd/hex_ascii | sort +1
00 00974733272:680099 2 - 02 0006ad7e 07 ea 4a 90 | ....
00 00974733272:682210 2 - 02 0006ade6 46 52 45 45 | FREE
00 00974733272:682213 2 - 02 0006adf6 07 ea 4a 90 | ....
00 00974733272:682281 1 * 02 0006ab08 41 4c 4c 43 | EXCP
01 00974733272:682284 2 - 02 0006ab16 45 43 4b 44 | ECKD
01 00974733272:682287 2 - 02 0006ab28 00 00 00 04 | ....
01 00974733272:682289 2 - 02 0006ab3e 00 00 00 20 | ...
01 00974733272:682297 2 - 02 0006ad7e 07 ea 4a 90 | ....
01 00974733272:684384 2 - 00 0006ade6 46 52 45 45 | FREE
01 00974733272:684388 2 - 00 0006adf6 07 ea 4a 90 | ....
See section about predefined views for explanation of the above output!
Changing the debug level
------------------------
Example:
> cat /sys/kernel/debug/s390dbf/dasd/level
3
> echo "5" > /sys/kernel/debug/s390dbf/dasd/level
> cat /sys/kernel/debug/s390dbf/dasd/level
5
Flushing debug areas
--------------------
Debug areas can be flushed with piping the number of the desired
area (0...n) to the debugfs file "flush". When using "-" all debug areas
are flushed.
Examples:
1. Flush debug area 0:
> echo "0" > /sys/kernel/debug/s390dbf/dasd/flush
2. Flush all debug areas:
> echo "-" > /sys/kernel/debug/s390dbf/dasd/flush
Changing the size of debug areas
------------------------------------
It is possible the change the size of debug areas through piping
the number of pages to the debugfs file "pages". The resize request will
also flush the debug areas.
Example:
Define 4 pages for the debug areas of debug feature "dasd":
> echo "4" > /sys/kernel/debug/s390dbf/dasd/pages
Stooping the debug feature
--------------------------
Example:
1. Check if stopping is allowed
> cat /proc/sys/s390dbf/debug_stoppable
2. Stop debug feature
> echo 0 > /proc/sys/s390dbf/debug_active
lcrash Interface
----------------
It is planned that the dump analysis tool lcrash gets an additional command
's390dbf' to display all the debug logs. With this tool it will be possible
to investigate the debug logs on a live system and with a memory dump after
a system crash.
Investigating raw memory
------------------------
One last possibility to investigate the debug logs at a live
system and after a system crash is to look at the raw memory
under VM or at the Service Element.
It is possible to find the anker of the debug-logs through
the 'debug_area_first' symbol in the System map. Then one has
to follow the correct pointers of the data-structures defined
in debug.h and find the debug-areas in memory.
Normally modules which use the debug feature will also have
a global variable with the pointer to the debug-logs. Following
this pointer it will also be possible to find the debug logs in
memory.
For this method it is recommended to use '16 * x + 4' byte (x = 0..n)
for the length of the data field in debug_register() in
order to see the debug entries well formatted.
Predefined Views
----------------
There are three predefined views: hex_ascii, raw and sprintf.
The hex_ascii view shows the data field in hex and ascii representation
(e.g. '45 43 4b 44 | ECKD').
The raw view returns a bytestream as the debug areas are stored in memory.
The sprintf view formats the debug entries in the same way as the sprintf
function would do. The sprintf event/exception functions write to the
debug entry a pointer to the format string (size = sizeof(long))
and for each vararg a long value. So e.g. for a debug entry with a format
string plus two varargs one would need to allocate a (3 * sizeof(long))
byte data area in the debug_register() function.
IMPORTANT: Using "%s" in sprintf event functions is dangerous. You can only
use "%s" in the sprintf event functions, if the memory for the passed string is
available as long as the debug feature exists. The reason behind this is that
due to performance considerations only a pointer to the string is stored in
the debug feature. If you log a string that is freed afterwards, you will get
an OOPS when inspecting the debug feature, because then the debug feature will
access the already freed memory.
NOTE: If using the sprintf view do NOT use other event/exception functions
than the sprintf-event and -exception functions.
The format of the hex_ascii and sprintf view is as follows:
- Number of area
- Timestamp (formatted as seconds and microseconds since 00:00:00 Coordinated
Universal Time (UTC), January 1, 1970)
- level of debug entry
- Exception flag (* = Exception)
- Cpu-Number of calling task
- Return Address to caller
- data field
The format of the raw view is:
- Header as described in debug.h
- datafield
A typical line of the hex_ascii view will look like the following (first line
is only for explanation and will not be displayed when 'cating' the view):
area time level exception cpu caller data (hex + ascii)
--------------------------------------------------------------------------
00 00964419409:440690 1 - 00 88023fe
Defining views
--------------
Views are specified with the 'debug_view' structure. There are defined
callback functions which are used for reading and writing the debugfs files:
struct debug_view {
char name[DEBUG_MAX_PROCF_LEN];
debug_prolog_proc_t* prolog_proc;
debug_header_proc_t* header_proc;
debug_format_proc_t* format_proc;
debug_input_proc_t* input_proc;
void* private_data;
};
where
typedef int (debug_header_proc_t) (debug_info_t* id,
struct debug_view* view,
int area,
debug_entry_t* entry,
char* out_buf);
typedef int (debug_format_proc_t) (debug_info_t* id,
struct debug_view* view, char* out_buf,
const char* in_buf);
typedef int (debug_prolog_proc_t) (debug_info_t* id,
struct debug_view* view,
char* out_buf);
typedef int (debug_input_proc_t) (debug_info_t* id,
struct debug_view* view,
struct file* file, const char* user_buf,
size_t in_buf_size, loff_t* offset);
The "private_data" member can be used as pointer to view specific data.
It is not used by the debug feature itself.
The output when reading a debugfs file is structured like this:
"prolog_proc output"
"header_proc output 1" "format_proc output 1"
"header_proc output 2" "format_proc output 2"
"header_proc output 3" "format_proc output 3"
...
When a view is read from the debugfs, the Debug Feature calls the
'prolog_proc' once for writing the prolog.
Then 'header_proc' and 'format_proc' are called for each
existing debug entry.
The input_proc can be used to implement functionality when it is written to
the view (e.g. like with 'echo "0" > /sys/kernel/debug/s390dbf/dasd/level).
For header_proc there can be used the default function
debug_dflt_header_fn() which is defined in debug.h.
and which produces the same header output as the predefined views.
E.g:
00 00964419409:440761 2 - 00 88023ec
In order to see how to use the callback functions check the implementation
of the default views!
Example
#include <asm/debug.h>
#define UNKNOWNSTR "data: %08x"
const char* messages[] =
{"This error...........\n",
"That error...........\n",
"Problem..............\n",
"Something went wrong.\n",
"Everything ok........\n",
NULL
};
static int debug_test_format_fn(
debug_info_t * id, struct debug_view *view,
char *out_buf, const char *in_buf
)
{
int i, rc = 0;
if(id->buf_size >= 4) {
int msg_nr = *((int*)in_buf);
if(msg_nr < sizeof(messages)/sizeof(char*) - 1)
rc += sprintf(out_buf, "%s", messages[msg_nr]);
else
rc += sprintf(out_buf, UNKNOWNSTR, msg_nr);
}
out:
return rc;
}
struct debug_view debug_test_view = {
"myview", /* name of view */
NULL, /* no prolog */
&debug_dflt_header_fn, /* default header for each entry */
&debug_test_format_fn, /* our own format function */
NULL, /* no input function */
NULL /* no private data */
};
=====
test:
=====
debug_info_t *debug_info;
...
debug_info = debug_register ("test", 0, 4, 4 ));
debug_register_view(debug_info, &debug_test_view);
for(i = 0; i < 10; i ++) debug_int_event(debug_info, 1, i);
> cat /sys/kernel/debug/s390dbf/test/myview
00 00964419734:611402 1 - 00 88042ca This error...........
00 00964419734:611405 1 - 00 88042ca That error...........
00 00964419734:611408 1 - 00 88042ca Problem..............
00 00964419734:611411 1 - 00 88042ca Something went wrong.
00 00964419734:611414 1 - 00 88042ca Everything ok........
00 00964419734:611417 1 - 00 88042ca data: 00000005
00 00964419734:611419 1 - 00 88042ca data: 00000006
00 00964419734:611422 1 - 00 88042ca data: 00000007
00 00964419734:611425 1 - 00 88042ca data: 00000008
00 00964419734:611428 1 - 00 88042ca data: 00000009

View File

@ -0,0 +1,87 @@
s390 SCSI dump tool (zfcpdump)
System z machines (z900 or higher) provide hardware support for creating system
dumps on SCSI disks. The dump process is initiated by booting a dump tool, which
has to create a dump of the current (probably crashed) Linux image. In order to
not overwrite memory of the crashed Linux with data of the dump tool, the
hardware saves some memory plus the register sets of the boot cpu before the
dump tool is loaded. There exists an SCLP hardware interface to obtain the saved
memory afterwards. Currently 32 MB are saved.
This zfcpdump implementation consists of a Linux dump kernel together with
a userspace dump tool, which are loaded together into the saved memory region
below 32 MB. zfcpdump is installed on a SCSI disk using zipl (as contained in
the s390-tools package) to make the device bootable. The operator of a Linux
system can then trigger a SCSI dump by booting the SCSI disk, where zfcpdump
resides on.
The kernel part of zfcpdump is implemented as a debugfs file under "zcore/mem",
which exports memory and registers of the crashed Linux in an s390
standalone dump format. It can be used in the same way as e.g. /dev/mem. The
dump format defines a 4K header followed by plain uncompressed memory. The
register sets are stored in the prefix pages of the respective cpus. To build a
dump enabled kernel with the zcore driver, the kernel config option
CONFIG_ZFCPDUMP has to be set. When reading from "zcore/mem", the part of
memory, which has been saved by hardware is read by the driver via the SCLP
hardware interface. The second part is just copied from the non overwritten real
memory.
The userspace application of zfcpdump can reside e.g. in an intitramfs or an
initrd. It reads from zcore/mem and writes the system dump to a file on a
SCSI disk.
To build a zfcpdump kernel use the following settings in your kernel
configuration:
* CONFIG_ZFCPDUMP=y
* Enable ZFCP driver
* Enable SCSI driver
* Enable ext2 and ext3 filesystems
* Disable as many features as possible to keep the kernel small.
E.g. network support is not needed at all.
To use the zfcpdump userspace application in an initramfs you have to do the
following:
* Copy the zfcpdump executable somewhere into your Linux tree.
E.g. to "arch/s390/boot/zfcpdump. If you do not want to include
shared libraries, compile the tool with the "-static" gcc option.
* If you want to include e2fsck, add it to your source tree, too. The zfcpdump
application attempts to start /sbin/e2fsck from the ramdisk.
* Use an initramfs config file like the following:
dir /dev 755 0 0
nod /dev/console 644 0 0 c 5 1
nod /dev/null 644 0 0 c 1 3
nod /dev/sda1 644 0 0 b 8 1
nod /dev/sda2 644 0 0 b 8 2
nod /dev/sda3 644 0 0 b 8 3
nod /dev/sda4 644 0 0 b 8 4
nod /dev/sda5 644 0 0 b 8 5
nod /dev/sda6 644 0 0 b 8 6
nod /dev/sda7 644 0 0 b 8 7
nod /dev/sda8 644 0 0 b 8 8
nod /dev/sda9 644 0 0 b 8 9
nod /dev/sda10 644 0 0 b 8 10
nod /dev/sda11 644 0 0 b 8 11
nod /dev/sda12 644 0 0 b 8 12
nod /dev/sda13 644 0 0 b 8 13
nod /dev/sda14 644 0 0 b 8 14
nod /dev/sda15 644 0 0 b 8 15
file /init arch/s390/boot/zfcpdump 755 0 0
file /sbin/e2fsck arch/s390/boot/e2fsck 755 0 0
dir /proc 755 0 0
dir /sys 755 0 0
dir /mnt 755 0 0
dir /sbin 755 0 0
* Issue "make image" to build the zfcpdump image with initramfs.
In a Linux distribution the zfcpdump enabled kernel image must be copied to
/usr/share/zfcpdump/zfcpdump.image, where the s390 zipl tool is looking for the
dump kernel when preparing a SCSI dump disk.
If you use a ramdisk copy it to "/usr/share/zfcpdump/zfcpdump.rd".
For more information on how to use zfcpdump refer to the s390 'Using the Dump
Tools book', which is available from
http://www.ibm.com/developerworks/linux/linux390.