Operating Systems

CT 315 Operating Systems, Sem VI, Year 2013.

Instructor

Abhijit A.M. is the instructor.

Syllabus

Scheme of Evaluation

Compiling Linux Kernel

ADDING SYSTEM-CALLS TO THE KERNEL AND THEN COMPILING IT.
The whole process is carried out in the guest OS.
1. Install VirtualBox.

       $ sudo apt-get install virtualbox

2. Install Ubuntu in VirtualBox (create the appropriate partitions while installing it). The steps below use the i386 system call table, so they assume a 32-bit Ubuntu guest.

3. Download a current stable Linux kernel source archive, e.g. linux-3.13.1.tar.xz, and move it to a directory such as your home directory or /usr/src.
4. Extract the .tar.xz file:

       $ tar xvf file_name

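5. Go to the syscalls directory and open syscall_32.tbl, the i386 system call table:

       $ cd ~/linux-3.13.1/arch/x86/syscalls/

       $ vim syscall_32.tbl

6. Add the new system call as a new line at the end of the file, for example:

       351     i386    mycall          sys_mycall

The columns are the system call number, the ABI, the name, and the entry point; sys_mycall is the kernel function that implements the new call. The number must be the first unused one, so check the last line of the table (351 in this example). Comments in this file start with #, so do not put a /* */ comment on the line.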

7. Go to the top of the source tree and open include/linux/syscalls.h:

       $ cd ~/linux-3.13.1
       $ vim include/linux/syscalls.h

8. Add the following prototype near the end of syscalls.h (before the final #endif):

       asmlinkage int sys_mycall(void);

9. Open fork.c in the kernel/ directory:

       $ cd ~/linux-3.13.1/kernel/
       $ vim fork.c

10. Add the following function at the end of fork.c. It is the implementation of the system call we are adding to the kernel:

       asmlinkage int sys_mycall(void)
       {
              return 211; /* to print messages from kernel code, use printk() instead of printf() */
       }

Instead of adding the function to fork.c, we can write it in our own .c file in the kernel/ directory. In that case the new object file must also be listed in kernel/Makefile (for a file mycall.c, add mycall.o to the obj-y list) so that it is compiled into the kernel.

11. Create a file try.c in your home directory with the code below, which checks whether the system call has been added. Instead of embedding assembly code in the C program, we can use the C library function syscall(long number, ...), declared in <unistd.h>, which invokes the system call with the given number and passes the remaining arguments to it. Here number is the system call number added to syscall_32.tbl in step 6.

        /* try.c: check that the new system call works (32-bit x86 guest) */
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>             /* syscall(), for the alternative below */

        #define MYCALL_NR 351           /* the number added to syscall_32.tbl in step 6 */

        int my_call(void)
        {
             int rel;
             /* Put the system call number in eax, trap into the kernel with
                int $0x80, and take the return value back from eax. */
             __asm__ volatile ("int $0x80" : "=a"(rel) : "a"(MYCALL_NR) : "memory");
             /* Without assembly the same call is: rel = syscall(MYCALL_NR); */
             return rel;
        }

        int main(void)
        {
             int rel;
             printf("invoking system call\n");
             rel = my_call();
             printf("%d\n", rel);
             if (rel < 0)
                   exit(1);
             return 0;
        }

12. Now compile and install the kernel:

       $ cd ~/linux-3.13.1
       $ make defconfig            # generate a default .config file
       $ make -j6                  # run up to 6 compile jobs in parallel
       $ sudo make modules_install
       $ sudo make install

13. Reboot the system so that it runs the new kernel:

       $ sudo reboot

14. Now compile and run try.c:

       $ cc try.c -o try
       $ ./try

Output: the line "invoking system call" followed by 211. This shows that the system call has been added to the linux-3.13.1 kernel.

Note: whenever the kernel source is changed, the kernel must be recompiled and the system rebooted.
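Because mycall exists only in the rebuilt kernel, the syscall() route from step 11 can be tried first with a system call every kernel already has. A minimal sketch, assuming glibc on Linux: the hypothetical helper raw_getpid() invokes getpid by its number, SYS_getpid from <sys/syscall.h>, and should return the same value as the ordinary getpid() wrapper.

```c
#define _GNU_SOURCE           /* the syscall(2) man page asks for this */
#include <sys/syscall.h>      /* SYS_getpid and the other SYS_* numbers */
#include <unistd.h>           /* syscall(), getpid() */

/* Hypothetical helper: invoke getpid by its system call number instead
   of through the libc wrapper. After step 12 the same pattern works for
   the new call, e.g. syscall(351) for the mycall entry from step 6. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

On the rebuilt 32-bit kernel, replacing SYS_getpid with 351 should make the call return 211, as in step 14.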

Debugfs

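debugfs is an interactive debugger for ext2/ext3/ext4 file systems. To explore an ext2 file system with it, try the following commands. First, format a pen drive as ext2 (this erases everything on it; the commands assume the pen drive is /dev/sdb1, so check the output of mount first):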

$ mount                                 # list mounted file systems; find the pen drive's device
$ umount /media/pendrive_name/          # unmount the pen drive
$ sudo mkfs -t ext2 /dev/sdb1           # create an ext2 file system on the pen drive
$ mkdir /tmp/x                          # create a mount point
$ sudo mount -t ext2 /dev/sdb1 /tmp/x   # mount the ext2 file system on /tmp/x
$ ls /tmp/x                             # see the contents of /tmp/x (only lost+found)
$ sudo cp /etc/passwd /tmp/x            # copy a sample file; the new root directory belongs to root
$ ls -li /tmp/x                         # -i shows the inode numbers
$ sync                                  # write cached data to the disk before inspecting it
$ sudo debugfs /dev/sdb1                # start debugfs on the device file
debugfs:  stats
debugfs:  stat /passwd
debugfs:  stat /lost+found

stats lists the superblock and the block group descriptors. stat <filename> shows the inode information of a file: inode number, block count, link count, size, timestamps and so on. The sync command synchronizes data in memory with the disk, so that debugfs, which reads the device directly, sees the latest changes.
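The stats command prints values stored in the file system's superblock. As a rough sketch of where some of them come from, assuming the standard ext2 on-disk layout (superblock at byte 1024 of the partition, little-endian fields, magic number 0xEF53 at offset 56 of the superblock), the hypothetical helpers below decode four fields from the raw bytes. Calling read_ext2_sb("/dev/sdb1", &s) on the pen drive from the example would need root privileges.

```c
#define _GNU_SOURCE              /* pread() declaration */
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define EXT2_SB_OFFSET 1024      /* superblock starts 1024 bytes into the partition */
#define EXT2_SB_SIZE   1024
#define EXT2_MAGIC     0xEF53

/* Hypothetical summary of a few superblock fields shown by "stats". */
struct ext2_sb_summary {
    uint32_t inodes_count;       /* s_inodes_count, offset 0 */
    uint32_t blocks_count;       /* s_blocks_count, offset 4 */
    uint32_t block_size;         /* 1024 << s_log_block_size (offset 24) */
    uint16_t magic;              /* s_magic, offset 56 */
};

/* ext2 stores multi-byte fields little-endian. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Decode a raw 1024-byte superblock; returns 0 if the magic number matches. */
int parse_ext2_sb(const unsigned char *sb, struct ext2_sb_summary *out)
{
    out->inodes_count = le32(sb + 0);
    out->blocks_count = le32(sb + 4);
    out->block_size   = 1024u << le32(sb + 24);
    out->magic        = le16(sb + 56);
    return out->magic == EXT2_MAGIC ? 0 : -1;
}

/* Read the superblock from a device or image file such as "/dev/sdb1". */
int read_ext2_sb(const char *dev, struct ext2_sb_summary *out)
{
    unsigned char buf[EXT2_SB_SIZE];
    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, buf, sizeof buf, EXT2_SB_OFFSET);
    close(fd);
    if (n != (ssize_t)sizeof buf)
        return -1;
    return parse_ext2_sb(buf, out);
}
```

The decoded values should agree with the Inode count, Block count, Block size and Filesystem magic number lines printed by stats.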

Virtual Machine Setup

Write your tutorials for setting up various virtual machines using the links given below.

strace on ./helloworld

Write the meaning of the output of "strace ./helloworld" (printed on stderr) here.

Suppose we write a hello world program in C:
<source lang="c" style="overflow:auto">

              #include <stdio.h>
              int main() {
                         printf("Hello world!\n");
                         return 0;
              }

</source> After compiling it, run strace ./helloworld at the prompt.
strace lists every system call made by the helloworld program.

Output of $ strace ./hello

execve("./hello", ["./hello"], [/* 46 vars */]) = 0
brk(0)   = 0x8966000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb774a000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=87978, ...}) =0
mmap2(NULL, 87978, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7734000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\226\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1713640, ...}) = 0
mmap2(NULL, 1723100, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb758f000
mmap2(0xb772e000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19f) = 0xb772e000
mmap2(0xb7731000, 10972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7731000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb758e000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb758e900, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb772e000, 8192, PROT_READ)   = 0
mprotect(0x8049000, 4096, PROT_READ)    = 0
mprotect(0xb776d000, 4096, PROT_READ)   = 0
munmap(0xb7734000, 87978)               = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev = makedev(136, 3), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7749000
write(1, "Hello world!\n", 13Hello world!)          = 13
exit_group(13)

Each line of the output is one system call with its arguments. The value after the = sign is the return value of the call.

strace

strace is a tool that traces the system calls made and the signals received by a process. It is used as a diagnostic and debugging tool: it runs the specified command until it exits and records every system call the process makes and every signal it receives. Because it observes the running program, it can trace any program even when the source code is not available. Tracing system calls helps in isolating bugs and can help when investigating race conditions. strace is particularly useful when a program keeps failing with errors, does not do what it is meant to do, or crashes repeatedly.

Generate Statistics Report of System Calls Using Option -c

With option -c, strace prints a summary report instead of the individual calls. The "calls" column shows how many times each system call was executed, and the "errors" column how many of those calls failed.

Hello World
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  -nan    0.000000           0         1           read
  -nan    0.000000           0         1           write
  -nan    0.000000           0         2           open
  -nan    0.000000           0         2           close
  -nan    0.000000           0         1           execve
  -nan    0.000000           0         3         3 access
  -nan    0.000000           0         1           brk
  -nan    0.000000           0         1           munmap
  -nan    0.000000           0         4           mprotect
  -nan    0.000000           0         7           mmap2
  -nan    0.000000           0         3           fstat64
  -nan    0.000000           0         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    27         3 total

Sequence and meaning of the system calls in the strace output

1. execve("./hello", ["./hello"], [/* 46 vars */]):
This system call executes the file passed as its first argument (here "./hello"). The second argument is the argument vector (argv) of the new program. The third argument is its environment: a list of NAME=value strings (for example PATH, HOME and LANG) that affect the way running processes behave. strace abbreviates this list as /* 46 vars */, meaning that 46 environment variables were passed. A program can read the list through the environ variable.

2. brk(0) = 0x8966000:
brk changes the program break, which marks the end of the process's data segment. Its argument is the requested new end address, not an increment. Passing 0, which is not a valid break, leaves the break unchanged, and the kernel returns its current value, i.e. the current end of the data segment (here 0x8966000).

3. access("/etc/ld.so.nohwcap", F_OK):
access() checks whether the calling process can access the file given as the first argument. This call is made by the dynamic linker, ld.so, which loads the shared libraries needed by a program, prepares the program to run, and then runs it. On Debian and Ubuntu, if /etc/ld.so.nohwcap exists, ld.so loads only the generic versions of libraries and ignores versions optimized for the CPU's hardware capabilities (hwcap). The second argument specifies the check to be performed; F_OK tests whether the file exists. On success 0 is returned; on failure -1 is returned and errno is set accordingly. Here errno is ENOENT, which indicates that a component of "/etc/ld.so.nohwcap" does not exist or is a dangling symbolic link, so the normal library selection is used. Our program still needs a shared library (libc), so the dynamic linker and its checks are required. If the program is linked statically with the -static option, no dynamic linker runs, and this check and the other library-loading calls below disappear, which can reduce the start-up time.

4. mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb774a000:
This system call maps files or devices into memory, or, as here, plain memory. The first argument is NULL, which means that the kernel decides the address at which the mapping starts. The second argument is the size of the mapping, 8192 bytes (two 4096-byte pages). PROT_READ|PROT_WRITE makes the memory readable and writable. MAP_PRIVATE means that changes are private to this process, and MAP_ANONYMOUS means that the memory is not backed by any file and is filled with zeros. Since no file is involved, fd is -1 (some implementations require this with MAP_ANONYMOUS) and the last argument, the offset, is 0. The return value 0xb774a000 is the start address of the new mapping.

5. access("/etc/ld.so.preload", R_OK):
/etc/ld.so.preload lists libraries to be loaded before all others, so that their functions are used instead of functions of the same name in later libraries (this is called preloading a library). R_OK tests whether the calling process may read the file. Again -1 is returned with errno set to ENOENT, because "/etc/ld.so.preload" does not exist. Like the previous check, this one disappears if the program is linked with -static.

6. open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3:
/etc/ld.so.cache contains an ordered list of the libraries found in the directories specified in /etc/ld.so.conf (and in the trusted directories /lib and /usr/lib). When a program requires a library, ld.so looks it up in this file to find its path, then loads the library and makes it available to the program. The file is opened read-only, and O_CLOEXEC closes the descriptor automatically if the process later calls execve(). The return value 3 is the new file descriptor (0, 1 and 2 are already standard input, output and error).

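On Ubuntu, /etc/ld.so.conf itself just includes the files in /etc/ld.so.conf.d/, which list the directories:

student@foss-01:~$ cat /etc/ld.so.conf.d/*
/usr/lib/i386-linux-gnu/mesa-egl
/usr/lib/i386-linux-gnu/mesa
# Multiarch support
/lib/i386-linux-gnu
/usr/lib/i386-linux-gnu
/lib/i686-linux-gnu
/usr/lib/i686-linux-gnu
# libc default configuration
/usr/local/lib

The cache file is opened so that ld.so can look up the libraries available on the system. It is a binary file, generated from these directories by ldconfig.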

7. fstat64(3, {st_mode=S_IFREG|0644, st_size=87978, ...}) = 0:
This system call returns information about the file whose descriptor is passed to it, here /etc/ld.so.cache. It is used to get the size of the file (st_size = 87978 bytes) so that exactly that much can be mapped with mmap2. st_mode=S_IFREG|0644 states that the file is a regular file with permissions rw-r--r--. It returns 0 on success.

8. mmap2(NULL, 87978, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7734000:
This call maps the 87978 bytes of ld.so.cache (fd 3, from offset 0) into memory starting at address 0xb7734000, so that ld.so can search the cache as ordinary memory. PROT_READ makes the mapping read-only, so that the contents of the file cannot be changed through it. MAP_PRIVATE means that any changes would be neither visible to other processes nor written back to the file.

9. close(3):
This call closes file descriptor 3, i.e. ld.so.cache. The mapping created in step 8 remains valid after the descriptor is closed.

10. open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
libc.so.6 is the shared C standard library, used by nearly all programs; ld.so found its path in ld.so.cache.
The directory name i386-linux-gnu is a multiarch triplet: i386 is the processor architecture, linux the operating system kernel, and gnu the system ABI and C library (GNU libc).
The flags open the file read-only and close it automatically on execve().
The call returns file descriptor 3, which is free again after the close in step 9.

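11. read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\226\1\0004\0\0\0"..., 512) = 512
This call reads the first 512 bytes of libc.so.6 from file descriptor 3 into a buffer. They contain the ELF header: "\177ELF" identifies an ELF file, and the following bytes describe it (32-bit, little-endian, a shared object for the i386). strace shows only the beginning of the binary data. The return value 512 is the number of bytes actually read.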

12. fstat64(3, {st_mode=S_IFREG|0755, st_size=1713640, ...}) = 0
This call returns information about file descriptor 3, which now refers to libc.so.6. It is used mainly to get the size of the file so that ld.so can map it with mmap2.
st_mode contains the file type and attributes.
S_IFREG marks a regular file; it is combined by bitwise OR with 0755, the normal permissions of an executable file (rwxr-xr-x), written in octal.
st_size is the size of the file, 1713640 bytes.

13. mmap2(NULL, 1723100, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb758f000
The first argument, NULL, lets the kernel decide the address at which the mapping starts.
The second argument is the length, 1723100 bytes, slightly more than the file size, reserving address space for all of libc's segments.
The third argument makes this memory readable and executable; this is libc's code.
MAP_PRIVATE makes the mapping copy-on-write, so changes are never written back to the file. MAP_DENYWRITE is historical and is ignored by Linux.
The next argument is the file descriptor, 3.
The last argument is the offset in the file, in 4096-byte units; 0 means that the mapping starts at the beginning of libc.so.6.
The return value, 0xb758f000, is the address at which the mapping starts (the load address of libc).

14. mmap2(0xb772e000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19f) = 0xb772e000
This call maps libc's writable data segment. The first argument is the address at which the mapping must start: the load address returned in step 13 plus the segment's offset in the file (0xb758f000 + 0x19f000 = 0xb772e000).
The second argument is the size to be mapped, 12288 bytes (three pages).
The third argument makes this memory readable and writable.
MAP_PRIVATE gives the process its own copy-on-write copy, so changes to libc's data are not written to the file. MAP_FIXED places the mapping exactly at the given address, replacing that part of the mapping from step 13. MAP_DENYWRITE is ignored, as before.
The file descriptor is 3, and the last argument is the offset 0x19f in 4096-byte units, i.e. byte 0x19f000 of libc.so.6.

15. mmap2(0xb7731000, 10972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7731000
This mapping starts right after the previous one (0xb772e000 + 12288 = 0xb7731000) and is readable and writable. It is placed exactly at the given address (MAP_FIXED) and is anonymous: it is not backed by the file and is filled with zeros. This zero-filled memory holds libc's uninitialized global variables (the .bss section). Because of MAP_ANONYMOUS, fd is -1 and the offset is 0.

16. close(3)
This call closes file descriptor 3, which now refers to libc.so.6. The mappings of libc stay in place.

17. mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb758e000
Here NULL indicates that the kernel selects the address at which to create the mapping. The second argument specifies the length of the mapping in bytes (here 4096, one page). The third argument, PROT_READ|PROT_WRITE, allows the pages to be read and written. The fourth argument determines whether updates to the mapping are visible to other processes mapping the same region and whether they are carried through to the underlying file. Here it is MAP_PRIVATE|MAP_ANONYMOUS: MAP_PRIVATE means that updates are not visible to other processes mapping the same region and are not carried through to a file, and MAP_ANONYMOUS means that the mapping is an area of the process's virtual memory not backed by any file. The fifth argument is the file descriptor of the file to be mapped (here -1, because MAP_ANONYMOUS is used). The sixth argument is the offset into the file in 4096-byte units (here 0); it must be aligned on a page boundary. The return value is the address at which the mapping was placed.

18. set_thread_area({entry_number:-1 -> 6, base_addr:0xb758e900, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
set_thread_area() sets an entry in the current thread's Thread Local Storage (TLS) array. The entry that is set is given by u_info->entry_number, passed in by the caller. If this value is in bounds, set_thread_area() copies the TLS descriptor pointed to by u_info into the thread's TLS array. When entry_number is -1, the kernel uses a free TLS entry and stores its number back in u_info->entry_number on return; strace shows this as -1 -> 6. The base address 0xb758e900 lies inside the page mapped in step 17, which is where the thread's TLS block was placed.

19. mprotect(0xb772e000, 8192, PROT_READ) = 0
mprotect() changes the access protection of memory pages. The first argument is the start address (here 0xb772e000, the beginning of libc's data segment mapped in step 14). The second argument is the length, 8192 bytes. All pages from the start address up to start address + length - 1 get the protection given by the third argument, PROT_READ, so they can now only be read. The dynamic linker does this after it has finished relocating libc, so that this part of the data (the RELRO region) cannot be overwritten afterwards. The call succeeds and returns 0.

20. munmap(0xb7734000, 87978) = 0
While mmap2() establishes a mapping, munmap() does exactly the reverse. The first argument is the start address (here 0xb7734000, the mapping of ld.so.cache created by the mmap2() in step 8) and the second is the length. munmap() removes the mappings for all pages in that range, and any later reference to this memory results in a SIGSEGV signal to the process. Since the mapping was MAP_PRIVATE, any modifications made in this range are discarded. ld.so no longer needs the cache once the libraries are loaded. The call succeeds and returns 0.

System calls

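System calls are implemented in the Linux kernel. A system call is not an ordinary function call: a special instruction (int $0x80 or sysenter on 32-bit x86, syscall on x86-64) is needed to switch the CPU into kernel mode and transfer control to the kernel. Linux provides more than 300 system calls. Their numbers for the running architecture are defined in <asm/unistd.h> (for 32-bit x86, asm/unistd_32.h); /usr/include/asm-generic/unistd.h holds the generic table used by newer architectures and does not match the x86 numbering.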

execve()

execve("./hello", ["./hello", "-o", "output"], [/* 45 vars */]) = 0

 int execve(const char *filename, char *const argv[],char *const envp[]);

When we type a command or the name of an executable file, the shell creates a child process with fork() (implemented with clone() on Linux), and the child calls execve() to replace itself with the new program. The first argument to execve() is the path of the executable file. The second argument, argv, is the array of argument strings passed to the new program; by convention its first string is the name of the program being executed. The third argument, envp, is an array of strings of the form key=value, which forms the environment of the new program. Both arrays must be terminated by a NULL pointer.
On success execve() does not return, because the calling program has been replaced; on error -1 is returned and errno is set accordingly.
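The fork-then-execve sequence a shell uses can be sketched in C. This is a minimal illustration, not the shell's actual code: run_program() is a hypothetical helper, it passes the parent's own environment (environ) as envp, and the usage below assumes /bin/sh exists.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the current environment, passed on as envp */

/* Run path with argv in a child process, the way a shell does:
   fork() a child, execve() the program in it, and wait for it.
   Returns the child's exit status, or -1 on error. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);
        _exit(127);       /* only reached if execve() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_program("/bin/sh", (char *[]){ "sh", "-c", "exit 7", NULL }) should return 7; traced with strace -f, the child's execve() line looks like the one above.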

brk()

brk(0) = 0x9e7c000

 int brk(void *addr);

brk() is used to change the size of the process's data segment. It changes the program break, which points to the end of the process's data segment: brk() sets the end of the data segment to the value specified by addr, which becomes the first location beyond the end of the data segment. Increasing the data segment is subject to the availability of memory, the maximum data size allowed for the process, etc.
The glibc wrapper brk() returns 0 on success; on error it sets errno and returns -1. The actual Linux system call, however, returns the new program break on success and the current break on failure. brk(0) asks for an invalid break, so the kernel leaves the break unchanged and returns its current value (0x9e7c000 in the example above).
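From C, the break is normally handled through the sbrk() library function, which is built on brk(); sbrk(0) returns the current break, which is the value the brk(0) call in the trace asks the kernel for. A minimal sketch, assuming glibc on Linux; current_break() and grow_heap() are hypothetical helper names, and real programs should leave the break to malloc().

```c
#define _DEFAULT_SOURCE        /* sbrk() declaration in glibc */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Equivalent of brk(0) in the trace: query the current program break. */
void *current_break(void)
{
    return sbrk(0);
}

/* Move the break up by n bytes and zero the new memory.
   Returns the start of the new area (the old break), or NULL on failure. */
void *grow_heap(intptr_t n)
{
    void *old = sbrk(n);
    if (old == (void *)-1)
        return NULL;
    memset(old, 0, (size_t)n);
    return old;
}
```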

mmap2()

mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb774a000

   void *mmap2(void *addr, size_t length, int prot, int flags, int fd, off_t pgoffset);

Memory mapping is a process in which a portion of file or the entire file on disk, is mapped to the address space of the application. This is done in such a way that it can be accessed as if it were a part of memory. Memory mapping improves efficiency of code and provides faster file access.

mmap2() creates a new mapping in the virtual address space of the calling process. If addr is NULL, as in the above case, the kernel chooses the (page-aligned) address at which to create the mapping; this is the most portable way to create a new mapping. If addr is not NULL, the kernel takes it as a hint about where to place the mapping; with MAP_FIXED it must be a multiple of the page size and is used exactly.

The second argument, length, is the number of bytes to map. For a file mapping, the contents are initialized from the file referred to by the file descriptor fd, starting at the offset given by the last argument, pgoffset. Unlike mmap(), mmap2() gives this offset in 4096-byte units, which lets 32-bit programs map large files.

The third argument, prot, specifies the protection of the mapping. Here PROT_READ | PROT_WRITE means that the pages may be read and written. The other possible values are PROT_EXEC, meaning the pages may be executed, and PROT_NONE, meaning the pages may not be accessed.

The fourth argument of mmap2() is flags. It determines whether updates to the mapping are visible to other processes mapping the same region, and whether they are carried through to the underlying file. In the above example, two flags are combined with bitwise OR.
MAP_PRIVATE means that updates are private (copy-on-write): they are not visible to other processes and are not written to the file.
MAP_ANONYMOUS means that the mapping is not backed by any file, and its contents are initialized to zero. Since there is no file, the fd and pgoffset arguments are ignored; fd should still be -1 (some implementations require it) and the offset should be 0.

Note that, like file descriptors, memory mappings are inherited by the child across fork().

mmap2() returns a pointer to the mapped area on success (0xb774a000 in the above example). On failure it returns MAP_FAILED, i.e. (void *) -1, and errno is set appropriately.
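The anonymous private mapping from the trace can be requested from C with mmap(); on 32-bit x86, glibc's mmap() wrapper issues the mmap2 system call. A minimal sketch; map_zeroed() and unmap() are hypothetical helper names.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS in <sys/mman.h> */
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private, anonymous, read-write memory, like
   mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0).
   The kernel picks the address (addr = NULL) and fills the pages with zeros. */
void *map_zeroed(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;   /* errors are reported as MAP_FAILED */
}

/* Remove the mapping again; returns 0 on success, -1 on error. */
int unmap(void *p, size_t len)
{
    return munmap(p, len);
}
```

map_zeroed(8192) corresponds to the first mmap2 line of the trace; the returned memory reads as zeros until it is written.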

access()

access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)

 int access(const char *pathname, int mode);

access() performs various access check(s) as specified by mode on the file specified by *pathname for the calling process. Various modes are -

F_OK - tests whether the file exists
R_OK - tests whether the file grants read permission
W_OK - tests whether the file grants write permission
X_OK - tests whether the file grants execute permission

The checks also depend on the permissions of the directories in the path to the file, as given in pathname: each of them must be searchable. access() returns 0 on success; on failure it returns -1 and sets errno appropriately.
Here the program tries to access two files:
1. /etc/ld.so.nohwcap
2. /etc/ld.so.preload

In both cases the return value is -1 with errno ENOENT, because the files do not exist. This is expected: the checks are made by the dynamic linker (ld.so or ld-linux.so*), which finds and loads the shared libraries needed by a program, prepares the program to run, and then runs it. Both files are optional configuration files, so their absence is harmless. The checks disappear only if the program is linked statically with the -static option, because then no dynamic linker is involved; our "hello" program was linked dynamically against libc, so they appear in the trace.
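The two probes can be repeated from C. A minimal sketch; check_file() is a hypothetical helper that returns 0 when the check passes and otherwise the errno value, so on a system without /etc/ld.so.preload, check_file("/etc/ld.so.preload", R_OK) should return ENOENT just as in the trace.

```c
#include <errno.h>
#include <unistd.h>

/* Run access(path, mode) and report the outcome:
   0 if the check passed, otherwise the errno value (e.g. ENOENT). */
int check_file(const char *path, int mode)
{
    if (access(path, mode) == 0)
        return 0;
    return errno;
}
```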

fstat64()

fstat64(3, {st_mode=S_IFREG|0644, st_size=87978, ...}) =0

    int fstat64(int fd, struct stat64 *buf);

fstat64() is used to get the status of the file referred to by the file descriptor fd (the first argument). The value 3 in the first argument is the file descriptor returned by opening /etc/ld.so.cache. It is a variant of stat(), which also returns the status of a file but takes a path instead of a descriptor. No permissions on the file itself are required for these calls; stat(), however, requires search permission on all the directories in the path. These calls return information about the file in the second argument, buf, which points to a struct stat64. Some of the fields of the stat structure are:

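1. ID of the device containing the file (st_dev)
2. protection mode and file type (st_mode)
3. number of 512-byte blocks allocated to the file (st_blocks)
4. time of last access (st_atime)
5. time of last modification (st_mtime), etc.

On 32-bit Linux, fstat64() differs from fstat() in its structure: struct stat64 uses 64-bit fields for the file size (st_size), the inode number and the block count, so that files larger than 2 GB can be described. (The birth time and the reserved st_lspare and st_qspare fields found in some descriptions of stat64 belong to the macOS version of the structure, not to Linux.)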

The return value on success is 0. Otherwise, -1 is returned and the errno is set appropriately.
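Steps 6 to 9 and 20 of the trace (open, fstat64, mmap2, close and munmap on /etc/ld.so.cache) follow a common pattern: fstat supplies the size that is then passed to mmap. A minimal sketch of that pattern, not ld.so's own code; map_file() is a hypothetical helper, and on 32-bit x86 glibc's wrappers may issue fstat64 and mmap2 for these calls.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only, the way ld.so maps /etc/ld.so.cache:
   open -> fstat (for the size) -> mmap -> close.
   Returns the mapping and stores its length in *len, or NULL on error. */
const void *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {   /* mmap rejects length 0 */
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with munmap(p, len) when done, as ld.so does with the cache in step 20.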

close()

close(3) = 0

   int close(int fd);

close() closes a file descriptor, so that it no longer refers to any file and may be reused. If fd was the last descriptor referring to the underlying open file description, the resources associated with it are freed. The call above closes descriptor 3, which was returned by open(). close() returns 0 on success and -1 on error.

mprotect()

mprotect(0xb772e000, 8192, PROT_READ) = 0

    int mprotect(const void *addr, size_t len, int prot);

mprotect() changes the access protection of the calling process's memory pages to that specified by prot. The first argument gives the address from which the protection is to be changed; it must be a multiple of the page size. The second argument is the length of the range in bytes. The third argument specifies the new protection. In the example, PROT_READ changes the protection of 8192 bytes at 0xb772e000 from read-write (as set by mmap2() for that address) to read-only. The other values for prot are:
PROT_NONE : no access allowed
PROT_WRITE : write access allowed
PROT_EXEC : memory may be executed
If the process accesses memory in a way that violates its protection, it receives a SIGSEGV signal. mprotect() returns 0 on success and -1 on error.
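The read-write to read-only change seen at 0xb772e000 can be imitated on an anonymous mapping. A minimal sketch; seal_page() is a hypothetical helper that fills one page, then drops write permission with mprotect(). Writing to the page afterwards would raise SIGSEGV, so the sketch only reads it back.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Allocate one page, store value in it, then make it read-only, similar
   to what the loader does with relocated library data.
   Returns the page (now PROT_READ) or NULL on error. */
const char *seal_page(char value)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    p[0] = value;                          /* still writable here */
    if (mprotect(p, (size_t)pagesz, PROT_READ) != 0) {
        munmap(p, (size_t)pagesz);
        return NULL;
    }
    return p;                              /* writing p[0] now would raise SIGSEGV */
}
```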

munmap()

munmap(0xb7734000, 87978) = 0

    int munmap(void *addr, size_t length);

munmap() is a system call that unmaps pages of memory from a process's address space; the mappings are created by the mmap2() system call. munmap() deletes the mappings of all pages in the range starting at the address given in the first argument (addr) with the length in bytes given in the second argument (length). The address must be aligned to a page boundary. Further references to the deleted pages generate a SIGSEGV signal. All mappings are removed automatically when a process terminates; closing the file descriptor, however, does not remove the mapping. munmap() returns 0 on success, and on failure returns -1 and sets errno appropriately.

write()

write(1, "Hello world!\n", 13Hello world!) = 13

    ssize_t write(int fd, const void *buf, size_t count);

write() writes up to count bytes (the third argument) from the buffer buf (the second argument) to the file referred to by the file descriptor fd (the first argument).
By default, file descriptors 0, 1 and 2 refer to standard input, standard output and standard error. fd = 1 is standard output, normally the terminal, so this is the system call that actually prints "Hello world!" on the screen. In the strace output, "Hello world!" appears inside the traced line because the program's output and strace's own output (on standard error) go to the same terminal.

On success, write() returns the no of bytes written. While on failure, -1 is returned and errno is appropriately set.
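The program's whole visible effect is this one write() call, which can also be issued directly instead of through printf(). A minimal sketch; write_all() is a hypothetical helper that also handles short writes (write() may write fewer than count bytes) and calls interrupted by a signal.

```c
#include <errno.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying after short writes
   and interrupted calls. Returns len on success, -1 on error. */
ssize_t write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)len;
}
```

When output goes to a terminal, write_all(1, "Hello world!\n", 13) should appear under strace as the same single write(1, "Hello world!\n", 13) = 13 line.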

exit_group()

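exit_group(13)

   void exit_group(int status);

exit_group() is similar to the _exit() system call; however, while _exit() terminates only the calling thread, exit_group() terminates all the threads in the calling process. Its argument is the exit status of the process. It does not return. The C library's exit() and _exit() functions end by calling it.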

pipe()

pipe(fd1);

     int pipe(int pipefd[2]);

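A pipe allows two processes to communicate. Pipes are mainly used for interprocess communication, e.g. in the producer-consumer problem. pipe() creates a pipe and returns two file descriptors referring to its ends: pipefd[0] is the read end and pipefd[1] the write end. It returns 0 on success and -1 on error. Data written to the write end is buffered by the kernel until it is read from the read end. Two types of pipes are used in both UNIX and Windows systems: (1) ordinary pipes and (2) named pipes.

(1) Ordinary pipes allow two processes to communicate in standard producer-consumer fashion: the producer writes to one end and the consumer reads from the other. They allow only one-way communication, and the processes must be related (typically parent and child). On UNIX systems these pipes are created with the function pipe(int fd[2]).

(2) Named pipes also allow communication between processes, but no parent-child relationship is required, several processes can use the same pipe, and the pipe continues to exist in the file system after the processes finish. Communication can be bidirectional, although on UNIX a named pipe (FIFO) carries data in only one direction at a time. On UNIX, named pipes are created with the library function mkfifo(const char *, mode_t); see mkfifo(3).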
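A minimal sketch of an ordinary pipe used in producer-consumer fashion, assuming a POSIX system: the child writes a message and the parent reads it. send_through_pipe() is a hypothetical helper; each process closes the end of the pipe it does not use, so the parent's read() returns 0 (end of file) once the child has closed the write end.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child (producer) writes msg into the pipe; the parent (consumer)
   reads it into out, at most outsz - 1 bytes, NUL-terminated.
   Returns the number of bytes read, or -1 if the pipe or child
   cannot be created. */
ssize_t send_through_pipe(const char *msg, char *out, size_t outsz)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: producer */
        close(fd[0]);                   /* never reads */
        ssize_t w = write(fd[1], msg, strlen(msg));
        close(fd[1]);
        _exit(w < 0 ? 1 : 0);
    }

    close(fd[1]);                       /* parent: consumer, never writes */
    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(fd[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;             /* read() returns 0 once the writer closes */
    close(fd[0]);
    out[total] = '\0';
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```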

clone()

     int clone (int (*fn) (void *), void *child_stack, int flags, void *arg);

clone() is a system call that creates a new process, as fork() does. Unlike fork(), it lets the child share parts of its execution context with the parent, such as the address space, the table of file descriptors and the signal handlers, depending on the flags argument. This makes it the basis of multithreading: it is generally not called directly but through the pthread library, whose pthread_create() in glibc is built on it. As shown in the syntax above, the glibc wrapper starts the new thread in the function pointed to by fn, called with arg, on the stack given by child_stack.
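A minimal sketch of calling clone() directly, assuming glibc's wrapper (declared in <sched.h> when _GNU_SOURCE is defined) and a stack that grows downward, as on x86. With CLONE_VM the child runs in the parent's address space, so its store to the shared variable is visible to the parent after waitpid(); SIGCHLD in the flags makes the child report its exit like a normal child. run_clone_demo(), child_fn() and shared_value are hypothetical names.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

static int shared_value;               /* visible to the child because of CLONE_VM */

static int child_fn(void *arg)
{
    shared_value = *(int *)arg;        /* writes the parent's memory */
    return 0;                          /* the child exits with status 0 */
}

/* Start child_fn with clone(), wait for it, and return the value it
   stored in shared memory (or -1 on error). */
int run_clone_demo(int value)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;
    shared_value = 0;
    /* The stack grows down on x86, so pass the top of the block. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, CLONE_VM | SIGCHLD, &value);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return shared_value;
}
```

Without CLONE_VM the child would write to its own copy of shared_value, and the parent would still see 0, just as after fork().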

chdir()

Change working directory

    int chdir(const char *path);

chdir() changes the current working directory of the calling process to the directory specified by path. fchdir(), described below, does the same for a directory referenced by an open file descriptor. On success both return 0; on error they return -1.

fchdir()

    int fchdir(int fd);

fchdir() is identical to chdir(); the only difference is that the directory is given as an open file descriptor.
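A minimal sketch that uses both calls: it keeps the current directory open as a descriptor, moves with chdir(), records the new directory with getcwd(), and returns with fchdir(). visit_dir() is a hypothetical helper; O_DIRECTORY makes open() fail unless the path is a directory.

```c
#define _GNU_SOURCE             /* O_DIRECTORY, fchdir() */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* chdir() to path, copy the new working directory name into buf,
   then go back to the original directory with fchdir().
   Returns 0 on success, -1 on error. */
int visit_dir(const char *path, char *buf, size_t size)
{
    int here = open(".", O_RDONLY | O_DIRECTORY);   /* remember where we are */
    if (here < 0)
        return -1;

    int rc = -1;
    if (chdir(path) == 0) {
        if (getcwd(buf, size) != NULL)
            rc = 0;
        if (fchdir(here) != 0)                      /* return to the old directory */
            rc = -1;
    }
    close(here);
    return rc;
}
```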

vfork()

    pid_t vfork(void);

vfork() is similar to fork(): it creates a child process. The difference is that with vfork() the parent is suspended until the child either calls execve() or terminates with _exit(), and until then the child runs in the parent's memory space instead of a copy. The child must therefore not modify the parent's data, return from the function that called vfork(), or call anything other than execve() or _exit(). On success vfork() returns 0 in the child and the child's process ID in the parent; on error it returns -1.
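A minimal sketch of the pattern vfork() is meant for: the child immediately calls execve() or _exit(), and the parent continues only after that. spawn_exit_code() is a hypothetical helper; it assumes /bin/sh exists and builds the argument array before vfork() so that the child does nothing else.

```c
#define _DEFAULT_SOURCE          /* vfork() declaration in glibc */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run "/bin/sh -c cmd" in a vfork()ed child and return its exit status,
   or -1 on error. The child only calls execve() or _exit(), because it
   shares the parent's memory until one of them happens. */
int spawn_exit_code(const char *cmd)
{
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };  /* built before vfork() */

    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, environ);
        _exit(127);              /* execve failed; exit() must not be used here */
    }

    /* The parent resumes here once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```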

Synchronization

Synchronization means coordinating multiple simultaneous processes or threads so that they run in a correct order and avoid race conditions. A race condition occurs when two or more processes access and manipulate shared data concurrently and the outcome depends on the order in which the accesses take place. Each process has a section of code, called the critical section, in which it may be changing common variables, updating a table, and so on. The critical-section problem is to design a protocol that the processes can use to cooperate, and a solution must satisfy three requirements: mutual exclusion, progress and bounded waiting. One of the solutions to the critical-section problem is Peterson's solution, which is valid for two processes. Each process sets a flag to show that it wants to enter its critical section and then gives the turn to the other process; it waits only while the other process also wants to enter and it is the other's turn, so the two processes can never be in their critical sections at the same time. Another solution is locking: only the process that currently holds the lock can enter the critical section, and the other processes wait until the lock is released. Hardware-level solutions have also been devised, such as the atomic test-and-set and atomic swap instructions.
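A minimal sketch of Peterson's solution for two threads, assuming C11 atomics and POSIX threads. flag and turn use sequentially consistent atomic operations: with ordinary variables, compiler and CPU reordering can break mutual exclusion on modern hardware. run_peterson_demo() and the other names are hypothetical, and sched_yield() in the waiting loop only keeps busy-waiting cheap on a single-CPU virtual machine.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_bool flag[2];   /* flag[i] is true while thread i wants to enter */
static atomic_int turn;       /* the thread that must wait if both want to enter */
static long counter;          /* shared data protected by the protocol */
static long iterations;       /* increments per thread */

static void peterson_lock(int i)
{
    int j = 1 - i;                       /* the other thread */
    atomic_store(&flag[i], true);        /* I want to enter ... */
    atomic_store(&turn, j);              /* ... but the other may go first */
    while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
        sched_yield();                   /* wait, letting the other thread run */
}

static void peterson_unlock(int i)
{
    atomic_store(&flag[i], false);       /* leave the critical section */
}

static void *worker(void *arg)
{
    int i = *(int *)arg;
    for (long k = 0; k < iterations; k++) {
        peterson_lock(i);
        counter++;                       /* critical section */
        peterson_unlock(i);
    }
    return NULL;
}

/* Two threads each add n to counter; with mutual exclusion the final
   value is 2n. Returns -1 if a thread cannot be created. */
long run_peterson_demo(long n)
{
    pthread_t t[2];
    int id[2] = { 0, 1 };
    counter = 0;
    iterations = n;
    if (pthread_create(&t[0], NULL, worker, &id[0]) != 0)
        return -1;
    if (pthread_create(&t[1], NULL, worker, &id[1]) != 0) {
        pthread_join(t[0], NULL);        /* thread 0 finishes alone */
        return -1;
    }
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    return counter;
}
```

Without the peterson_lock() and peterson_unlock() calls, the two unsynchronized counter++ operations race, and the result can fall short of 2n.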

