Library (computer science)

Figure: Illustration of an application that may use libvorbisfile.so to play an Ogg Vorbis file.

In computer science, a library is a collection of subprograms used to develop software. Libraries are distinguished from executables in that they are not independent programs; rather, they are "helper" code that provides services to some other independent program. Today the vast majority of the code that executes in a typical application is located in the libraries it uses.

Dynamic linking

Dynamic linking means that the code and data in a library are not copied into a new executable or library when it is built, but remain in a separate file on disk. The linker does only a minimal amount of work at build time: it records which libraries the executable needs and the names or index numbers of the symbols it uses from them. Most of the work of linking is done when the application is loaded (load time) or while the process is executing (run time). The code that performs this linking, called a loader, is part of the underlying operating system. At the appropriate time the loader finds the relevant libraries on disk and adds the relevant code and data from them to the process's memory space. Some operating systems can link in a library only at load time, before the process starts executing; others can wait until after the process has started and link in the library only when it is first referenced (that is, at run time). The latter is often called "delay loading". In either case, the library is called a dynamically linked library. This term is sometimes shortened to "dynamic link library" or DLL; the initialism is most common in Microsoft Windows environments, where dynamic libraries use the filename extension .dll.

One wrinkle that the loader must handle is that the location in memory of the actual library code and data cannot be known until the executable and all of its dynamically linked libraries have been loaded, since the memory locations used depend on which specific DLLs have been loaded. It is therefore not possible to store the absolute location of the data in the executable, nor even in the DLL. It would theoretically be possible to examine the program at load time and replace every reference to the libraries with a pointer to the appropriate memory location once all DLLs have been loaded, but this method would consume unacceptable amounts of either time or memory. Instead, most dynamic library systems link a symbol table with blank addresses into the program when it is built. All references to code or data in the library pass through this table, the import directory. At load time the loader fills in the table with the actual locations of the library's code and data.

The library itself contains a table of all the routines within it, known as entry points. Calls into the library "jump through" this table: the location of the code in memory is looked up, and the code is then called. This indirection adds overhead to every call into the library, but the delay is usually small enough to be negligible.
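The two tables described above can be illustrated with a toy model. The following Python sketch is not how any real loader is implemented; it only mimics the bookkeeping. The class and function names (Library, Executable, load) and the example library "libmath" are invented for illustration.

```python
"""Toy model of load-time symbol binding (an illustration, not a real loader)."""


class Library:
    """A 'library' whose export table maps entry-point names to code."""

    def __init__(self, name, exports):
        self.name = name
        self.exports = dict(exports)  # export table: symbol name -> callable


class Executable:
    """A 'program' that records only library and symbol names when built."""

    def __init__(self, needed):
        # Import table with blank addresses: (library, symbol) -> None
        self.import_table = {key: None for key in needed}

    def call(self, lib_name, symbol, *args):
        # Every call into a library passes through the import table.
        target = self.import_table[(lib_name, symbol)]
        if target is None:
            raise RuntimeError(f"{lib_name}:{symbol} is not bound yet")
        return target(*args)


def load(executable, available):
    """Fill each blank import-table slot from the library's export table."""
    for lib_name, symbol in list(executable.import_table):
        library = available.get(lib_name)
        if library is None:
            raise FileNotFoundError(f"library {lib_name!r} not found")
        if symbol not in library.exports:
            raise LookupError(f"{lib_name!r} has no entry point {symbol!r}")
        executable.import_table[(lib_name, symbol)] = library.exports[symbol]


# Hypothetical library and program, for illustration only.
libmath = Library("libmath", {"square": lambda x: x * x})
program = Executable(needed=[("libmath", "square")])
load(program, {"libmath": libmath})
result = program.call("libmath", "square", 7)
```

In this model the program never stores the location of square itself; it stores only a name, and the slot in the import table is filled once the library is known. The extra dictionary lookup on each call plays the role of the small indirection cost mentioned above.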

Dynamic linkers/loaders vary widely in functionality. Some depend on explicit paths to the libraries being stored in the executable; any change to the library naming or to the layout of the filesystem will cause these systems to fail. More commonly, only the name of the library (and not the path) is stored in the executable, and the operating system supplies a mechanism for finding the library on disk. Unix-based systems keep a list of "places to look" in a configuration file, and developers of libraries are encouraged to place their dynamic libraries in these places. On the downside, this can make installation of new libraries problematic, and these "known" locations quickly become home to an increasing number of library files, making management more complex. Microsoft Windows checks the Registry to determine the proper place to find an ActiveX DLL, but for standard DLLs it searches a sequence of locations that includes the application's own directory; the directory set by SetDllDirectory(); the System32, System, and Windows directories; the current working directory; and the directories in the PATH environment variable. The exact order depends on the Windows version and its settings. OpenStep used a more flexible system, collecting a list of libraries from a number of known locations (similar to the PATH concept) when the system first starts. Moving libraries around then causes no problems at all, although there is a time cost when first starting the system.
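The name-based lookup described above amounts to walking an ordered list of directories and taking the first match. The following Python sketch shows only that idea; the function name resolve_library, the directory names and the file name libfoo.so are made up for the example, and real loaders apply additional rules (caches, version checks, security restrictions) that are not modelled here.

```python
import os
import tempfile


def resolve_library(name, search_dirs):
    """Return the first path where `name` exists in `search_dirs`, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Show that a copy placed earlier in the search order shadows a later one."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "app")      # searched first (hypothetical)
        second = os.path.join(root, "system")  # searched second (hypothetical)
        os.makedirs(first)
        os.makedirs(second)

        # Only the "system" copy exists at this point.
        open(os.path.join(second, "libfoo.so"), "w").close()
        before = resolve_library("libfoo.so", [first, second])

        # A second copy dropped into the earlier directory now wins.
        open(os.path.join(first, "libfoo.so"), "w").close()
        after = resolve_library("libfoo.so", [first, second])

        missing = resolve_library("libbar.so", [first, second])
        return (
            os.path.basename(os.path.dirname(before)),
            os.path.basename(os.path.dirname(after)),
            missing,
        )
```

Calling demo() returns the directory that supplied the library before and after the second copy was added, followed by the result for a library that exists nowhere on the list.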

One of the largest disadvantages of dynamic linking is that executables depend on the separately stored libraries in order to function properly. If a library is deleted, moved, or renamed, or if an incompatible version of the DLL is copied to a place that comes earlier in the search order, the executable could malfunction or even fail to load. On Windows this is commonly known as DLL hell.

Dynamically linked libraries date back to at least MTS (the Michigan Terminal System), built in the late 1960s. ("A History of MTS", Information Technology Digest, Vol. 5, No. 5)

Dynamic loading

Dynamic loading is a subcategory of dynamic linking in which a dynamically linked library is loaded at run time because the program itself executes code that explicitly loads it, rather than because of references recorded by the linker when the program was built. In this case the library can be referred to as a dynamically loaded library (also abbreviated DLL). This form of library is typically used for plugin modules, such as add-ins for a spreadsheet program, and by interpreters that need to load certain functionality on demand.

Most systems supporting dynamic libraries also support dynamic loading via a dynamic loading API in the underlying operating system. For instance, Windows uses the functions LoadLibrary() and GetProcAddress(); UNIX-type systems use dlopen() and dlsym(). Some development systems automate this process.
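The load-then-look-up pattern behind these APIs can be sketched in Python, whose standard importlib module loads a module by name at run time. This is an analogue at the level of Python modules, not a call to dlopen() or LoadLibrary() themselves; the helper name load_entry_point is invented for the example.

```python
import importlib


def load_entry_point(module_name, symbol_name):
    """Load a module by name at run time and look up one symbol in it."""
    module = importlib.import_module(module_name)  # roughly the dlopen() step
    return getattr(module, symbol_name)            # roughly the dlsym() step


# The two names are plain strings, so they could just as well come from a
# configuration file or a plugin directory listing instead of the source code.
sqrt = load_entry_point("math", "sqrt")
value = sqrt(16.0)
```

As with the operating system APIs, a failure surfaces only at run time: a missing module raises ModuleNotFoundError and a missing symbol raises AttributeError, so a program that loads plugins this way has to be prepared to handle both.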

Remote libraries

Another solution to the library issue is to use completely separate executables (often in some lightweight form) and call them using a remote procedure call (RPC). This approach maximizes operating system re-use: the code needed to support the library is the same code that provides application support and security for every other program. Additionally, such systems do not require the library to exist on the same machine; they can forward requests over the network.

The downside to such an approach is that every library call incurs a considerable amount of overhead. Remote procedure calls are generally very expensive and are often avoided where possible. Nevertheless, this approach has become popular in a number of domain-specific areas, notably client-server systems and application servers such as Enterprise JavaBeans.
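A minimal sketch of the idea, using the XML-RPC modules in the Python standard library, is shown below. XML-RPC is chosen here only because it is readily available; it is not one of the systems named in this article. The routine add and the wrapper call_remotely are invented for the example, and the server is started on the local machine in a background thread purely to keep the sketch self-contained.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The 'library' routine, which lives in the server, not in the caller."""
    return a + b


def call_remotely(a, b):
    """Start a throwaway local server, call add() through RPC, then stop it."""
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    port = server.server_address[1]

    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        # The proxy turns the method call into an HTTP request and decodes
        # the reply; the caller never links against add() directly.
        with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
            return proxy.add(a, b)
    finally:
        server.shutdown()
        server.server_close()
```

Each call through the proxy involves encoding the arguments, a network round trip and decoding the result, which is the overhead discussed above; in exchange, the address in the URL could just as well point at another machine.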

Shared library

In addition to being loaded statically or dynamically, libraries are also often classified according to how they are shared among programs. Dynamic libraries almost always offer some form of sharing, allowing the same library to be used by multiple programs at the same time. Static libraries, by definition, cannot be shared; they are linked into each program.

The term shared library is slightly ambiguous, because it covers at least two different concepts. The first is the sharing of code located on disk by unrelated programs. The second is the sharing of code in memory, where programs execute the same physical page of RAM, mapped into different address spaces. It would seem that the latter would be preferable, and indeed it has a number of advantages. For instance, on the OpenStep system applications were often only a few hundred kilobytes in size and loaded almost instantly; the vast majority of their code was located in libraries that had already been loaded for other purposes by the operating system. There is a cost, however: shared code must be specifically written to run in a multitasking environment, and this affects performance.

RAM sharing can be accomplished by using position-independent code, as in Unix, which leads to a complex but flexible architecture, or by using ordinary (that is, not position-independent) code, as in Windows and OS/2. The latter systems use techniques such as pre-mapping the address space and reserving a slot for each DLL to make it highly likely that code can be shared. Windows DLLs are not shared libraries in the Unix sense. The rest of this article concentrates on aspects common to both variants.

In most modern operating systems, shared libraries can have the same format as "regular" executables. This has two main advantages. First, only one loader is needed for both, rather than two; the added complexity of the single loader is considered well worth the cost. Second, it allows executables to be used as DLLs as well, provided they have a symbol table. Typical executable/DLL formats are ELF (Unix) and PE (Windows). In Windows, the concept was taken one step further, with even system resources such as fonts being bundled in the DLL file format. The same is true under OpenStep, where the universal "bundle" format is used for almost all system resources.

The term DLL is mostly used on Windows and OS/2 products. On Unix platforms, the term shared library is more commonly used. This is technically justified in view of the different semantics. More explanations are available in the position independent code article.

In some cases, a system accumulates many different and mutually incompatible versions of the same DLLs, which harms the stability of the programs that depend on them. Such a scenario is known as DLL hell.

Object libraries

Dynamic linking became widespread during the late 1980s and was generally available in some form in most operating systems by the early 1990s. It was during the same period that object-oriented programming (OOP) was first making its way into the programming market. OOP requires additional information that traditional libraries do not supply: in addition to the names and entry points of the code located within them, object libraries also need a list of the objects they depend on. This is a side-effect of one of OOP's main advantages, inheritance, which means that the complete definition of any method may be spread over a number of places. This is more than simply listing that one library requires the services of another; in a true OOP system, the libraries themselves may not be known at compile time and may vary from system to system.

At the same time another common area for development was the idea of multi-tier programs, in which a "display" running on a desktop computer would use the services of a mainframe or minicomputer for data storage or processing. For instance, a program on a GUI-based computer would send messages to a minicomputer to return small samples of a huge dataset for display. Remote procedure calls already handled these tasks, but there was no standard RPC system.

It was not long before the majority of the minicomputer and mainframe vendors were working on projects to combine the two, producing an OOP library format that could be used anywhere. Such systems were known as object libraries, or distributed objects if they supported remote access (not all did). Microsoft's COM is an example of such a system for local use, and DCOM is a modified version that supports remote access.

For some time object libraries were the "next big thing" in the programming world. There were a number of efforts to create systems that would run across platforms, and companies competed to try to get developers locked into their own system. Examples include IBM's System Object Model (SOM/DSOM), Sun Microsystems' Distributed Objects Everywhere (DOE), NeXT's Portable Distributed Objects (PDO), Digital's ObjectBroker, Microsoft's Component Object Model (COM/DCOM), and any number of CORBA-based systems.

In the end, it turned out that OOP libraries were not the next big thing. With the exception of Microsoft's COM and NeXT's (now Apple Computer) PDO, all of these efforts have since ended.

The JAR file format is mainly used for object libraries in the Java programming language. A JAR file consists of (compressed) classes in bytecode format and is loaded by a Java virtual machine or by special class loaders.

Naming

  • GNU/Linux, Solaris and BSD variants: libfoo.a and libfoo.so files are placed in directories such as /lib, /usr/lib or /usr/local/lib. The filenames always start with lib and end with .a (archive, static library) or .so (shared object, dynamically linked library), optionally followed by an interface number. For example, libfoo.so.2 is the second major interface revision of the dynamically linked library libfoo. Old Unix versions used major and minor library revision numbers (libfoo.so.1.2), while contemporary Unixes use only major revision numbers (libfoo.so.1). Dynamically loaded libraries are placed in /usr/libexec and similar directories. The .la files sometimes found in the library directories are libtool archives, which the system cannot use as libraries directly.
  • Mac OS X and later: The system inherits static library conventions from BSD, and can use .so-style libraries (named with the .dylib suffix instead). Most libraries, however, are dynamic and are placed inside special directories called "bundles", which wrap the library's required files and metadata. For example, a library called "My Neat Library" would be implemented in a bundle called "My Neat Library.framework".
  • Microsoft Windows: *.LIB files are statically linkable libraries and *.DLL files are dynamically linkable libraries. The interface revisions are encoded in the files, or abstracted away using COM-object interfaces.
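
The naming conventions listed above are regular enough to be recognised mechanically. The following Python sketch classifies a file name using only the rules stated in the list; the function name classify and the shape of its return value are choices made for this example, and version numbers embedded elsewhere in a name (or encoded inside the file, as on Windows) are not handled.

```python
import re

# lib<name>.a, lib<name>.so or lib<name>.dylib, with an optional trailing
# numeric interface revision such as ".2" or ".1.2".
_UNIX_STYLE = re.compile(
    r"^lib(?P<name>.+?)\.(?P<ext>a|so|dylib)(?:\.(?P<version>\d+(?:\.\d+)*))?$"
)


def classify(filename):
    """Return (name, kind, version) for a library file name, or None.

    `kind` is "static" or "dynamic"; `version` is a string or None.
    """
    match = _UNIX_STYLE.match(filename)
    if match:
        kind = "static" if match["ext"] == "a" else "dynamic"
        return (match["name"], kind, match["version"])

    # Windows-style names: the extension alone decides, case-insensitively.
    stem, dot, ext = filename.rpartition(".")
    if dot and ext.lower() == "lib":
        return (stem, "static", None)
    if dot and ext.lower() == "dll":
        return (stem, "dynamic", None)
    return None
```

For example, classify("libfoo.so.2") yields the name foo, the kind "dynamic" and the version "2", while a libtool archive such as libfoo.la is not recognised as a library and yields None.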
