Deployment Guide
Red Hat Directory Server                                                            


Chapter 2

How to Plan Your Directory Data


The data stored in your directory may include user names, email addresses, telephone numbers, and information about the groups to which users belong, or it may contain other types of information. The type of data in your directory determines how you structure the directory, to whom you allow access to the data, and how this access is requested and granted.

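This chapter describes the issues and strategies behind planning your directory's data. It includes the following sections:
Introduction to Directory Data
Defining Your Directory Needs
Performing a Site Survey
Documenting Your Site Survey
Repeating the Site Survey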

Introduction to Directory Data

Some types of data are better suited to your directory than others. Ideal data for a directory has some of the following characteristics:

It is of interest to more than one person or group. For example, an employee's name or the physical location of a printer can be of interest to many people and applications.
It is accessed from more than one physical location. For example, an employee's preference settings for a software application may not seem appropriate for the directory because only a single instance of the application needs access to the information. However, if the application can read preferences from the directory, and users might want to interact with the application according to their preferences from different sites, then it is very useful to include the preference information in the directory.

What Your Directory Might Include

Examples of data you can put in your directory are:

If you are going to use Directory Server for more than just server administration, then you have to decide what other types of information you want to store in your directory. For example, you might include some of the following types of information:

What Your Directory Should Not Include

Directory Server is excellent for managing large quantities of data that client applications read and write, but it is not designed to handle large, unstructured objects such as images or other media. Maintain these objects in a filesystem instead. Your directory can still store pointers to them in the form of FTP, HTTP, or other types of URLs.

Defining Your Directory Needs

When you design your directory data, think not only of the data you currently require but also of what you may include in your directory in the future. Considering the future needs of your directory during the design process influences how you structure and distribute the data in your directory.

As you plan, consider these points:

Performing a Site Survey

A site survey is a formal method for discovering and characterizing the contents of your directory. Budget plenty of time for performing a site survey, as data is the key to your directory architecture. The site survey consists of the following tasks, which are described briefly here and in more detail next:

Determine the directory-enabled applications you deploy and their data needs.
Survey your enterprise and identify sources of data (such as Windows NT or Active Directory, PBX systems, human resources databases, email systems, and so forth).
Determine what objects should be present in your directory (for example, people or groups) and what attributes of these objects you need to maintain in your directory (such as user names and passwords).
Decide how available your directory data needs to be to client applications, and design your architecture accordingly. How available your directory needs to be affects how you replicate data and configure chaining policies to connect data stored on remote servers.
For more information about replication, refer to chapter 6, "Designing the Replication Process." For more information on chaining, refer to "Topology Overview," on page 89.
A data master contains the primary source for directory data. This data might be mirrored to other servers for load balancing and recovery purposes. For each piece of data, determine its data master.
For each piece of data, determine the person responsible for ensuring that the data is up-to-date.
If you import data from other sources, develop a strategy for both bulk imports and incremental updates. As a part of this strategy, try to master data in a single place, and limit the number of applications that can change the data. Also, limit the number of people who write to any given piece of data. A smaller group ensures data integrity while reducing your administrative overhead.
Because of the number of organizations that can be affected by the directory, it may be helpful to create a directory deployment team that includes representatives from each affected organization. This team performs the site survey.
Corporations generally have a human resources department, an accounting and/or accounts receivable department, one or more manufacturing organizations, one or more sales organizations, and one or more development organizations. Including representatives from each of these organizations can help you perform the survey. Furthermore, directly involving all the affected organizations can help build acceptance for the migration from local data stores to a centralized directory.

Identifying the Applications That Use Your Directory

Generally, the applications that access your directory and the data needs of these applications drive the planning of your directory contents. Some of the common applications that use your directory include:

When you examine the applications that will use your directory, look at the types of information each application uses. The following table gives an example of applications and the information used by each:

Table 2-1 Application Data Needs
Application | Class of Data | Data
Phonebook | People | Name, email address, phone number, user ID, password, department number, manager, mail stop.
Web server | People, groups | User ID, password, group name, group members, group owner.
Calendar server | People, meeting rooms | Name, user ID, cube number, conference room name.

Once you identify the applications and information used by each application, you can see that some types of data are used by more than one application. Doing this kind of exercise during the data planning stage can help you avoid data redundancy problems in your directory and see more clearly what data your directory-dependent applications require.
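This exercise is easy to mechanize once each application's data needs are written down. The following is a minimal Python sketch, assuming the rows of Table 2-1 transcribed by hand with lowercased names so that the same data matches across applications; the dictionary and function names are hypothetical and are not part of Directory Server. It lists every piece of data that more than one application needs.

```python
from collections import defaultdict

# Data needed by each application, transcribed from Table 2-1.
# Names are lowercased so that the same data matches across rows.
APP_NEEDS: dict[str, set[str]] = {
    "Phonebook": {"name", "email address", "phone number", "user id",
                  "password", "department number", "manager", "mail stop"},
    "Web server": {"user id", "password", "group name", "group members",
                   "group owner"},
    "Calendar server": {"name", "user id", "cube number",
                        "conference room name"},
}


def shared_data(needs: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each piece of data used by two or more applications to those applications."""
    users = defaultdict(list)
    for app, items in needs.items():
        for item in items:
            users[item].append(app)
    # Keep only data with more than one consumer; sort for stable output.
    return {item: sorted(apps) for item, apps in users.items() if len(apps) > 1}


if __name__ == "__main__":
    for item, apps in sorted(shared_data(APP_NEEDS).items()):
        print(f"{item}: {', '.join(apps)}")
```

With the data in Table 2-1, the name, password, and user ID are each needed by more than one application, so these are the items most likely to end up stored redundantly if each application keeps its own copy.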

The final decision you make about the types of data you maintain in your directory and when you start maintaining it is affected by these factors:

Identifying Data Sources

To identify all of the data that you want to include in your directory, you should perform a survey of your existing data stores. Your survey should include the following:

Locate all the organizations that manage information essential to your enterprise. Typically, this includes your information services, human resources, payroll, and accounting departments.
Some common sources for information are network operating systems (Windows, Novell NetWare, UNIX NIS), email systems, security systems, PBX (telephone switching) systems, and human resources applications.
You may find that centralized data management requires new tools and new processes. Sometimes centralization requires increasing staff in some organizations while decreasing staff in others.

During your survey, you may come up with a matrix that resembles the following table, identifying all of the information sources in your enterprise:

Table 2-2 Information Sources
Data Source | Class of Data | Data
Human resources database | People | Name, address, phone number, department number, manager.
Email system | People, groups | Name, email address, user ID, password, email preferences.
Facilities system | Facilities | Building names, floor names, cube numbers, access codes.
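Comparing the data your applications need against the sources you found can show both gaps and overlaps. The following minimal Python sketch assumes the Data columns of Table 2-1 and Table 2-2 as written, with names lowercased and made singular by hand; the names `REQUIRED`, `SOURCES`, and `map_sources` are hypothetical. Data with no source still needs to be located, and data held by several sources is where you will have to choose a data master.

```python
# Data needed by applications (union of the Data column of Table 2-1).
REQUIRED = {
    "name", "email address", "phone number", "user id", "password",
    "department number", "manager", "mail stop", "group name",
    "group members", "group owner", "cube number", "conference room name",
}

# Data held by each source (Data column of Table 2-2), names made singular.
SOURCES: dict[str, set[str]] = {
    "Human resources database": {"name", "address", "phone number",
                                 "department number", "manager"},
    "Email system": {"name", "email address", "user id", "password",
                     "email preferences"},
    "Facilities system": {"building name", "floor name", "cube number",
                          "access code"},
}


def map_sources(required: set[str], sources: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, for every required piece of data, the sorted sources that hold it."""
    return {
        item: sorted(src for src, held in sources.items() if item in held)
        for item in required
    }


if __name__ == "__main__":
    coverage = map_sources(REQUIRED, SOURCES)
    # No source identified yet: continue the survey for these.
    print("Unsourced:", sorted(i for i, s in coverage.items() if not s))
    # Several sources: a data master must be chosen for these.
    print("Multiple sources:", sorted(i for i, s in coverage.items() if len(s) > 1))
```

With the tables as written, mail stop, the three group attributes, and conference room names have no identified source, and the employee name is held by both the human resources database and the email system.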

Characterizing Your Directory Data

All of the data you identify for inclusion in your directory can be characterized according to the following general points:

You should study each piece of data you plan to include in your directory to determine what characteristics it shares with the other pieces of data. This helps save time during the schema design stage, described in more detail in chapter 3, "How to Design the Schema."

For example, you can create a table that characterizes your directory data as follows:

Table 2-3 Directory Data Characteristics
Data | Format | Size | Owner | Related to
Employee name | Text string | 128 characters | Human resources | User's entry
Fax number | Phone number | 14 digits | Facilities | User's entry
Email address | Text | Many characters | IS department | User's entry
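If you keep the characteristics table in a structured form, finding shared characteristics becomes a simple grouping step. The following minimal Python sketch assumes the three rows of Table 2-3, recording the open-ended "Many characters" size as `None`; the `DataItem` class and `group_by` function are hypothetical names for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataItem:
    """One row of a data-characteristics table such as Table 2-3."""
    name: str
    fmt: str                 # e.g. "Text string", "Phone number"
    max_size: Optional[int]  # characters or digits; None when open-ended
    owner: str
    related_to: str


ITEMS = [
    DataItem("Employee name", "Text string", 128, "Human resources", "User's entry"),
    DataItem("Fax number", "Phone number", 14, "Facilities", "User's entry"),
    DataItem("Email address", "Text", None, "IS department", "User's entry"),
]


def group_by(items: list[DataItem], field: str) -> dict[str, list[str]]:
    """Group item names by the value of one characteristic (e.g. "owner")."""
    groups = defaultdict(list)
    for item in items:
        groups[getattr(item, field)].append(item.name)
    return dict(groups)


if __name__ == "__main__":
    print(group_by(ITEMS, "related_to"))
    print(group_by(ITEMS, "owner"))
```

Grouping by "related to" shows that all three items belong to the user's entry, while grouping by owner shows three different owners for that one entry, a point that matters again when you decide data ownership and access.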

Determining Level of Service

The level of service you provide depends upon the expectations of the people who rely on directory-enabled applications. To determine the level of service each application expects, first determine how and when the application is used.

As your directory evolves, it may need to support a wide variety of service levels, from production to mission critical. It can be difficult to raise the level of service after your directory is deployed, so make sure your initial design can meet your future needs.

For example, if you determine that you need to eliminate the risk of total failure, you might consider using a multi-master configuration, in which several suppliers exist for the same data. The next section discusses determining data masters in more detail.

Considering a Data Master

The data master is the server that is the master source of data. Consider which server will be the data master when your data resides in more than one physical site. For example, when you use replication or use applications that cannot communicate over LDAP, data may be spread over more than one site. If a piece of data is present in more than one location, you need to decide which server has the master copy and which server receives updates from this master copy.

Data Mastering for Replication

Directory Server allows master sources of information to reside on more than one server. If you use replication, decide which server is the master source of a piece of data. Directory Server supports multi-master configurations, in which more than one server is the master source for the same piece of data. For more information about replication and multi-master replication, see chapter 6, "Designing the Replication Process."

In the simplest case, put a master source of all of your data on two Directory Servers, and then replicate that data to one or more consumer servers. Having two supplier servers provides safe failover in the event that a server goes off-line. In more complex cases, you may want to store the data in multiple databases, so that the entries are mastered by a server close to the applications which will update or search that data.

Data Mastering for Synchronization

You can synchronize your Directory Server users, groups, attributes, and passwords with Microsoft Active Directory or Windows NT4 Server users, groups, attributes, and passwords. If you have two directory services, you must decide whether to synchronize them (that is, whether they will handle the same information), how much of that information they will share, and which service will be the data master for it. The best course is to choose a single application to master the data and allow the synchronization process to add, update, or delete the entries on the other service.
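To make the single-master approach concrete, the following minimal Python sketch shows one-way synchronization: it compares the master's entries with the other service's entries and plans the adds, updates, and deletes needed to make the other service match. It is an illustration under stated assumptions, not how Directory Server's Windows synchronization is implemented. Entries are modeled as plain dictionaries keyed by hypothetical entry identifiers, with single-valued attributes.

```python
Entry = dict[str, str]  # attribute name -> value; single-valued for simplicity


def plan_sync(master: dict[str, Entry], other: dict[str, Entry]) -> list[tuple[str, str]]:
    """Return the (operation, entry key) pairs that make `other` match `master`.

    Changes flow one way only: nothing in `other` is ever copied back to `master`.
    """
    ops = [("add", key) for key in sorted(master.keys() - other.keys())]
    ops += [("update", key) for key in sorted(master.keys() & other.keys())
            if master[key] != other[key]]
    ops += [("delete", key) for key in sorted(other.keys() - master.keys())]
    return ops


def apply_sync(master: dict[str, Entry], other: dict[str, Entry]) -> None:
    """Apply the planned operations to `other` in place."""
    for op, key in plan_sync(master, other):
        if op == "delete":
            del other[key]
        else:  # "add" or "update": copy the master's version of the entry
            other[key] = dict(master[key])


if __name__ == "__main__":
    master = {"alice": {"cn": "Alice", "telephoneNumber": "555-0100"},
              "bob": {"cn": "Bob", "telephoneNumber": "555-0101"}}
    other = {"alice": {"cn": "Alice", "telephoneNumber": "555-0199"},
             "carol": {"cn": "Carol", "telephoneNumber": "555-0102"}}
    print(plan_sync(master, other))
    # [('add', 'bob'), ('update', 'alice'), ('delete', 'carol')]
    apply_sync(master, other)
```

Because only the master's values are ever copied, a stale value on the other service is corrected on the next pass instead of competing with the master.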

Data Mastering Across Multiple Applications

You also need to consider the master source of your data if you have applications that communicate indirectly with the directory. Keep the processes for changing data, and the places from which you can change data, as simple as possible. Once you decide on a single site to master a piece of data, use the same site to master all of the other data contained there. A single site simplifies troubleshooting if your databases get out of sync across your enterprise.

Here are some ways you can implement data mastering:

Maintaining multiple data masters does not require custom scripts for moving data in and out of the directory and the other applications. However, if data changes in one place, someone has to change it on all the other sites. Maintaining master data in the directory and all applications not using the directory can result in data being unsynchronized across your enterprise (which is what your directory is supposed to prevent).
Mastering data in non-directory applications makes the most sense if you can identify one or two applications that you already use to master your data, and you want to use your directory only for lookups (for example, for online corporate telephone books).

How you maintain master copies of your data depends on your specific needs. However, regardless of how you maintain data masters, keep it simple and consistent. For example, you should not attempt to master data in multiple sites, then automatically exchange data between competing applications. Doing so leads to a "last change wins" scenario and increases your administrative overhead.

For example, suppose you want to manage an employee's home telephone number. Both the LDAP directory and a human resources database store this information. The human resources application is LDAP-enabled, so you can write an application that automatically transfers data from the LDAP directory to the human resources database, and vice versa. However, if you attempt to master changes to that employee's telephone number in both the LDAP directory and the human resources database, then whichever place changed the telephone number last overwrites the information in the other database. This is acceptable as long as the last application to write the data had the correct information. But if that information was old or out of date (for example, because the human resources data was reloaded from a backup), then the correct telephone number in the LDAP directory is overwritten.
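The failure in this example can be reproduced in a few lines. The following minimal Python sketch models a two-way "last change wins" exchange between the directory and the human resources database. The `Record` class, the integer logical clock, and the telephone numbers are hypothetical, and the sketch assumes that reloading the backup counts as the most recent write.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    changed_at: int  # logical clock: a larger number means written later


def two_way_sync(a: Record, b: Record) -> None:
    """Last change wins: copy the more recently written value over the older one."""
    winner = a if a.changed_at >= b.changed_at else b
    for rec in (a, b):
        rec.value, rec.changed_at = winner.value, winner.changed_at


ldap = Record("555-0100", changed_at=1)  # employee's original home number
hr = Record("555-0100", changed_at=1)

ldap.value, ldap.changed_at = "555-0199", 2  # correct new number entered in the directory
hr.value, hr.changed_at = "555-0100", 3      # HR database reloaded from an old backup

two_way_sync(ldap, hr)
print(ldap.value)  # 555-0100: the stale value has replaced the correct number
```

The exchange behaves exactly as designed and still loses the correct number, which is why mastering the telephone number in a single place is the safer choice.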

Determining Data Ownership

Data ownership refers to the person or organization responsible for making sure the data is up-to-date. During the data design, decide who can write data to the directory. Some common strategies for deciding data ownership follow:

Allow individuals to write to a subset of the information about themselves. This subset might include their passwords, descriptive information about themselves and their role within the organization, their automobile license plate number, and contact information such as telephone numbers or office numbers.
This approach makes your organization's administrators your directory content managers.
Create roles that give groups of people read or write access to the data they need. For example, you might create roles for human resources, finance, or accounting. Allow each of these roles to have read access, write access, or both to the data the group needs, such as salary information, government identification numbers (in the US, Social Security Numbers), and home phone numbers and addresses.
For more information about roles and grouping entries, refer to "Grouping Directory Entries," on page 73.

As you determine who can write to the data, you may find that multiple individuals need to have write access to the same information. For example, you will want an information systems (IS) or directory management group to have write access to employee passwords. You may also want the employees themselves to have write access to their own passwords. While you generally must give multiple people write access to the same information, try to keep this group small and easy to identify. Keeping the group small helps ensure your data's integrity.

For information on setting access control for your directory, see chapter 7, "Designing a Secure Directory."

Determining Data Access

After determining data ownership, decide who can read each piece of data. For example, you may decide to store an employee's home phone number in your directory. This data may be useful for a number of organizations, including the employee's manager and human resources. You may want the employee to be able to read this information for verification purposes. However, home contact information can be considered sensitive. Therefore, you must determine if you want this kind of data to be widely available across your enterprise.

For each piece of information that you store in your directory, you must decide the following:

The LDAP protocol supports anonymous access, which allows easy lookups for common information such as office sites, email addresses, and business telephone numbers. However, anonymous access exposes that information to anyone who can reach the directory, so use it sparingly.
You can set up access control so that the client must log in to (or bind to) the directory to read specific information. Unlike anonymous access, this form of access control ensures that only members of your organization can view directory information. It also allows you to capture login information in the directory's access log so you have a record of who accessed the information.
For more information about access controls, refer to "Designing Access Control," on page 158.
Anyone who has write privileges to the data generally also needs read access (with the exception of write access to passwords). You may also have data specific to a particular organization or project group. Identifying these access needs helps you determine what groups, roles, and access controls your directory needs.
For information about groups and roles, see chapter 4, "Designing the Directory Tree." For information about access controls, see chapter 7, "Designing a Secure Directory."

As you make these decisions for each piece of directory data, you define a security policy for your directory. Your decisions depend upon the nature of your site and the kinds of security already available at your site. For example, if your site has a firewall or no direct access to the Internet, you may feel freer to support anonymous access than if you are placing your directory directly on the Internet. Additionally, some information may only need access controls and authentication measures to restrict access adequately; other sensitive information may need to be encrypted within the database as it is stored.

In many countries, data protection laws govern how enterprises must maintain personal information and restrict who has access to the personal information. For example, the laws may prohibit anonymous access to addresses and phone numbers or may require that users have the ability to view and correct information in entries which represent them. Be sure to check with your organization's legal department to ensure that your directory deployment follows all necessary laws for the countries in which your enterprise operates.

The creation of a security policy and the way you implement it are described in detail in chapter 7, "Designing a Secure Directory."

Documenting Your Site Survey

Because of the complexity of data design, document the results of your site surveys. During each step of the site survey, we have suggested simple tables for keeping track of your data. Consider building a master table that outlines your decisions and outstanding concerns. You can build this table with the word-processing package of your choice or use a spreadsheet so that the table's contents can easily be sorted and searched.

A simple example of a table follows. The table identifies data ownership and data access for each piece of data identified by the site survey.

Data Name | Owner | Supplier Server/Application | Self Read/Write | Global Read | HR Writable | IS Writable
Employee name | HR | PeopleSoft | Read-only | Yes (anonymous) | Yes | Yes
User password | IS | Directory US-1 | Read/Write | No | No | Yes
Home phone number | HR | PeopleSoft | Read/Write | No | Yes | No
Employee location | IS | Directory US-1 | Read-only | Yes (must log in) | No | Yes
Office phone number | Facilities | Phone switch | Read-only | Yes (anonymous) | No | No

Looking at the row representing the employee name data, we see the following:

Human Resources owns this information and therefore is responsible for updating and changing it.
The PeopleSoft application manages employee name information.
A person can read his own name but not write (or change) it.
Employee names can be read anonymously by everyone with access to the directory.
Members of the human resources group can change, add, and delete employee names in the directory.
Members of the information services group can change, add, and delete employee names in the directory.
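If the master table is kept in a machine-readable form, the same questions can be answered for any row. The following minimal Python sketch encodes the example table above and reports who may write each piece of data and whether a client other than the entry's owner may read it. The field names, and the reading of "Yes (must log in)" as "bound clients only", are assumptions for illustration; this is not a Directory Server configuration format, and the actual rules are enforced through the access controls described in chapter 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One row of the site-survey master table."""
    owner: str
    supplier: str
    self_write: bool   # the person can always read their own value; can they change it?
    global_read: str   # "anonymous", "authenticated" (must log in), or "none"
    hr_write: bool
    is_write: bool


POLICIES = {
    "Employee name": Policy("HR", "PeopleSoft", False, "anonymous", True, True),
    "User password": Policy("IS", "Directory US-1", True, "none", False, True),
    "Home phone number": Policy("HR", "PeopleSoft", True, "none", True, False),
    "Employee location": Policy("IS", "Directory US-1", False, "authenticated", False, True),
    "Office phone number": Policy("Facilities", "Phone switch", False, "anonymous", False, False),
}


def writers(data: str) -> list[str]:
    """Who may add, change, or delete this data."""
    p = POLICIES[data]
    candidates = [("self", p.self_write), ("HR", p.hr_write), ("IS", p.is_write)]
    return [who for who, allowed in candidates if allowed]


def globally_readable(data: str, bound: bool) -> bool:
    """Can a client other than the entry's owner read this data?

    `bound` is True when the client has logged in (bound) to the directory.
    """
    rule = POLICIES[data].global_read
    return rule == "anonymous" or (rule == "authenticated" and bound)


if __name__ == "__main__":
    for name in POLICIES:
        print(f"{name}: writers={writers(name)}, "
              f"anonymous read={globally_readable(name, bound=False)}")
```

For the employee name row, `writers` returns HR and IS, and `globally_readable` is true even for anonymous clients, matching the reading of that row given above.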

Repeating the Site Survey

Finally, you may need to run more than one site survey, particularly if your enterprise has offices in multiple cities or countries. You may find your informational needs to be so complex that you have to allow several different organizations to keep information at their local offices rather than at a single, centralized site. In this case, each office that keeps a master copy of information should run its own site survey. After the site survey process has been completed, the results of each survey should be returned to a central team (probably consisting of representatives from each office) for use in the design of the enterprise-wide data schema model and directory tree.





© 2001 Sun Microsystems, Inc. Used by permission. © 2005 Red Hat, Inc. All rights reserved.
Read the Full Copyright and Third-Party Acknowledgments.

last updated May 20, 2005